Abstract
RNA structural switches are key regulators of gene expression in bacteria, but their characterization in Metazoa remains limited. Here, we present SwitchSeeker, a comprehensive computational and experimental approach for systematic identification of functional RNA structural switches. We applied SwitchSeeker to the human transcriptome and identified 245 putative RNA switches. To validate our approach, we characterized a previously unknown RNA switch in the 3ʹ untranslated region of the RORC (RAR-related orphan receptor C) transcript. In vivo dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), coupled with cryogenic electron microscopy, confirmed its existence as two alternative structural conformations. Furthermore, we used genome-scale CRISPR screens to identify trans factors that regulate gene expression through this RNA structural switch. We found that nonsense-mediated messenger RNA decay acts on this element in a conformation-specific manner. SwitchSeeker provides an unbiased, experimentally driven method for discovering RNA structural switches that shape the eukaryotic gene expression landscape.
Subject terms: Riboswitches, Computational platforms and environments, Systems biology
SwitchSeeker combines computational and experimental techniques to identify functional RNA structural switches. Applied to the human transcriptome, it identified a novel RNA switch in the 3ʹUTR of RORC, linked to nonsense-mediated decay.
Main
Gene expression is regulated at the RNA level in all kingdoms of life. Some of the oldest groups of RNA-based regulatory mechanisms are ribozymes (catalytically active RNA molecules) and RNA structural switches (elements that adopt two mutually exclusive conformations, each leading to different gene-regulatory outcomes)1–3. In bacteria, a subset of RNA switches, termed riboswitches, control gene expression by binding small molecule ligands that induce RNA conformational changes4,5. The discovery of RNA switches in eukaryotes, however, has been more challenging. While a number of thiamine pyrophosphate-sensing riboswitches have been identified in plants and fungi6, only two human RNA switches are known: the protein-dependent RNA switch in vascular endothelial growth factor-A (VEGFA), and m6A modification-based switches7,8. Therefore, the overall impact of RNA switches on gene expression in higher eukaryotes remains unclear, despite their ubiquity in other domains of life. Here, we introduce SwitchSeeker, a systematic computational and experimental framework for unbiased discovery of RNA structural switches in any transcriptome.
While several RNA switch detection software packages have been developed, most identify new switch sequences based on their homology to one of the 40 known RNA switch families9. The small minority of tools enabling de novo prediction of RNA switches lack experimental verification of RNA structure and function10,11. Therefore, there is an unmet need for scalable methods of detecting eukaryotic RNA switches and assessing the extent to which they carry out regulatory functions in gene expression control. The approach we introduce here relies on integrating multiple computational and experimental methods: RNA switches are first predicted in silico, then structurally and functionally characterized in vivo, which in turn informs the next iteration of in silico predictions. First, we developed a computational model called SwitchFinder for de novo RNA switch detection, and showed that it identifies RNA switches from novel families with higher accuracy than existing models. Combining SwitchFinder with a set of high-throughput experimental techniques, we set up an end-to-end iterative predict-and-validate platform that we term SwitchSeeker. We applied SwitchSeeker to the human transcriptome to identify putative RNA switches, which we then characterized structurally and functionally using massively parallel assays in vivo. By iteratively improving the SwitchFinder predictions with experimental data, we ultimately report 245 high-confidence and functional RNA structural switches.
Finally, we selected the top scoring switch, located in the 3ʹ untranslated region (3ʹUTR) of the RORC (RAR-related orphan receptor C) transcript, for further analysis. We used dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq) structural probing and single-particle cryogenic electron microscopy (cryo-EM) to confirm that the predicted switch populates alternate molecular conformations. We then performed genome-scale CRISPR-interference (CRISPRi) screens, which showed that one of the two conformations reduces RORC gene expression through activation of the noncanonical nonsense-mediated decay (NMD) pathway. Taken together, our framework provides new insights into the role of RNA structural switches in shaping the human transcriptome, and outlines a broader approach for future comprehensive characterization of RNA switches regulating eukaryotic gene expression across cell types and organisms.
Results
Systematic annotation of human RNA structural switches
We define RNA structural switches as regulatory RNA elements that affect the expression of the host RNA molecule through conformational shifts. To discover new eukaryotic RNA switch families, we devised an approach called SwitchFinder that, unlike most existing methods12–17, does not depend on known sequence motifs. Instead, SwitchFinder uses the RNA sequence to generate an ensemble of secondary structures and their corresponding energy landscape using a Boltzmann equilibrium probability distribution18. It prioritizes the sequences that show RNA switch-like features, such as having two local minima in close proximity with a relatively small barrier in between (Fig. 1a and Extended Data Fig. 1a,b). This approach ensures that RNA switches are identified in a generalizable and family-agnostic way, which we validated by demonstrating its high performance on held-out Rfam families (Fig. 1b and Extended Data Fig. 1c). We compared the performance of SwitchFinder to SwiSpot, the state-of-the-art method for family-agnostic riboswitch prediction10, and observed a performance improvement of 44% on average across all RNA switch families except cyclic di-GMP-II (Fig. 1c). By relying on biophysical features of the folding energy landscape, SwitchFinder captures a wider variety of RNA switches compared with the existing methods.
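The energy-landscape criterion described above can be illustrated with a toy two-state calculation. This is a minimal sketch, not SwitchFinder's actual scoring: the function names, the thresholds `max_gap` and `max_barrier`, and the example energies are assumptions made purely for illustration. It computes Boltzmann populations of two conformations from their free energies and flags a landscape as switch-like when the two minima are close in energy and the barrier between them is modest.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_populations(energies_kcal, temperature_k=310.15):
    """Equilibrium probabilities of states with the given free energies (kcal/mol)."""
    rt = R_KCAL * temperature_k
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    z = sum(weights)  # partition function restricted to the listed states
    return [w / z for w in weights]


def looks_bistable(e1, e2, saddle, max_gap=1.5, max_barrier=10.0):
    """Toy criterion (thresholds are illustrative assumptions).

    e1, e2  -- free energies of the two local minima
    saddle  -- free energy of the lowest barrier state connecting them
    """
    gap = abs(e1 - e2)                 # how unequal the two minima are
    barrier = saddle - min(e1, e2)     # height measured from the deeper minimum
    return gap <= max_gap and barrier <= max_barrier


# Hypothetical energies for two conformations of one element
populations = boltzmann_populations([-30.0, -29.5])
print(populations, looks_bistable(-30.0, -29.5, saddle=-24.0))
```

In this toy model the lower-energy conformation is always the more populated one, and a large energy gap between the minima collapses the ensemble onto a single state, which is why both the gap and the barrier enter the criterion.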
To confirm that SwitchFinder is not overly tailored to bacterial riboswitches, we tested it on eukaryotic and synthetic riboswitches, including those sensing theophylline19 and specific RNA-binding proteins20. Additionally, we applied SwitchFinder to ribosomal RNAs to ensure its ability to distinguish RNA switches from nonswitching but highly structured RNAs. This analysis showed that SwitchFinder separated true riboswitches from their shuffled controls far more effectively than it separated ribosomal RNAs from theirs, and that it performed even better on eukaryotic and synthetic riboswitches than on bacterial riboswitches (Fig. 1d). Altogether, these benchmarking results gave us high confidence that SwitchFinder could nominate new eukaryotic RNA switches that would expand our understanding of RNA structural switching in gene regulation.
Discovery of RNA switches with regulatory function in the human transcriptome
Messenger RNA secondary structure in the cell is highly dynamic21–23 and compartment dependent24; therefore, we reasoned that the SwitchFinder predictions may be greatly improved with experimental measurements of RNA secondary structure from living cells. To counteract the limitations of in silico RNA folding predictions in complex eukaryotic transcriptomes25, we enhanced SwitchFinder by allowing the incorporation of in vivo RNA secondary structure probing data to refine the model’s energy terms, resulting in an iterative cycle of computational prediction and experimental validation that we name SwitchSeeker. First, we applied the SwitchFinder model using naive in silico folding to the entirety of the 3ʹUTRs of the human transcriptome, and chose the 3,750 top candidate switches (of length ≤186 nucleotides) as putative switch elements. To identify the RNA switches that are both functional and structurally bi-stable in the cell, we independently performed two high-throughput in vivo screens: a ‘structure screen’ that differentiates RNAs that exist as an ensemble of two mutually exclusive conformations from those that exist only in a single conformation, and a ‘functional screen’ that measures the effect of candidate RNA switches on the expression of a reporter gene.
For the structure screen, we performed an in vivo DMS-MaPseq assay on HEK293 cells expressing the library of 3,750 candidate RNA switches in a reporter gene context to identify the bi-stable RNA structures among them (Extended Data Fig. 2b,c)26,27. The accessibility of a single nucleotide in the DMS-MaPseq data is measured as a population average of multiple RNA molecules that represent different minima in the Gibbs free energy landscape. If one conformation dominates the landscape, it dominates the DMS-MaPseq reactivity profile; however, if multiple conformations coexist, they all contribute to the reactivity profile28,29. SwitchSeeker exploits this distinction in nucleotide accessibility to find RNA switches that coexist in a balanced state between two conformations in vivo.
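The population-average behavior described above can be written as a simple forward model. The sketch below assumes an idealized two-state mixture in which the observed reactivity at each position is the population-weighted mean of the two conformations' profiles; the function name and the four-nucleotide toy profiles are hypothetical, and recovering the individual profiles from real reads is a considerably harder inverse problem that this sketch does not attempt.

```python
def mixed_profile(profile_a, profile_b, fraction_a):
    """Population-averaged reactivity for a two-conformation ensemble.

    profile_a, profile_b -- per-nucleotide reactivities of each conformation
    fraction_a           -- fraction of molecules in conformation A (0..1)
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same nucleotides")
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return [
        fraction_a * a + (1.0 - fraction_a) * b
        for a, b in zip(profile_a, profile_b)
    ]


# Toy profiles: position 1 is open only in A, position 2 only in B,
# position 3 is open in both, position 4 is paired in both.
conformation_a = [1.0, 0.0, 1.0, 0.0]
conformation_b = [0.0, 1.0, 1.0, 0.0]

print(mixed_profile(conformation_a, conformation_b, fraction_a=1.0))   # one state dominates
print(mixed_profile(conformation_a, conformation_b, fraction_a=0.25))  # both states contribute
```

Positions that differ between the two conformations take intermediate values in the mixed profile, whereas positions that agree are unchanged, which is the signature a bi-stable element leaves in averaged probing data.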
For the functional screen, we implemented a massively parallel reporter assay (MPRA)30 to functionally interrogate RNA switches in HEK293 cells. We cloned the library of 3,750 candidate RNA switch sequences or cognate scrambled control sequences into a dual enhanced green fluorescent protein (eGFP)–mCherry fluorescent reporter, directly downstream of the eGFP open reading frame (ORF; Extended Data Fig. 2d). This enabled us to use eGFP fluorescence to measure the effect of candidate RNA switches on gene expression while using the unaffected mCherry fluorescence as an internal control. We transduced HEK293 cells with this synthetic library and sequenced DNA and RNA derived from eight bins of cells sorted by flow cytometry according to their eGFP:mCherry expression ratio (Extended Data Fig. 2e, see Methods). Of the candidate RNA switches tested, 536 (14%) caused significant downregulation of eGFP relative to their scrambled control, and 538 (14%) showed a significant upregulation (Fig. 2b). While our study focused on characterizing the RNA switches that act in the context of 3ʹUTRs, the SwitchSeeker framework can be readily applied to the study of other types of RNA switches with the use of appropriate reporter constructs.
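One common way to summarize a sort-and-sequence reporter assay of this kind is to place each library member at its depth-normalized average bin. The sketch below illustrates that idea only; it is not the analysis used in this study (which is described in Methods), and the function name, the read counts and the uniform bin depths are hypothetical.

```python
def mean_bin(counts, bin_totals):
    """Depth-normalized average bin index (1-based) for one library sequence.

    counts     -- reads assigned to this sequence in each sorted bin
    bin_totals -- total reads sequenced from each bin
    """
    if len(counts) != len(bin_totals):
        raise ValueError("one count and one total are needed per bin")
    # Convert raw counts to within-bin frequencies so deeper bins do not dominate
    freqs = [c / t for c, t in zip(counts, bin_totals)]
    total = sum(freqs)
    if total == 0:
        raise ValueError("sequence was not observed in any bin")
    return sum(i * f for i, f in enumerate(freqs, start=1)) / total


# Hypothetical read counts across eight bins ordered by eGFP:mCherry ratio
bin_depths = [1_000_000] * 8
candidate = [40, 60, 30, 10, 5, 2, 1, 0]   # skewed toward low-ratio bins
scrambled = [2, 5, 10, 30, 40, 30, 10, 5]  # centered on the middle bins

shift = mean_bin(candidate, bin_depths) - mean_bin(scrambled, bin_depths)
print(shift)  # negative here: the candidate sits in lower bins than its control
```

A negative shift relative to the scrambled control would correspond to reduced eGFP expression; judging significance requires replicate measurements and a statistical test that this sketch omits.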
In the second iteration of SwitchSeeker, guided by in vivo RNA structure data, we refined our predictions, eliminating false positives and focusing on switches with consistent structural configurations in vivo. Comparing outcomes of this iteration with the first iteration, we found a significant increase in the proportion of switches with regulatory activity (P = 1 × 10−6, Extended Data Fig. 2f), validating the enhanced accuracy through in vivo data integration. This process prioritized 1,454 putative RNA switches that occupy two alternative conformational minima and have regulatory activity in vivo.
Having identified a large set of candidate RNA switches that affect gene expression, we aimed to assess the degree to which the two stable conformations show divergent regulatory function. For this, we extended our MPRA to include targeted mutations designed to shift the equilibrium between the two conformations of each candidate RNA switch. This was achieved by either disrupting or strengthening conformation-specific stem loops by introducing either individual mutations or reciprocal mutation pairs (Fig. 2c). This additional screen enabled us to identify bona fide RNA switches with strong conformation-dependent activity. We found 245 RNA switches that differentially regulated reporter gene expression when locked in a specific structural conformation. An example candidate switch (located in the 3ʹUTR of TCF7 (transcription factor 7)) is shown in Fig. 2d: the TCF7 RNA switch landscape has two local minima, corresponding to two alternative conformations supported by in vivo DMS-MaPseq data (Fig. 2d, bottom). Two mutations in different parts of the switch sequence that favor conformation 1 resulted in lower expression of the eGFP reporter (top). Conversely, two mutations that favor conformation 2 increased eGFP expression. This observation indicates that the two conformations of the TCF7 RNA switch elicit divergent regulatory functions.
A bi-stable RNA switch in the 3ʹUTR of RORC
To demonstrate the validity of SwitchSeeker’s predictions, we aimed to biochemically characterize one of the identified RNA switches. We selected the switch that had the most pronounced difference in regulatory functions between its two conformations: a 186 nucleotide element located in the 3ʹUTR of the RORC mRNA. Based on the predicted secondary structures, we designated the three regions involved in the base pairing as ‘Box 1’ (61–69 nucleotides), ‘Box 2’ (73–81 nucleotides) and ‘Box 3’ (116–123 nucleotides). Our data indicate that Box 1 can form base pairs either with Box 2 or with Box 3, resulting in two mutually exclusive conformations that each exert distinct effects on gene expression (Fig. 3a). To confirm that the RORC RNA switch exists as an ensemble of two stable conformations, we designed mutation–rescue pairs of sequences that first shift the equilibrium towards one conformation (mutation), and then shift it towards the other conformation (rescue) (Fig. 3b and Supplementary Data Files), and used in vitro RNA SHAPE (selective 2ʹ-hydroxyl acylation analyzed by primer extension)31 to monitor the resultant RNA structures. We found that mutating Box 3 (117-AC) reduced the reactivity of the Box 2 region (Fig. 3c), supporting the idea that Box 1 would switch its contacts from Box 3 to Box 2, thereby stabilizing conformation 2. Introducing the rescue mutation (65-GT,117-AC) into Box 1 restored the original reactivity profile of the element. Complementary experiments using the mutation (77-GA) to stabilize conformation 1, and the rescue mutation (63-TC,77-GA) to stabilize conformation 2, had a similar outcome. Even though we did not observe a substantial decrease in reactivity of Box 3 upon the 77-GA mutation, the rescue significantly increased its reactivity (Extended Data Fig. 3a,b). These findings support the role of the three highlighted regions in forming an ensemble of states in which Box 2 and Box 3 compete for base pairing to Box 1.
To extend our in vitro observations to living cells, we performed high-coverage DMS-MaPseq of the RORC switch in vivo in the reporter context (Extended Data Fig. 3c). Using a DMS concentration sufficient to cause multiple modifications to the same RNA molecule, we implemented the DRACO computational approach28, which identified two distinct clusters in both biological replicates, representing the two conformations, at relative proportions of 27% to 73% (Fig. 3d and Extended Data Fig. 3e). The profiles of these clusters were distinct (P = 0.18 and P = 0.72 in replicates 1 and 2, respectively) but showed high correlation within each cluster across replicates (Extended Data Fig. 3d). To ascertain whether sequence mutations similarly influence the conformational equilibrium in vivo, we conducted DMS-MaPseq on the two rescue mutant sequences (Extended Data Fig. 3f). This analysis corroborated our SHAPE findings: the (63-TC,77-GA) mutation stabilized conformation 2, while the (65-GT,117-AC) mutation favored conformation 1. The alignment of in vitro SHAPE and in vivo DMS-MaPseq results reinforces the notion that the RORC switch consistently exhibits its conformational dynamics across both experimental settings.
To determine whether the RORC element functions as a dynamic RNA switch or simply represents a static equilibrium of two conformations, we investigated whether the proportions of its alternative conformations change inside cells. To this end, we introduced a reporter containing the RORC sequence into six cell lines representing diverse genetic backgrounds: LNCaP (prostate), MCF-7 (breast), HepG2 (liver), ZR-75-1 (breast), 293T (kidney) and LS174T (colon). Using DMS-MaPseq, we assessed the conformational dynamics of the RORC switch in these cell lines. Our findings show that the relative proportions of the two conformations vary among these cell lines and also reveal a strong anticorrelation in the accessibility of Boxes 2 and 3 (R = −0.75) (Fig. 3e). This anticorrelation supports the hypothesis of their competitive base pairing with Box 1, further suggesting dynamic switching behavior.
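The anticorrelation reported above is a Pearson correlation between two per-cell-line accessibility values. As a minimal sketch of that calculation, the snippet below computes Pearson's R from scratch; the accessibility numbers are invented for illustration and are not the measured values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean DMS accessibility of each box, one value per cell line
box2_accessibility = [0.12, 0.18, 0.25, 0.30, 0.22, 0.35]
box3_accessibility = [0.40, 0.33, 0.28, 0.20, 0.31, 0.15]

print(pearson_r(box2_accessibility, box3_accessibility))  # negative for these toy values
```

If Box 2 and Box 3 compete for the same partner, cell lines in which one box is more accessible should show the other as less accessible, giving a negative coefficient.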
To extend our analysis from the reporter to the endogenous context, we performed DMS-MaPseq targeting the endogenous RORC mRNA across the same six cell lines. This approach yielded similar observations: a strong anticorrelation in accessibility (R = −0.81, Fig. 3f) and variability in the relative proportions of the two conformations. Importantly, the conformational ratios across cell lines were highly correlated between the reporter and endogenous contexts (R = 0.93, Extended Data Fig. 3g), demonstrating the high relevance of the reporter screening approach to understanding the behavior of RNA switches in the context of their endogenous mRNA. These data strongly support the hypothesis that the RORC element functions as an RNA switch, adopting two alternative conformations, the balance of which is influenced by the cellular landscape.
Finally, we used single-particle cryo-EM to investigate the tertiary structures of the two RORC RNA switch conformations that we identified using SHAPE and DMS-MaPseq. Micrographs of the reference RORC RNA switch contain a mixture of compact and extended particles, with features suggestive of RNA secondary structure (Fig. 4a and Extended Data Fig. 4a–c), including elongated tertiary features consistent with A-form helices, as well as bends and junctions consistent with complex RNA folding (Extended Data Fig. 4d–f). Strikingly, particles of the conformation 1 mutant (77-GA) appear more extended, while those of the conformation 2 mutant (117-AC) are mostly compact (Fig. 4a). Cryo-EM image processing shows that reference RORC RNA can be classified into three structural classes (Classes A, B, and C), with the Class B structure absent in the (77-GA) mutant and Class A absent in the (117-AC) mutant (Fig. 4b). This analysis suggests that Class A can be assigned to the more extended conformation 1, and Class B to the compact conformation 2 (Fig. 4b). We propose that Class C, which is present in all three datasets, represents a folding intermediate lacking the tertiary interactions made by either Boxes 2 or 3. Although the extreme flexibility of the RNA limits the resolution of the reconstructions to approximately 10 Å (Extended Data Fig. 5g–i), it is sufficient for discrimination of these different RNA folds. These results confirm that the RORC RNA switch indeed adopts distinct tertiary structures in solution and that the designed mutations heavily bias toward one conformation or the other.
Alternative conformations of the RORC RNA switch play divergent roles in gene regulation
Having validated that the RORC RNA switch can adopt two stable conformations, we next explored the distinct regulatory activities of each conformation. We engineered HEK293 cell lines to express eGFP reporters carrying RORC switch variants in the 3ʹUTR and assessed eGFP expression changes using flow cytometry. To specifically lock the switch in each conformation, we implemented two parallel strategies: for conformation 1, one strategy involved mutating Box 2 to prevent its pairing with Box 1 (mutant ‘73-CCCTATGA’), and another introduced mutations into both Boxes 1 and 3 to disrupt their interaction with Box 2 (mutant ‘61-TATATAA,116-TTATATA’). Remarkably, both strategies, despite modifying different parts of the sequence, induced similar eGFP expression changes for each conformation: both mutants that stabilized conformation 1 increased reporter gene expression (Fig. 5a), while analogous strategies applied to stabilize conformation 2 decreased expression. We then investigated whether the modulation in gene expression was primarily influenced by the RNA’s secondary structure rather than its sequence composition. Using cell lines stably expressing mutants from our earlier rescue–mutation experiments (Fig. 3b), we evaluated the impact on eGFP expression. Across three tested mutation–rescue pairs, the mutants favoring conformation 2 consistently showed reduced eGFP expression compared with those favoring conformation 1 (Fig. 5b). These findings from the reciprocal mutation–rescue experiments underscore the pivotal role of RNA secondary structure in the specific regulatory functions of the RORC RNA switch.
The RORC gene encodes the nuclear receptor ROR-γ that plays a crucial role in T-helper (Th)17 cell differentiation, a key process in the immune response, which is also implicated in autoimmune diseases32,33. To explore the functional impact of the RORC RNA switch in Th17 cells, we introduced into primary human CD4+ T cells a reporter construct carrying the RORC RNA switch sequence in the eGFP 3ʹUTR. We then differentiated these cells into Th17 cells (Extended Data Fig. 6, ref. 34). Incorporating the native RORC RNA switch markedly reduced eGFP expression compared with a control with a scrambled sequence (Fig. 5c). Additionally, altering the switch’s conformation with a 77-GA mutation (towards conformation 1) weakened this repression, confirming the activity of the RORC RNA switch in Th17 cells.
Having demonstrated the distinct regulatory effects of the RORC RNA switch’s two conformations, we next asked whether their relative proportions in different cell types would result in differential regulation of the RORC transcript. To assess this, we compared the stability of the reporter mRNA containing the RORC switch between cell lines following inhibition of RNA polymerase II with α-amanitin. We discovered a strong correlation between the conformational ratio and reporter mRNA stability, indicating that higher proportions of conformation 1 resulted in higher stability, whereas higher proportions of conformation 2 resulted in lower stability (R = 0.85, P = 0.03, Fig. 5d). We extended this analysis to the endogenous RORC mRNA, where we observed a similar strong correlation (R = 0.96, P = 0.004, Fig. 5e).
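Stability measurements after a transcription block are commonly summarized as a half-life under a first-order decay model. The sketch below assumes that model, N(t) = N0·exp(−kt), and estimates k by ordinary least squares on log-transformed abundances; the function name, time points and abundances are hypothetical, and this is not necessarily the procedure used in the study.

```python
import math


def fit_half_life(times, abundances):
    """Half-life from a log-linear least-squares fit of first-order decay."""
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]  # abundances must be positive
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs)) / sxx
    if slope >= 0:
        raise ValueError("abundance does not decay over the time course")
    decay_rate = -slope
    return math.log(2) / decay_rate


# Hypothetical relative mRNA levels at 0, 2, 4 and 8 h after transcription block
hours = [0.0, 2.0, 4.0, 8.0]
relative_mrna = [1.00, 0.80, 0.66, 0.43]
print(fit_half_life(hours, relative_mrna))  # half-life in hours
```

Half-lives obtained this way for each cell line could then be compared with the corresponding conformational ratios to give a correlation of the kind reported above.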
Next, we investigated whether, instead of sequence mutations, trans-acting agents such as antisense oligonucleotides (ASOs) complementary to parts of the RNA switch sequence could shift the equilibrium between the two conformations and thereby influence gene expression35. We designed two ASOs to target the Box 2 region, aiming to shift the equilibrium towards conformation 1, which we would expect to increase the levels of RORC mRNA expression. We transfected three cell lines, representing different conformational ratios (LNCaP, MCF-7 and LS174T), with these ASOs carrying either 2ʹ-O-(2-methoxyethyl) (2-MOE) oligoribonucleotides or locked nucleic acids. In both cases, ASO treatment led to a significant increase in RORC mRNA levels compared with nontargeting control ASO (Fig. 5f). Notably, this effect was more pronounced in cell lines with a higher proportion of conformation 2 (LNCaP, P = 0.006; MCF-7, P = 0.005) compared with those with a lower proportion (LS174T, P = 0.71). Together, these data further underscore the link between structural conformation and resultant gene expression, solidifying the role of the RORC element as a regulatory switch in its native gene context.
Genome-scale genetic screens reveal molecular mechanisms underlying the RORC RNA switch
To investigate how the RORC RNA switch influences gene expression at the molecular level, we performed genome-wide CRISPRi screens in Jurkat T cells expressing one of two eGFP reporter constructs: one with the native RORC switch and another with the 77-GA mutation that favors conformation 1 (Extended Data Fig. 7a). These screens were intended to identify gene products, the depletion of which altered RORC RNA switch-mediated control of reporter gene expression, indicating their functional connection to the RNA switch mechanism36. We focused on identifying two gene groups: those essential for repression induced by the RORC switch (as indicated by an increase in reporter gene expression), and those affecting the conformational dynamics of the switch (as indicated by a change in the ratio of reporter expression between the native switch and the 77-GA mutant).
To identify factors influencing the RORC RNA switch’s repressive function, we analyzed the abundance of single-guide RNAs (sgRNAs) in cells with high versus low reporter gene expression in both screens. This analysis highlighted the NMD pathway, with top hits including core NMD factors such as SMG8, UPF1, UPF2 and UPF3B (Fig. 6a). Pathways associated with general gene expression, including ribosome biogenesis and endoplasmic reticulum stress, were also notable (Extended Data Fig. 7b). To pinpoint factors affecting the divergent activities of the switch’s two conformations, we compared the distribution of sgRNAs across the high and low reporter expression bins between cells expressing the native switch and the 77-GA mutant. This comparison reinforced the central role of the NMD pathway (Fig. 6b), given that the knockdown of NMD components lessened the reporter expression difference between the native and mutant switch. Surprisingly, while knockdowns of SURF complex (that is, SMG1–UPF1–eRF1–eRF3; the complex that initiates NMD on stalled ribosomes37) components produced strong effects, the exon–junction complex (EJC) components did not produce significant changes in either screen, suggesting that the RORC RNA switch operates via a noncanonical EJC-independent NMD pathway38,39. Moreover, our findings suggest that the NMD pathway acts preferentially on conformation 2 of the RORC RNA switch, as evidenced by the stronger increase in expression of the 77-GA mutant compared with the native RORC sequence.
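Comparing sgRNA abundance between sorted populations usually starts from a depth-normalized log ratio. The sketch below shows that basic quantity only, with a pseudocount to stabilize low counts; the function name, the pseudocount value, the guide labels and all read counts are assumptions, and published screens typically rely on dedicated statistical packages that also model variance across guides and replicates.

```python
import math


def log2_enrichment(high_count, low_count, high_total, low_total, pseudocount=1.0):
    """log2 ratio of an sgRNA's depth-normalized abundance, high bin over low bin."""
    if high_total <= 0 or low_total <= 0:
        raise ValueError("library totals must be positive")
    # Reads per million, with a pseudocount so zero counts stay finite
    high_rpm = (high_count + pseudocount) / high_total * 1e6
    low_rpm = (low_count + pseudocount) / low_total * 1e6
    return math.log2(high_rpm / low_rpm)


# Hypothetical counts: (reads in high-reporter bin, reads in low-reporter bin)
guides = {
    "sgRNA_targeting_factor_X": (310, 70),
    "sgRNA_nontargeting_ctrl": (150, 148),
}
high_depth, low_depth = 2_000_000, 2_000_000

for name, (high, low) in guides.items():
    print(name, round(log2_enrichment(high, low, high_depth, low_depth), 2))
```

A guide enriched in the high-expression bin points to a gene whose depletion relieves repression of the reporter; comparing such scores between the native-switch and 77-GA screens would isolate conformation-dependent factors.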
To confirm these results, we applied CRISPRi to individually knock down NMD factors in cells expressing the reference switch, the 77-GA mutant, or a scrambled sequence. Knockdowns of SURF complex members, but not EJC components, significantly affected the switch’s repressive function, confirming our genome-wide screen results (Fig. 6c,d). Furthermore, reducing SURF complex expression also diminished the expression difference between the reference and 77-GA mutant, primarily by increasing reporter expression in the mutant (Extended Data Fig. 7d). This evidence indicates that NMD predominantly acts on conformation 2 of the RORC RNA switch.
Given its affinity for structured RNAs40, we reasoned that UPF1 might bind the two RORC RNA switch conformations with different affinities. To test this, we mixed together the reference and the Box 2 mutant (77-GA) reporter lines at a 1:1 ratio and measured UPF1 binding using CLIP-qPCR (cross-linking and immunoprecipitation followed by qPCR). The reference RORC UTR sequence (containing a mixture of conformations 1 and 2) had significantly stronger binding to UPF1 than its 77-GA mutant that could form only conformation 1 (Fig. 6e). Similarly, UPF1 bound a mutant that favors conformation 2 (116-CCCTAAG) more strongly than the 77-GA mutant, and this effect was even more pronounced than the difference between reference and 77-GA (logarithm of fold change of 1.12 versus 0.41). Together, these results underscore the preference of UPF1 to bind to conformation 2 of the RORC switch (Extended Data Fig. 7e).
We reasoned that conformation-specific NMD would deplete mRNA molecules with conformation 2, thereby resulting in a relative increase in the proportion of conformation 1. To test this, we used NMDI14, a molecule that disrupts SMG7–UPF1 interactions, to inhibit NMD41. Assessing the accessibility of Boxes 2 and 3 in endogenous RORC mRNA using DMS-MaPseq, we found a significant decrease in the accessibility of Box 2 upon NMD inhibition (P = 0.03, Fig. 6f), indicative of a shift towards conformation 2, possibly due to slower decay and accumulation of mRNAs in this conformation. Hence, inhibiting NMD led to a shift in the relative proportions of the two conformations.
Having demonstrated the conformation-specific effect of NMD on the RORC switch in the reporter context, we sought to extend our analysis to the endogenous RORC mRNA. We knocked down UPF1 in several cell lines and assessed the levels of endogenous RORC mRNA using quantitative polymerase chain reaction with reverse transcription. UPF1 knockdown led to a substantial increase in RORC mRNA expression, which was more pronounced in cell lines with a higher prevalence of conformation 2 (LNCaP, P = 0.005; MCF-7, P = 0.02) than in a cell line with a lower prevalence (LS174T, P = 0.09) (Fig. 6g). This result emphasizes the role of UPF1 in regulating endogenous RORC mRNA stability in a conformation-dependent manner.
Considering the NMD pathway’s role in directing proteins translated from aberrant mRNA to proteasomal degradation42, we reasoned that the RORC RNA switch might similarly target its gene product. To test this, we treated reporter cells with the proteasome inhibitors carfilzomib and bortezomib, each acting through different mechanisms. Proteasome inhibition resulted in a significantly greater increase in eGFP expression in cells expressing the RORC switch compared with the control (Fig. 6h,i), indicating that NMD-induced proteasomal degradation of the switch-containing gene product contributes to the observed effect on gene expression.
We propose that UPF1 preferentially recognizes switch conformation 2 over conformation 1, and that the recruitment of the SURF complex by UPF1 consequently leads to decreased gene expression through proteasome-mediated degradation of translation products and mRNA decay, preventing repeated rounds of translation (Fig. 7b). Moreover, sequence mutations that influence the conformational equilibrium not only alter the RNA’s energy landscape but also modulate SURF recruitment and RNA stability, reflecting the nuanced control of gene repression by the switch. The mechanisms underlying the switching between conformations, however, remain an area for further investigation.
Collectively, we show that the RORC RNA switch influences gene expression through conformation-specific engagement of NMD factors that lead to control of mRNA and protein stability. Importantly, the RORC switch is only one example out of 245 functionally validated human RNA switches identified in this work, emphasizing the power of our SwitchSeeker approach to illuminate new areas of eukaryotic RNA biology.
Discussion
Historically, RNA switches were identified primarily through biochemical experimentation, measuring direct ligand interactions43,44, and comparative genomics to identify conserved noncoding regions that act as cis-regulatory elements in bacteria45,46. These methods, however, present challenges in eukaryotic contexts due to the dynamic nature of mRNA structures and the complexity of eukaryotic gene regulation22,24. Additionally, the vast genomic landscape and low sequence conservation in eukaryotes complicate the direct application of these approaches47–49. While numerous tools and algorithms exist for riboswitch prediction (reviewed in refs. 50,51), few of those focus on de novo discovery that is family-agnostic. The exceptions include SwiSpot10, which focuses on identifying the putative switching sequence, and the conditional probability-based method52. None of these algorithms has been shown to predict functional RNA switches from novel families in eukaryotic genomes. Addressing these challenges, SwitchSeeker integrates biochemistry, systems biology and functional genomics to create a comprehensive platform for RNA switch discovery and characterization in eukaryotes. By covering the entire discovery process, from de novo predictions to the annotation of mechanisms, SwitchSeeker overcomes the limitations of existing methods. Looking forward, its capability to scale across complete transcriptomes sets the stage for a thorough characterization of RNA switches across diverse cell types and organisms, enhancing our understanding of their roles across the tree of life.
Advancements in genomic technologies such as RNA secondary structure probing (DMS-seq, SHAPE-seq) and single-particle cryo-EM have been instrumental in our systematic exploration of RNA switches, enabling us to delve into the diverse conformations of RNA molecules and their three-dimensional structures despite challenges such as size and flexibility28,29,53. This has opened up opportunities to study the functional differences between alternative RNA conformations and their role in gene expression control. Our DMS-MaPseq and cryo-EM data suggest that the RORC 3ʹ mRNA element inhabits a shallow energy landscape with two rugged minima linked to two major molecular conformations (Fig. 7a), thereby validating the SwitchSeeker approach to identifying RNA molecules with bi-stable energy landscapes. Genome-wide CRISPRi screens identified the EJC-independent NMD pathway as a key mediator of the gene regulatory mechanism of the RORC switch. Together, our studies of the RORC switch not only uncover new regulatory biology but also provide a blueprint on how the SwitchSeeker pipeline can enable rapid functional and mechanistic characterization of new RNA switches.
RNA structure is known to influence gene expression in health and disease35, as shown by our recent identification of specific RNA structures that influence splicing in metastatic cancers54. However, dynamic RNA structures such as RNA switches are a relatively unexplored aspect of gene expression control in eukaryotes. Our observations indicate a prevalence of RNA switches in the human transcriptome, suggesting that RNA conformation-dependent gene regulation is a widespread phenomenon. In our study we chose stringent criteria for selecting RNA switches, requiring them to be bi-stable in vivo, meaning that they populate two mutually exclusive structural conformations. However, it is important to note that not all RNA switches may conform to this binary model; some, such as the HIV-1 TAR RNA, have transient but functional conformations55, and others might present multistability, adding layers to regulatory control. Modifications to the SwitchSeeker platform will be necessary to explore these distinct classes of RNA structural elements.
While SwitchSeeker offers a robust framework for identifying functional RNA structural switches, there are several caveats and limitations to consider. First, identifying RNA switches that operate under specific cellular conditions requires structure probing assays to be conducted in those exact conditions, which can be challenging and resource intensive. Additionally, SwitchSeeker does not identify ligands for RNA switches; this necessitates complementary approaches to uncover the specific molecules interacting with these RNA elements. Future technological advancements could significantly enhance the tool’s efficacy. Currently, the absence of high-quality RNA structure datasets across full transcriptomes limits the comprehensive application of SwitchSeeker. The development of such datasets would enable more efficient and accurate RNA switch identification. Moreover, integrating additional functional assays, such as those targeting RNA switches that influence splicing, could broaden the scope and impact of SwitchSeeker.
The known examples of human RNA switch mechanisms include mutually exclusive binding of RNA-binding proteins by two different RNA conformations8 and m6A modification-based switching7. In this study, we introduce a novel switch mechanism that operates via the NMD pathway, suggesting a vast potential for diverse metabolic pathways in RNA switch functionality. SwitchSeeker’s utility lies in its ability to identify and elucidate these mechanisms in high throughput, irrespective of their specific pathways. The modulation of gene expression through shifts in RNA conformation, as achieved with ASOs in this study, opens new possibilities for targeting RNA switches in future therapeutics. SwitchSeeker is available for use and adaptation, and we hope that it will pave the way for many new discoveries in RNA-based regulation in eukaryotes.
Methods
SwitchFinder: detailed description of the algorithm
Conflicting base pairs identification
Conflicting base pairs were detected using a modification of the MIBP algorithm developed by L. Lin and W. McKerrow59. First, a large number of folds (default N = 1,000) is sampled from the Boltzmann distribution. If structure probing data (such as DMS-seq or SHAPE-seq) is provided, the Boltzmann distribution modeling software (part of the RNAstructure package56) incorporates the data as a pseudofree energy change term. Then, the base pairs are filtered: the base pairs that are present in almost all of the folds or are absent from almost all of the folds are removed from the further analysis. Then, mutual information for each pair of base pairs is estimated. To do so, each base pair is represented as a binary vector of length N, where N is the number of folds considered; in this binary vector, a given fold is represented as 1 if this base pair is present there, or as 0 if it is not. Mutual information between each two base pairs is calculated as in ref. 60. This results in an M × M table of mutual information values, where M is the number of base pairs considered. Then, the sum of each row of the square table is calculated. In the resulting vector K of length M, each base pair is represented by a sum of mutual information values across all of the other base pairs. Then, only the base pairs for which the sum of mutual information values passes the threshold of U × MAX(K) are considered, where U is a parameter (default value 0.5). We call the base pairs that pass this threshold the ‘conflicting base pairs’.
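The selection of conflicting base pairs can be illustrated with a minimal Python sketch. It is not the SwitchFinder implementation: the function names, the representation of folds as sets of (i, j) tuples, and the frequency cut-offs used to discard near-constant base pairs (0.05 and 0.95) are assumptions made for illustration, since the text only states that base pairs present in, or absent from, 'almost all' folds are removed. Mutual information is computed here with the standard plug-in estimator in bits, and the diagonal of the M × M table is excluded from the row sums.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two equal-length 0/1 vectors."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def conflicting_base_pairs(folds, u=0.5, min_freq=0.05, max_freq=0.95):
    """Return {base_pair: K} for base pairs whose summed MI passes u * max(K).

    folds: list of sets of (i, j) base pairs, one set per sampled fold.
    min_freq / max_freq are illustrative cut-offs (assumption, not from the paper).
    """
    n = len(folds)
    all_pairs = sorted(set().union(*folds))
    # One binary presence/absence vector of length N per base pair.
    vectors = {bp: [1 if bp in fold else 0 for fold in folds] for bp in all_pairs}
    # Drop base pairs that are present in (or absent from) almost all folds.
    kept = [bp for bp in all_pairs if min_freq <= sum(vectors[bp]) / n <= max_freq]
    # K[bp] = sum of MI between bp and every other retained base pair.
    k = {bp: 0.0 for bp in kept}
    for p, q in combinations(kept, 2):
        mi = mutual_information(vectors[p], vectors[q])
        k[p] += mi
        k[q] += mi
    if not k:
        return {}
    cutoff = u * max(k.values())
    return {bp: score for bp, score in k.items() if score >= cutoff}
```

In this sketch, base pairs that co-occur or mutually exclude one another across the sampled folds accumulate high K values, whereas a base pair that forms independently of the rest contributes no mutual information and is discarded.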
Conflicting stem identification
Once the conflicting base pairs are identified, they are assembled into conflicting stems, or series of conflicting base pairs that directly follow each other and therefore could potentially form a stem-like RNA structure. More specifically, the base pairs (a, b) and (c, d) form a stem if either (a == c − 1) and (b == d + 1), or (a == c + 1) and (b == d − 1). The stem is defined as a pair of intervals ((u, v), (x, y)), where v − u == y − x. Then, the conflicting stems are filtered by length: only the stems that are longer than a certain threshold value (default value: 3) are considered. Among these stems, the stems that directly conflict with each other are identified. Two stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)) conflict with each other if there is an overlap longer than a threshold value between either (u1, v1) and (u2, v2), or (u1, v1) and (x2, y2), or (x1, y1) and (u2, v2), or (x1, y1) and (x2, y2). The default threshold value is 3. The pairs of conflicting stems are sorted by the average value of their K values (sums of mutual information). The highest scoring pair of conflicting stems is considered the winning prediction, representing the major switch between two of the local minima present in the energy folding landscape of the given sequence. If no pairs of conflicting stems pass the threshold, SwitchFinder reports that no potential switch is identified for the given sequence.
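The stem assembly and conflict rules described above can be sketched in Python as follows. This is an illustrative reading of the text rather than the SwitchFinder code: the function names are hypothetical, 'longer than the threshold' is interpreted as strictly greater than, and the overlap between two intervals is taken to be the number of shared nucleotide positions.

```python
from itertools import combinations
from statistics import mean


def build_stems(base_pairs, min_len=3):
    """Group stacked pairs (i, j), (i+1, j-1), ... into stems ((u, v), (x, y))."""
    pairs = set(base_pairs)
    used = set()
    stems = []
    for i, j in sorted(pairs):
        if (i, j) in used:
            continue
        n = 0
        # Extend inward while the next stacked base pair exists.
        while (i + n, j - n) in pairs and i + n < j - n:
            used.add((i + n, j - n))
            n += 1
        if n > min_len:  # keep only stems longer than the threshold
            stems.append(((i, i + n - 1), (j - n + 1, j)))
    return stems


def overlap(a, b):
    """Number of positions shared by two closed intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def stems_conflict(s1, s2, min_overlap=3):
    """True if any strand of s1 overlaps any strand of s2 by more than min_overlap."""
    return any(overlap(a, b) > min_overlap for a in s1 for b in s2)


def stem_base_pairs(stem):
    (u, v), (x, y) = stem
    return [(u + t, y - t) for t in range(v - u + 1)]


def best_conflict(stems, k, min_overlap=3):
    """Highest-scoring conflicting stem pair as (score, stem1, stem2), or None.

    k maps each base pair to its summed mutual information value.
    """
    best = None
    for s1, s2 in combinations(stems, 2):
        if not stems_conflict(s1, s2, min_overlap):
            continue
        score = (mean([k[bp] for bp in stem_base_pairs(s1)])
                 + mean([k[bp] for bp in stem_base_pairs(s2)])) / 2
        if best is None or score > best[0]:
            best = (score, s1, s2)
    return best
```

A return value of None from best_conflict corresponds to the case in which no potential switch is reported for the sequence.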
Identifying the two conflicting structures
Given the prediction of the two conflicting stems, the folds that represent the two local minima of the energy folding landscape are predicted. Importantly, SwitchFinder focuses on optimizing the prediction accuracy, as opposed to the commonly used approach of energy minimization61. The MaxExpect program from the RNAstructure package56 is used; the base pairings of each of the conflicting stems are provided as folding constraints (in Connectivity Table format). Hereafter, the two predicted structures are referred to as conformations 1 and 2.
Activation barrier estimation
The RNApathfinder software62 is used to estimate the activation energy needed for a transition between the conformations 1 and 2.
Classifier for prediction of RNA switches
The curated representative alignments for each of the 50 known riboswitch families were downloaded from the Rfam database9. Each sequence is complemented by its shuffled counterpart (while preserving dinucleotide frequencies63). For all of the sequences, the two conflicting conformations, their folding energies and their activation energies are predicted as above. To estimate the performance of SwitchFinder for a given riboswitch family, all of the sequences from this family are placed into the test set, while all of the sequences from the other families are placed into the training set. Then, a linear regression model is trained on the training set, in which the response variable is binary and indicates whether the sequence is a real riboswitch or is a shuffled counterpart, and the predictor variables are the average folding energy of the two conformations and the activation energy of the transition between them. The trained linear regression model is then run on the test set, and its performance is estimated using the receiver operating characteristic curve.
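The leave-one-family-out evaluation can be sketched as follows, using only the Python standard library. The record layout (family, (average folding energy, activation energy), label) and all function names are assumptions made for illustration. The regression is ordinary least squares with an intercept, solved through the normal equations, and the area under the receiver operating characteristic curve is computed as the fraction of (real, shuffled) pairs ranked correctly.

```python
def fit_linear(features, labels):
    """Ordinary least squares with intercept; returns [w0, w1, ...]."""
    rows = [[1.0] + list(x) for x in features]
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, labels))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(weights, x):
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def roc_auc(scores, labels):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_family_out(records):
    """records: list of (family, (mean_energy, activation_energy), label)."""
    aucs = {}
    for family in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != family]
        test = [r for r in records if r[0] == family]
        w = fit_linear([r[1] for r in train], [r[2] for r in train])
        scores = [predict(w, r[1]) for r in test]
        aucs[family] = roc_auc(scores, [r[2] for r in test])
    return aucs
```

Holding out an entire family at a time, rather than random sequences, means that the reported performance reflects generalization to riboswitch families not seen during training.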
Prediction of RNA switches in human transcriptome
The coordinates of 3ʹUTRs of the human transcriptome were downloaded from UCSC Table Browser64, table tb_wgEncodeGencodeBasicV28lift37. The sequences of 3ʹUTRs were cut into overlapping fragments of 186 nucleotides in length (with overlaps of 93 nucleotides). For all of the sequences, the two conflicting conformations, their folding energies and their activation energies were predicted as above. A linear regression model was trained as described above on all 50 known riboswitch families. The model was applied to the 3ʹUTR fragments from the human genome, and the fragments were sorted according to the model prediction scores. The top 3,750 predictions were selected for further investigation.
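The tiling of 3ʹUTR sequences into overlapping fragments can be sketched as below. The fragment length and overlap follow the text; the handling of sequences shorter than one fragment and of the 3ʹ-terminal remainder is not specified in the text, so the choices made here (keep short sequences whole; add a final window flush with the 3ʹ end) are assumptions, as is the function name.

```python
def tile_sequence(seq, size=186, overlap=93):
    """Return (start, fragment) tuples covering seq with overlapping windows."""
    step = size - overlap
    if len(seq) <= size:
        # Assumption: sequences shorter than one window are kept whole.
        return [(0, seq)]
    starts = list(range(0, len(seq) - size + 1, step))
    # Assumption: cover the 3' tail with one last full-length window.
    if starts[-1] + size < len(seq):
        starts.append(len(seq) - size)
    return [(s, seq[s:s + size]) for s in starts]
```

With these defaults, each nucleotide away from the sequence ends is covered by two fragments, so that a candidate switch split by one window boundary is contained intact in the neighboring window.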
Incorporation of in vivo probing data
In vivo probing data, such as DMS-MaPseq, is used to apply pseudoenergy restraints when sampling folds from the Boltzmann distribution (that is, using the –SHAPE parameter in RNAstructure package commands56). To test the hypothesis of whether the in vivo probing data support the presence of two conflicting conformations in a given sequence, the following workflow was used. First, the two conflicting folds were predicted with SwitchFinder using in silico folding only. Then, SwitchFinder was run on the same sequence with the inclusion of in vivo probing data. If the same two conflicting folds were predicted among the top conflicting folds, the probing data were considered supportive of the presence of the two predicted conformations.
Mutation generation
To shift the RNA conformation ensemble towards one or another state, mutations of two types were introduced.
‘Strengthen a stem’ mutations: given two conflicting stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), one of the stems (for example, the first one) was changed in a way that would preserve its base pairing but deny the possibility of forming the second stem. To do so, the nucleotides in the interval (u1, v1) were replaced with all possible sequences of equal length, and the nucleotides (x1, y1) were replaced with the reverse complement sequence. Then, the newly generated sequences were filtered by two predetermined criteria: (i) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (ii) the modified first stem cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed both criteria were ranked by the introduced change in the sequence nucleotide composition; the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed by SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.
‘Weaken a stem’ mutations: given two conflicting stems ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), one of the stems (for example, the second one) was changed in such a way that this stem would not be able to form base pairing, while the base pairing of the other stem (in this example, the first stem) would be preserved. To do so, the nucleotides in either of the intervals (u2, v2) or (x2, y2) were replaced with all possible sequences of equal length. The newly generated sequences were filtered by three predetermined criteria: (i) the first stem stays unchanged, (ii) the second stem cannot form more than a fraction of its original base pairs (default value 0.6), and (iii) the modified part of the sequence cannot form long paired stems with any region of the existing sequence (default threshold length 4). The sequences that passed all of the criteria were ranked by the introduced change in the sequence nucleotide composition: the mutations that changed the nucleotide composition the least were chosen for further analysis. Each mutated sequence was additionally analyzed using SwitchFinder to ensure that the Boltzmann distribution is heavily shifted towards the desired conformation.
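A simplified sketch of the 'strengthen a stem' procedure is given below. It covers the enumeration of replacement sequences, the reverse-complement substitution, criterion (i) and the ranking by change in nucleotide composition; criterion (ii) (no long paired stems with other regions) and the final SwitchFinder check are omitted. The 0-based inclusive coordinates, the treatment of G–U wobble pairs as pairing, the composition-change metric and all function names are assumptions made for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Assumption: G-U wobble pairs count as base pairs.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def reverse_complement(s):
    return "".join(COMPLEMENT[b] for b in reversed(s))


def fraction_paired(seq, stem):
    """Fraction of the stem's positions that can still base pair in seq."""
    (u, v), (x, y) = stem
    n = v - u + 1
    paired = sum((seq[u + t], seq[y - t]) in PAIRING for t in range(n))
    return paired / n


def composition_change(a, b):
    """Total absolute change in nucleotide counts between two sequences."""
    return sum(abs(a.count(nt) - b.count(nt)) for nt in "ACGU")


def strengthen_stem(seq, stem1, stem2, max_fraction=0.6):
    """Mutants that keep stem1 paired but disrupt stem2, least-changed first."""
    (u1, v1), (x1, y1) = stem1
    n = v1 - u1 + 1
    candidates = []
    for left in product("ACGU", repeat=n):  # 4**n sequences; small stems only
        left = "".join(left)
        s = list(seq)
        s[u1:v1 + 1] = left
        s[x1:y1 + 1] = reverse_complement(left)
        mutant = "".join(s)
        if mutant != seq and fraction_paired(mutant, stem2) <= max_fraction:
            candidates.append(mutant)
    candidates.sort(key=lambda m: composition_change(seq, m))
    return candidates
```

A 'weaken a stem' variant would follow the same pattern, replacing only one strand of the second stem and additionally requiring that the first stem remain unchanged.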
Cell culture
All cells were cultured in a 37 °C 5% CO2 humidified incubator. The HEK293 cells (purchased from ATCC, cat. no. CRL-3216) were cultured in DMEM high-glucose medium supplemented with 10% FBS, l-glutamine (4 mM), sodium pyruvate (1 mM), penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). The Jurkat cell line (purchased from ATCC, cat. no. TIB-152) was cultured in RPMI-1640 medium supplemented with 10% FBS, glucose (2 g l−1), l-glutamine (2 mM), 25 mM HEPES, penicillin (100 units ml−1), streptomycin (100 μg ml−1) and amphotericin B (1 μg ml−1) (Gibco). All cell lines were routinely screened for mycoplasma with a PCR-based assay.
Cryo-electron microscopy
Sample preparation and data collection
A total of 3.5 µl target mRNA at an approximate concentration of 1.5 mg ml−1 was applied to gold, 300 mesh transmission electron microscopy grids with a holey carbon substrate of 1.2 µm and 1.3 µm spacing (Quantifoil). The grids were blotted with no. 4 filter papers (Whatman) and plunge frozen in liquid ethane using a Mark IV Vitrobot (Thermo Fisher), with blot times of 4–6 s, blot force of −2, at a temperature of 8 °C and 100% humidity. All grids were glow discharged in an easiGlo (Pelco) with rarefied air for 30 s at 15 mA, no more than 1 h prior to preparation. Duplicate wild-type and mutant RNA specimens were imaged under different conditions on several microscopes as per Data File S8; all were equipped with K3 direct electron detector (DED) cameras (Gatan), and all data collection was performed using SerialEM65. Detailed data collection parameters are listed in Data File S8.
Image processing
Dose-weighted and motion-corrected sums were generated from raw DED movies during data collection using University of California, San Francisco (UCSF) MotionCor266. Images from super-resolution datasets were downsampled to the physical pixel size before further processing. Estimation of the contrast transfer function (CTF) was performed in CTFFIND467, followed by neural net-based particle picking in EMAN268. Two-dimensional (2D) classification, ab initio three-dimensional (3D) classification, and gold-standard refinement were done in cryoSPARC69. CTFs were then re-estimated in cryoSPARC and particles repicked using low-resolution (20 Å) templates generated from chosen 3D classes. Extended datasets were pooled when appropriate, and particle processing was repeated through gold-standard refinement as before. All structure figures were created using UCSF ChimeraX (ref. 70). Further details are given in Data File S7 and Extended Data Fig. 5.
Reporter vector design and library cloning
First, mCherry-P2A-Puro fusion was cloned into the BTV arbovirus backbone (Addgene, cat. no. 84771). Then, the vector was digested with MluI-HF and PacI restriction enzymes (NEB), with the addition of Shrimp Alkaline Phosphatase (NEB). The digested vector was purified with the Zymo DNA Clean and Concentrator-5 kit.
DNA oligonucleotide libraries (one for functional screen and one for massively parallel mutagenesis analysis) consisting of 7,500 sequences in total were synthesized by Agilent. The second strand was synthesized using Klenow Fragment (3ʹ → 5ʹ exo-) (NEB). The double-stranded DNA library was digested with MluI-HF and PacI restriction enzymes (NEB) and run on a 6% TBE (Tris base, boric acid, EDTA) polyacrylamide gel. The band of the corresponding size was cut out and the gel was dissolved in the DNA extraction buffer (10 mM Tris, pH 8, 300 mM NaCl, 1 mM EDTA). The DNA was precipitated with isopropanol. The digested DNA library and the digested vector were ligated with T4 DNA ligase (NEB). The ligation reaction was precipitated with isopropanol and transformed into MegaX DH10B T1R electrocompetent cells (Thermo Fisher). The library was purified with ZymoPURE II Plasmid Maxiprep Kit (Zymo). The representation of individual sequences in the library was verified by sequencing the resulting library on an MiSeq instrument (Illumina).
Massively parallel reporter assay
The DNA library was co-transfected with pCMV-dR8.91 and pMD2.G plasmids using TransIT-Lenti (Mirus) into HEK293 cells, following the manufacturer’s protocol. Virus was collected 48 h after transfection and passed through a 0.45 µm filter. HEK293 cells were then transduced overnight with the filtered virus in the presence of 8 µg ml−1 polybrene (Millipore); the amount of virus used was optimized to ensure an infection rate of ~20%, as determined by flow cytometry. The infected cells were selected with 2 µg ml−1 puromycin (Gibco). Cells were collected at 90–95% confluency for sorting and analysis on a BD FACSAria II sorter. The distribution of mCherry : GFP ratios was calculated. For sorting a library into subpopulations, we gated the population into eight bins each containing 12.5% of the total number of cells. A total of 1.2 million cells were collected for each bin, in two replicates each, to ensure sufficient representation of sequences in the population. For each subpopulation, we extracted genomic DNA and total RNA with the Quick-DNA/RNA Miniprep kit. gDNA was amplified by PCR with Phusion polymerase (NEB) using the primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTGCTAGCTAGATGACTAAACGCG and AATGATACGGCGACCACCGAGATCTACAC–i5–ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTGGTCTGGATCCACCGGTCC. Different i7 indexes were used for eight different bins, and different i5 indexes were used for the two replicates. RNA was reverse transcribed with Maxima H Minus Reverse Transcriptase (Thermo Fisher) using primer CTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNTGGTCTGGATCCACCGGTCCGG. The complementary DNA was amplified with Q5 polymerase (NEB) using primers CAAGCAGAAGACGGCATACGAGAT–i7–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCTGCTAGCTAGATGACTAAACGC and CAAGCAGAAGACGGCATACGAGAT–i5–GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTACCCGTCATTGGCTGTCCA. Different i7 indexes were used for eight different bins, and different i5 indexes were used for the two replicates.
The amplified DNA libraries were size purified with the Select-a-Size DNA Clean and Concentrator MagBead Kit (Zymo). Deep sequencing was performed using the HiSeq4000 platform (Illumina) at the UCSF Center for Advanced Technologies.
The adapter sequences were removed using cutadapt71. For RNA libraries, the unique molecular identifier (UMI) was then removed from the reads and appended to read names using UMI tools72. The reads were matched to the fragments using the bwa mem command. The reads were counted using featureCounts73. The read counts were normalized using median of ratios normalization74. For each fragment, a one-way chi-squared test was used to estimate how far its read distribution across the sorting bins deviates from the null hypothesis (that is, a uniform distribution). mRNA stability was estimated by comparing the RNA and DNA read counts with MPRAnalyze75.
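The one-way chi-squared statistic against a uniform null distribution can be computed as in the following sketch, in which the function name and the input layout (one normalized count per sorting bin for a single fragment) are assumptions. Only the statistic and the degrees of freedom are returned; a P value can then be obtained from the chi-squared distribution, for example with scipy.stats.chisquare, which uses equal expected frequencies by default.

```python
def chi2_uniform(counts):
    """Chi-squared statistic and degrees of freedom versus a uniform null."""
    total = sum(counts)
    expected = total / len(counts)  # uniform expectation per bin
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1
```

A fragment whose reads are spread evenly across the eight bins gives a statistic near zero, whereas a fragment concentrated in the lowest or highest expression bins gives a large statistic.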
Massively parallel mutagenesis analysis
Library design and measurement
For each candidate switch, two alternative conformations were identified using SwitchFinder. Each conformation is defined by a stem structure: ((u1, v1), (x1, y1)) and ((u2, v2), (x2, y2)), representing two conflicting stems. The SwitchFinder mutation generation algorithm was used to design four mutations in the candidate switch sequence: A, ‘strengthen a stem’ mutation favoring conformation 1: the regions (u1, v1) and (x1, y1) are altered while preserving complementarity; B, ‘weaken a stem’ mutation favoring conformation 1: either the region (u2, v2) or (x2, y2) is modified, preserving the regions (u1, v1), (x1, y1); C, ‘strengthen a stem’ mutation favoring conformation 2: the regions (u2, v2), (x2, y2) are changed while maintaining complementarity; and D, ‘weaken a stem’ mutation favoring conformation 2: either the region (u1, v1) or (x1, y1) is altered, ensuring that the regions (u2, v2), (x2, y2) remain intact.
Subsequently, the mutated sequences for selecting candidate RNA switches, along with the reference sequence, were pooled into a single DNA oligonucleotide library. The impact of each sequence on reporter gene expression was evaluated in cells, as outlined in the Massively Parallel Reporter Assay section. Consequently, each candidate RNA switch in the library is represented by its reference sequence, two mutated sequences favoring conformation 1 (A and B), and two mutated sequences favoring conformation 2 (C and D).
Candidate RNA switch ranking
For each candidate RNA switch, its effect on reporter gene expression was assessed in cells, following the protocol described in the Massively Parallel Reporter Assay section. This resulted in 16 measurements, corresponding to normalized read counts in sorting bins 1 (lowest expression) to bin 8 (highest expression), across two replicates; these arrays of counts are referred to as ‘bin_counts’. Measurements were obtained for mutants A, B, C, D, and the reference sequence. Correlations between the effects of mutations designed to favor the same or opposite conformations were computed as follows: correlation_same_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant B)); correlation_same_2 = Pearsonr(bin_counts(mutant C), bin_counts(mutant D)); correlation_opposite_1 = Pearsonr(bin_counts(mutant A), bin_counts(mutant C)); and correlation_opposite_2 = Pearsonr(bin_counts(mutant A), bin_counts(mutant D)). The score of each candidate switch was then calculated as: score = mean(correlation_same_1, correlation_same_2) − mean(correlation_opposite_1, correlation_opposite_2). Candidate switches were ranked based on this score. Those with a score exceeding the mean + 1 s.d. were considered significant.
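The scoring scheme can be written out as a short Python sketch. The dictionary layout and function names are assumptions, and the sample standard deviation is used for the significance cut-off because the text does not specify which estimator was applied.

```python
import math
from statistics import mean, stdev


def pearsonr(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def switch_score(bin_counts):
    """bin_counts: dict with keys 'A', 'B', 'C', 'D' -> 16 normalized counts."""
    same_1 = pearsonr(bin_counts["A"], bin_counts["B"])
    same_2 = pearsonr(bin_counts["C"], bin_counts["D"])
    opposite_1 = pearsonr(bin_counts["A"], bin_counts["C"])
    opposite_2 = pearsonr(bin_counts["A"], bin_counts["D"])
    return mean([same_1, same_2]) - mean([opposite_1, opposite_2])


def significant_candidates(scores):
    """Names of candidates whose score exceeds mean + 1 s.d. of all scores."""
    values = list(scores.values())
    cutoff = mean(values) + stdev(values)  # assumption: sample s.d.
    return [name for name, s in scores.items() if s > cutoff]
```

The score ranges from −2 to 2 and is highest when mutations favoring the same conformation shift the reporter distribution in the same direction while mutations favoring opposite conformations shift it in opposite directions.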
DMS-MaPseq
DMS-MaPseq was performed as described in ref. 54. In brief, cells were incubated in culture with 1.5% DMS (Sigma) at room temperature for 7 min, the media was removed, and DMS was quenched with 30% BME (β-mercaptoethanol). Total RNA from DMS-treated cells and untreated cells was then isolated using Trizol (Invitrogen). RNA was reverse transcribed using TGIRT-III reverse transcriptase (InGex) and target-specific primers. PCR was then performed to amplify the desired sequences and to add Illumina-compatible adapters. The libraries were then sequenced on a HiSeq4000 instrument (Illumina).
Pear (v0.9.6) was used to merge the paired reads into a single combined read. The UMI was then removed from the reads and appended to read names using UMI tools (v1.0). The reads were then reverse complemented (fastx toolkit) and mapped to the amplicon sequences using bwa mem (v0.7). The resulting bam files were then sorted and deduplicated (umi_tools, with method flag set to unique). The alignments were then parsed for mutations using the CTK (CLIP Tool Kit) software. The mutation frequency at every position was then reported. The signal normalization was performed using boxplot normalization76. The top 10% of positions with the highest mutation rates were considered outliers77. The clustering of DMS-MaPseq signal was performed with DRACO28.
SHAPE chemical probing of RNAs
Chemical probing and mutate-and-map experiments were carried out as described previously78. In brief, 1.2 pmol RNA was denatured at 95 °C in 50 mM Na-HEPES, pH 8.0, for 3 min, and folded by cooling to room temperature over 20 min, and then adding MgCl2 to a 10 mM concentration. RNA was aliquoted in 15 µl volumes into a 96-well plate and mixed with nuclease-free H2O (control), or chemically modified in the presence of 5 mM 1-methyl-7-nitroisatoic anhydride (1M7)79, for 10 min at room temperature. Chemical modification was stopped by adding 9.75 µl quench and purification mix (1.53 M NaCl, 1.5 µl washed oligo-dT beads, Ambion), 6.4 nM FAM-labeled, reverse-transcriptase primer (/56-FAM/AAAAAAAAAAAAAAAAAAAAGTTGTTCTTGTTGTTTCTTT), and 2.55 M Na-MES. RNA in each well was purified by bead immobilization on a magnetic rack and two washes with 100 µl 70% ethanol. RNA was then resuspended in 2.5 µl nuclease-free water prior to reverse transcription.
RNA was reverse transcribed from annealed fluorescent primer in a reaction containing 1× First Strand Buffer (Thermo Fisher), 5 mM dithiothreitol, 0.8 mM dNTP mix and 20 U SuperScript III Reverse Transcriptase (Thermo Fisher) at 48 °C for 30 min. RNA was hydrolyzed in the presence of 200 mM NaOH at 95 °C for 3 min, then placed on ice for 3 min and quenched with 1 volume 5 M NaCl, 1 volume 2 M HCl, and 1 volume 3 M sodium acetate. cDNA was purified on magnetic beads, then eluted by incubation for 20 min in 11 µl Formamide-ROX350 mix (1,000 µl Hi-Di Formamide (Thermo Fisher) and 8 µl ROX350 ladder (Thermo Fisher)). Samples were then transferred to a 96-well plate in ‘concentrated’ form (4 µl sample + 11 µl ROX mix) and ‘dilute’ form (1 µl sample + 14 µl ROX mix) for saturation correction in downstream analysis. Sample plates were sent to Elim Biopharmaceuticals for analysis by capillary electrophoresis.
Antisense oligonucleotide transfection
ASOs were purchased from Integrated DNA Technologies; the Morpholino ASOs were purchased from Gene Tools LLC (see sequences in Data File S9). A total of 95,000 HEK cells were seeded into the wells of a 24-well cell culture-treated plate in a total volume of 500 µl. After 24 h, either 1 nmol Morpholino ASO together with 3 µl EndoPorter reagent (Gene Tools LLC), or 6 pmol of another ASO, was added to each well. LNCaP, MCF-7 and LS174T cells were transfected with ASOs using Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer’s protocol. After 48 h, the mCherry and eGFP fluorescence was measured on a BD FACSCelesta Cell Analyzer, or RNA was isolated for RT-qPCR measurement with the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment per the manufacturer’s protocol.
CRISPRi screen
Reporter screens were conducted using established flow cytometry screen protocols80 (Horlbeck et al., 2016; Sidrauski et al., 2015). Jurkat cells with previously verified CRISPRi activity were used (Horlbeck et al., 2018). The CRISPRi-v2 (5 sgRNA/TSS, Addgene cat. no. 83969) sgRNA library was transduced into Jurkat cells at a multiplicity of infection of <0.3 (the percentage of blue fluorescent protein (BFP)-positive cells was ~30%). For the flow-based CRISPRi screen with the Jurkat cells, the sgRNA library virus was transfected at an average of 500-fold coverage after transduction (day 0). Puromycin (1 µg ml−1) selection for positively transduced cells was performed at 48 h (day 2) and 72 h (day 3) after transduction. On day 11, cells were collected in PBS and sorted with the BD FACSAria Fusion cell sorter. Cells were gated into the 25% of cells with the highest GFP : mCherry fluorescence intensity ratio, and the 25% of cells with the lowest ratio. The screens were performed with two conditions: cells with a reference RORC element–GFP reporter and a mutated 77-23 RORC element–GFP reporter. Screens were additionally performed in duplicate. After sorting, genomic DNA was collected (Macherey-Nagel Midi Prep kit) and amplified using NEB Next Ultra II Q5 master mix and primers containing TruSeq Indexes for next-generation sequencing. Sample libraries were prepared and sequenced on a HiSeq 4000. Guides were then quantified with the published ScreenProcessing (https://github.com/mhorlbeck/ScreenProcessing) method and phenotypes generated with an in-house processing pipeline, iAnalyzer (https://github.com/goodarzilab/iAnalyzer). In brief, iAnalyzer relies on fitting a generalized linear model to each gene. Coefficients from this generalized linear model were z-score normalized to the negative control guides and finally the largest coefficients were analyzed as potential hits.
For the comparison of gene phenotypes between the two cell lines, the DESeq2 ratio of ratios test was used57.
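The z-score normalization of model coefficients to the negative control guides can be illustrated with the sketch below. It does not reproduce iAnalyzer: the function names, the dictionary layout and the use of the sample standard deviation of the control coefficients are assumptions, and hits are ranked here simply by absolute z-score.

```python
from statistics import mean, stdev


def zscore_to_controls(coefficients, control_ids):
    """Center and scale all coefficients by the negative-control distribution."""
    controls = [coefficients[c] for c in control_ids]
    mu, sd = mean(controls), stdev(controls)
    return {name: (value - mu) / sd for name, value in coefficients.items()}


def top_hits(z_scores, n=10):
    """Names with the largest absolute z-scores, strongest first."""
    return sorted(z_scores, key=lambda name: abs(z_scores[name]), reverse=True)[:n]
```

Scaling by the spread of the negative controls expresses each gene-level coefficient in units of the noise expected for non-targeting guides.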
CRISPRi-mediated and small interfering RNA-mediated gene knockdown
Jurkat cells expressing the dCas9–KRAB fusion protein were constructed by lentiviral delivery of pMH0006 (Addgene, cat. no. 135448) and FACS isolation of BFP-positive cells.
Guide RNA sequences for CRISPRi-mediated gene knockdown were cloned into pCRISPRia-v2 (Addgene, cat. no. 84832) via BstXI-BlpI sites. After transduction with sgRNA lentivirus, Jurkat cells were selected with 2 µg ml−1 puromycin (Gibco). The fluorescence of eGFP and of mCherry was measured on a BD FACSCelesta Cell Analyzer.
For UPF1 siRNA-mediated knockdown, the TriFECTa DsiRNA Kit from Integrated DNA Technologies (cat. no. hs.Ri.UPF1.13) was used. LNCaP, MCF-7 and LS174T cells were transfected with siRNAs using the Lonza SE Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-1032) according to the manufacturer’s protocol. After 48 h, RNA was collected using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment as per the manufacturer’s protocol.
Reporter cell line generation
Mutated or reference sequences of RORC 3ʹUTR were cloned into the dual GFP–mCherry reporter using the MluI-HF and PacI restriction enzymes (NEB) as described above. The reporters were lentivirally delivered to HEK293 and Jurkat cells and analyzed with flow cytometry as described above.
Drug treatment
Jurkat cells were seeded at a density of 0.25 × 10^7 cells per ml. Either the proteasome inhibitors (Carfilzomib or Bortezomib, Cayman Chemical) or negative control (dimethyl sulfoxide, DMSO) were added at the given concentration. After 24 h of incubation, the fluorescence of eGFP and of mCherry was measured on a BD FACSCelesta Cell Analyzer.
MCF-7 cells were treated either with 50 µM NMDI14 (TargetMol), or with DMSO, for 24 h. Afterwards, cells were treated with DMS as described above and the RNA was collected as described above.
mRNA stability measurements
Jurkat cells were treated with 10 μg ml−1 α-amanitin (Sigma-Aldrich, cat. no. A2263) for 8–9 h prior to total RNA extractions. Total RNA was isolated using the Zymo QuickRNA Microprep isolation kit with in-column DNase treatment as per the manufacturer’s protocol. mRNA levels were measured with RT-PCR, using 18S ribosomal RNA (transcribed by RNA Pol I) as the control.
T-cell isolation, transduction and Th17 cell differentiation
Th17 cells were derived as described previously34. Plates were coated with 2 µg ml−1 anti-human CD3 (UCSF monoclonal antibody core, clone: OKT-3) and 4 µg ml−1 anti-human CD28 (UCSF monoclonal antibody core, clone: 9.3) in PBS with calcium and magnesium for at least 2 h at 37 °C or overnight at 4 °C with the plate wrapped in parafilm. Human CD4+ T cells were isolated from human peripheral blood using the EasySep human CD4+ T cell isolation kit (17952; STEMCELL) and stimulated in ImmunoCult-XF T-cell expansion medium (10981; STEMCELL) supplemented with 10 mM HEPES, 2 mM l-glutamine, 100 µM 2-MOE, 1 mM sodium pyruvate and 10 ng ml−1 transforming growth factor-β. At 24 h after T-cell isolation and initial stimulation on a 96-well plate, 7 µl lentivirus was added to each sample. After 24 h, the media was removed from each sample without disturbing the cells and replaced with 200 µl fresh media. After 48 h, cells were stimulated with 1.2 µM ionomycin, 25 nM propidium monoazide and 6 µg ml−1 brefeldin-A, resuspended by pipetting, incubated for 4 h at 37 °C, and collected for analysis. Half of each sample was stained for CD4, FoxP3, interleukin (IL)-13, IL-17A, interferon (IFN)-γ and analyzed on a BD LSRFortessa cell analyzer (see below). The other half of the sample was not stained and was analyzed for the expression of eGFP and mCherry on a BD LSRFortessa cell analyzer.
Cultured human T cells were collected, washed and stained with antibodies against cell surface proteins and transcription factors. Cells were fixed and permeabilized with the eBioscience Foxp3/Transcription Factor Staining Buffer Set or the Transcription Factor Buffer Set (BD Biosciences). Extracellular nonspecific binding was blocked with the anti-CD16/CD32 antibody (clone 2.4G2; UCSF Monoclonal Antibody Core). Intracellular nonspecific binding was blocked with anti-CD16/CD32 antibodies and 2% normal rat serum. Dead cells were stained with Fixable Viability Dye eFluor 780 (eBioscience) or Zombie Violet Fixable Viability Kit (BioLegend). Cells were stained with the following fluorochrome-conjugated anti-human antibodies: anti-CD4 (Invitrogen, cat. no. 17-0049-42), anti-FOXP3 (eBioscience, cat. no. 25-4777-61), anti-IL-13 (eBioscience, cat. no. 11-7136-41), anti-IL-17A (eBioscience, cat. no. 12-7179-42) and anti-IFNγ (BioLegend, cat. no. 502520). All of the antibodies were used at 1:200 dilution. Samples were analyzed on a BD LSRFortessa cell analyzer. Data were analyzed using FlowJo 10.7.1 and BD FACSDiva v9 software.
Analysis of capillary electrophoresis data with HiTRACE
Capillary electrophoresis runs from chemical probing and mutate-and-map experiments were analyzed with the HiTRACE MATLAB package81. Lanes were aligned, bands fitted to Gaussian peaks, background subtracted using the no-modification lane, corrected for signal attenuation, and normalized to the internal hairpin control. The end result of these steps is a numerical array of ‘reactivity’ values for each RNA nucleotide that can be used as weights in structure prediction.
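The two final numerical steps named above, background subtraction and normalization to an internal reference, can be illustrated with a short sketch. This is not HiTRACE code (HiTRACE is a MATLAB package). It is a simplified Python illustration in which the function name, the clipping of negative values to zero and the use of the mean signal over the reference positions are all assumptions.

```python
def normalize_reactivities(modified, background, reference_idx):
    """Background-subtract band intensities and scale to reference positions.

    modified      -- fitted band intensities from the modifier-treated lane
    background    -- fitted band intensities from the no-modification lane
    reference_idx -- positions of the internal reference (e.g. hairpin loop)
    All three arguments are illustrative inputs, not HiTRACE data structures.
    """
    if len(modified) != len(background):
        raise ValueError("lanes must cover the same nucleotides")
    if not reference_idx:
        raise ValueError("at least one reference position is required")
    # Subtract the no-modification lane; clip negatives to zero (an assumption).
    subtracted = [max(m - b, 0.0) for m, b in zip(modified, background)]
    ref_mean = sum(subtracted[i] for i in reference_idx) / len(reference_idx)
    if ref_mean <= 0:
        raise ValueError("reference positions carry no signal")
    # After scaling, the reference positions average to 1.0.
    return [s / ref_mean for s in subtracted]


# Hypothetical intensities for five nucleotides; positions 2 and 3 are the
# internal reference.
modified = [1.5, 0.5, 3.0, 2.5, 0.25]
background = [0.5, 0.5, 1.0, 0.5, 0.5]
reactivity = normalize_reactivities(modified, background, reference_idx=[2, 3])
print(reactivity)
```

The resulting per-nucleotide values are the kind of array that could be supplied as weights to a structure prediction program.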
UPF1 targeted CLIP-seq
Jurkat cells expressing RORC reporters (reference, 77-GA mutant variant or 116-CCCTAAG mutant variant) were collected and crosslinked by ultraviolet radiation (400 mJ cm−2). Cells were then lysed with low salt wash buffer (1x PBS, 0.1% SDS, 0.5% sodium deoxycholate, 0.5% IGEPAL). To probe preferential UPF1 binding towards different reporters, lysates from 77-GA mutant cells were mixed with lysates from either wild-type or 116-CCCTAAG mutant cells at a 1:1 ratio prior to immunoprecipitation. Samples were then treated separately with a high dose (1:3,000 RNase A and 1:100 RNase I) and a low dose (1:15,000 RNase A and 1:500 RNase I) of RNase A and RNase I, and combined after treatment. To immunoprecipitate UPF1–RNA complexes, a UPF1 antibody (Thermo, cat. no. A301-902A) was first incubated with Protein A/G beads (Pierce) and then incubated with the mixed cell lysates for 2 h at 4 °C. Immunoprecipitated RNA fragments were then dephosphorylated (T4 PNK, NEB), polyadenylated and end-labeled with 3ʹ-azido-3ʹ-dUTP and IRDye 800CW DBCO Infrared Dye (LI-COR) on beads. SDS–PAGE was then performed to separate protein–RNA complexes, and RNA fragments were collected from the nitrocellulose membrane by proteinase K digestion. cDNA was then synthesized using Takara SMARTer small RNA sequencing kit reagents with a custom UMI-oligo(dT) primer (CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTT). The RORC reporter locus was then amplified with a custom primer (ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGGGGTGATCCAAATACCACC) and sequencing libraries were prepared with SeqAmp DNA Polymerase (Takara). Libraries were sequenced on an Illumina HiSeq 4000 sequencer.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41592-024-02335-1.
Acknowledgements
The authors thank C. Mathy, A. Natale, M. Imakaev, Y. Gomez, M. Zimanyi, A. Smith and A. Pawluk for helpful discussions. H.G. is an Era of Hope Scholar (W81XWH-2210121) and supported by R01CA240984 and R01CA244634. This work was partly supported by National Institutes of Health (NIH) grants 1R35GM140847 (Y.C.). L.A.G. is funded by an NIH New Innovator Award (DP2 CA239597), a Pew-Stewart Scholars for Cancer Research award and the Goldberg-Benioff Endowed Professorship in Prostate Cancer Translational Biology. Cryo-EM equipment at UCSF is partially supported by NIH grants S10OD020054, S10OD021741 and S10OD026881. Y.C. is an Investigator at Howard Hughes Medical Institute. Sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA and NIH 1S10OD028511-01 grants. A.N. was supported by the DoD PRCRP Horizon Award W81XWH-19-1-0594. L.F. was supported by an NIH training grant T32CA108462-15.
Author contributions
M.K. and H.G. designed the study. M.K. developed SwitchFinder. A.B. and C.C. designed a docker environment for SwitchFinder. M.K. and A.N. performed the massively parallel reporter assays. M.K., S.Z. and L.F. performed the DMS-MaPseq experiments. M.K. and C.P. performed the SHAPE experiments. D.A. and Y.C. performed the cryo-EM experiments. M.K. performed the mutagenesis experiments. M.K., K.Y. and J.G. performed the antisense oligonucleotide transfection experiments. M.K., S.K.Z. and K.M.A. performed the Th17 differentiation experiments. M.K., A.W. and L.A.G. performed the CRISPRi screens. M.K. performed the CRISPRi knockdown experiments. M.K. and J.Y. performed the proteasome inhibition experiments. S.Z. performed the CLIP-seq experiments. M.K. and H.G. wrote the manuscript with input from all of the authors.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.
Data availability
Sequencing data have been deposited in the Gene Expression Omnibus (GEO accession GSE266070). Cryo-EM density maps have been deposited in EMDB, accession numbers EMD-42275 (WT Class A), EMD-42276 (WT Class B), EMD-42277 (WT Class C), EMD-42400 (77-GA Class C), EMD-42401 (77-GA Class A), EMD-42403 (117-AC Class C) and EMD-42404 (117-AC Class B). Rfam database 14.10 (https://rfam.org/) was used in the study.
Code availability
SwitchFinder source code is available at https://github.com/goodarzilab/SwitchFinder.
Competing interests
M.K. and H.G. are inventors on a provisional patent related to this study. L.A.G. has filed patents on CRISPR functional genomics. The other authors have no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended data are available for this paper at 10.1038/s41592-024-02335-1.
Supplementary information
The online version contains supplementary material available at 10.1038/s41592-024-02335-1.
References
- 1. Gilbert, W. Origin of life: the RNA world. Nature 319, 618 (1986). 10.1038/319618a0
- 2. Saad, N. Y. A ribonucleopeptide world at the origin of life. J. Syst. Evol. 56, 1–13 (2018). 10.1111/jse.12287
- 3. Vitreschak, A. G., Rodionov, D. A., Mironov, A. A. & Gelfand, M. S. Riboswitches: the oldest mechanism for the regulation of gene expression? Trends Genet. 20, 44–50 (2004). 10.1016/j.tig.2003.11.008
- 4. Sun, E. I. et al. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria. BMC Genomics 14, 597 (2013). 10.1186/1471-2164-14-597
- 5. Serganov, A. & Nudler, E. A decade of riboswitches. Cell 152, 17–24 (2013). 10.1016/j.cell.2012.12.024
- 6. Wachter, A. Riboswitch-mediated control of gene expression in eukaryotes. RNA Biol. 7, 67–76 (2010). 10.4161/rna.7.1.10489
- 7. Liu, N. et al. N(6)-methyladenosine-dependent RNA structural switches regulate RNA–protein interactions. Nature 518, 560–564 (2015). 10.1038/nature14234
- 8. Ray, P. S. et al. A stress-responsive RNA switch regulates VEGFA expression. Nature 457, 915–919 (2009). 10.1038/nature07598
- 9. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021). 10.1093/nar/gkaa1047
- 10. Barsacchi, M., Novoa, E. M., Kellis, M. & Bechini, A. SwiSpot: modeling riboswitches by spotting out switching sequences. Bioinformatics 32, 3252–3259 (2016). 10.1093/bioinformatics/btw401
- 11. Manzourolajdad, A. & Arnold, J. Secondary structural entropy in RNA switch (Riboswitch) identification. BMC Bioinformatics 16, 133 (2015). 10.1186/s12859-015-0523-2
- 12. Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013). 10.1093/bioinformatics/btt403
- 13. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013). 10.1093/bioinformatics/btt509
- 14. Bengert, P. & Dandekar, T. Riboswitch finder: a tool for identification of riboswitch RNAs. Nucleic Acids Res. 32, W154–W159 (2004). 10.1093/nar/gkh352
- 15. Abreu-Goodger, C. & Merino, E. RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements. Nucleic Acids Res. 33, W690–W692 (2005). 10.1093/nar/gki445
- 16. Chang, T.-H. et al. Computational identification of riboswitches based on RNA conserved functional sequences and conformations. RNA 15, 1426–1430 (2009). 10.1261/rna.1623809
- 17. Mukherjee, S. & Sengupta, S. Riboswitch Scanner: an efficient pHMM-based web-server to detect riboswitches in genomic sequences. Bioinformatics 32, 776–778 (2016). 10.1093/bioinformatics/btv640
- 18. Ding, Y. & Lawrence, C. E. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 31, 7280–7301 (2003). 10.1093/nar/gkg938
- 19. Wang, X. et al. Systematic comparison and rational design of theophylline riboswitches for effective gene repression. Microbiol. Spectr. 11, e0275222 (2023). 10.1128/spectrum.02752-22
- 20. Vezeau, G. E., Gadila, L. R. & Salis, H. M. Automated design of protein-binding riboswitches for sensing human biomarkers in a cell-free expression system. Nat. Commun. 14, 2416 (2023). 10.1038/s41467-023-38098-0
- 21. Mortimer, S. A., Kidwell, M. A. & Doudna, J. A. Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15, 469–479 (2014). 10.1038/nrg3681
- 22. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014). 10.1038/nature12894
- 23. Leamy, K. A., Assmann, S. M., Mathews, D. H. & Bevilacqua, P. C. Bridging the gap between in vitro and in vivo RNA folding. Q. Rev. Biophys. 49, e10 (2016). 10.1017/S003358351600007X
- 24. Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019). 10.1038/s41594-019-0200-7
- 25. Beaudoin, J.-D. et al. Analyses of mRNA structure dynamics identify embryonic gene regulatory programs. Nat. Struct. Mol. Biol. 25, 677–686 (2018). 10.1038/s41594-018-0091-z
- 26. Zubradt, M. et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat. Methods 14, 75–82 (2017). 10.1038/nmeth.4057
- 27. Mortimer, S. A., Trapnell, C., Aviran, S., Pachter, L. & Lucks, J. B. SHAPE-Seq: high-throughput RNA structure analysis. Curr. Protoc. Chem. Biol. 4, 275–297 (2012). 10.1002/9780470559277.ch120019
- 28. Morandi, E. et al. Genome-scale deconvolution of RNA structure ensembles. Nat. Methods 18, 249–252 (2021). 10.1038/s41592-021-01075-w
- 29. Tomezsko, P. J. et al. Determination of RNA structural diversity and its role in HIV-1 RNA splicing. Nature 582, 438–442 (2020). 10.1038/s41586-020-2253-5
- 30. Oikonomou, P., Goodarzi, H. & Tavazoie, S. Systematic identification of regulatory elements in conserved 3ʹ UTRs of human transcripts. Cell Rep. 7, 281–292 (2014). 10.1016/j.celrep.2014.03.001
- 31. Wilkinson, K. A., Merino, E. J. & Weeks, K. M. Selective 2ʹ-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1, 1610–1616 (2006). 10.1038/nprot.2006.249
- 32. Eberl, G. RORγt, a multitask nuclear receptor at mucosal surfaces. Mucosal Immunol. 10, 27–34 (2017). 10.1038/mi.2016.86
- 33. Zhong, C. & Zhu, J. Small-molecule RORγt antagonists: one stone kills two birds. Trends Immunol. 38, 229–231 (2017). 10.1016/j.it.2017.02.006
- 34. Montoya, M. M. & Ansel, K. M. Small RNA transfection in primary human Th17 cells by next generation electroporation. J. Vis. Exp. (122), 55546
- 35. Bose, R., Saleem, I. & Mustoe, A. M. Causes, functions, and therapeutic possibilities of RNA secondary structure ensembles and alternative states. Cell Chem. Biol. 31, 17–35 (2024). 10.1016/j.chembiol.2023.12.010
- 36. de Boer, C. G., Ray, J. P., Hacohen, N. & Regev, A. MAUDE: inferring expression changes in sorting-based CRISPR screens. Genome Biol. 21, 134 (2020). 10.1186/s13059-020-02046-8
- 37. López-Perrote, A. et al. Human nonsense-mediated mRNA decay factor UPF2 interacts directly with eRF3 and the SURF complex. Nucleic Acids Res. 44, 1909–1923 (2016). 10.1093/nar/gkv1527
- 38. Yi, Z. et al. Mammalian UPF3A and UPF3B can activate nonsense-mediated mRNA decay independently of their exon junction complex binding. EMBO J. 41, e109202 (2022). 10.15252/embj.2021109202
- 39. Kurosaki, T., Popp, M. W. & Maquat, L. E. Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat. Rev. Mol. Cell Biol. 20, 406–420 (2019). 10.1038/s41580-019-0126-2
- 40. Fischer, J. W., Busa, V. F., Shao, Y. & Leung, A. K. L. Structure-mediated RNA decay by UPF1 and G3BP1. Mol. Cell 78, 70–84 (2020). 10.1016/j.molcel.2020.01.021
- 41. Martin, L. et al. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 74, 3104–3113 (2014). 10.1158/0008-5472.CAN-13-2235
- 42. Kuroha, K., Tatematsu, T. & Inada, T. Upf1 stimulates degradation of the product derived from aberrant messenger RNA containing a specific nonsense mutation by the proteasome. EMBO Rep. 10, 1265–1271 (2009). 10.1038/embor.2009.200
- 43. Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956 (2002). 10.1038/nature01145
- 44. Mironov, A. S. et al. Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell 111, 747–756 (2002). 10.1016/S0092-8674(02)01134-0
- 45. Rodionov, D. A. Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chem. Rev. 107, 3467–3497 (2007). 10.1021/cr068309+
- 46. Vitreschak, A. G., Rodionov, D. A., Mironov, A. A. & Gelfand, M. S. Regulation of the vitamin B12 metabolism and transport in bacteria by a conserved RNA structural element. RNA 9, 1084–1097 (2003). 10.1261/rna.5710303
- 47. Backofen, R., Gorodkin, J., Hofacker, I. L. & Stadler, P. F. Comparative RNA genomics. Methods Mol. Biol. 2802, 347–393 (2024). 10.1007/978-1-0716-3838-5_12
- 48. Leypold, N. A. & Speicher, M. R. Evolutionary conservation in noncoding genomic regions. Trends Genet. 37, 903–918 (2021). 10.1016/j.tig.2021.06.007
- 49. Ureta-Vidal, A., Ettwiller, L. & Birney, E. Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat. Rev. Genet. 4, 251–262 (2003). 10.1038/nrg1043
- 50. Clote, P. Computational prediction of riboswitches. Methods Enzymol. 553, 287–312 (2015). 10.1016/bs.mie.2014.10.063
- 51. Antunes, D., Jorge, N. A. N., Caffarena, E. R. & Passetti, F. Using RNA sequence and structure for the prediction of riboswitch aptamer: a comprehensive review of available software and tools. Front. Genet. 8, 231 (2017). 10.3389/fgene.2017.00231
- 52. Manzourolajdad, A. & Spouge, J. L. Structural prediction of RNA switches using conditional base-pair probabilities. PLoS One 14, e0217625 (2019). 10.1371/journal.pone.0217625
- 53. Kappel, K. et al. Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nat. Methods 17, 699–707 (2020). 10.1038/s41592-020-0878-9
- 54. Fish, L. et al. A prometastatic splicing program regulated by SNRPA1 interactions with structured RNA elements. Science 372, eabc7531 (2021). 10.1126/science.abc7531
- 55. Kelly, M. L. et al. RNA conformational propensities determine cellular activity. Preprint at 10.1101/2022.12.05.519207 (2022).
- 56. Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010). 10.1186/1471-2105-11-129
- 57. DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). https://support.bioconductor.org/p/61509/
- 58. Navickas, A. et al. An mRNA processing pathway suppresses metastasis by governing translational control from the nucleus. Nat. Cell Biol. 25, 892–903 (2023). 10.1038/s41556-023-01141-9
- 59. Lin, L., McKerrow, W. H., Richards, B., Phonsom, C. & Lawrence, C. E. Characterization and visualization of RNA secondary structure Boltzmann ensemble via information theory. BMC Bioinformatics 19, 82 (2018). 10.1186/s12859-018-2078-5
- 60. Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, 2006).
- 61. Lu, Z. J., Gloor, J. W. & Mathews, D. H. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 15, 1805–1813 (2009). 10.1261/rna.1643609
- 62. Dotu, I., Lorenz, W. A., Van Hentenryck, P. & Clote, P. Computing folding pathways between RNA secondary structures. Nucleic Acids Res. 38, 1711–1722 (2010). 10.1093/nar/gkp1054
- 63. wassermanlab/BiasAway. altschulEriksonDinuclShuffle.py. GitHub https://github.com/wassermanlab/BiasAway/blob/master/altschulEriksonDinuclShuffle.py (2013).
- 64. Karolchik, D., Hinrichs, A. S. & Kent, W. J. The UCSC Genome Browser. Curr. Protoc. Bioinformatics Chapter 1, 1.4.1–1.4.33 (2012).
- 65. Mastronarde, D. N. SerialEM: a program for automated tilt series acquisition on Tecnai microscopes using prediction of specimen position. Microsc. Microanal. 9, 1182–1183 (2003). 10.1017/S1431927603445911
- 66. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017). 10.1038/nmeth.4193
- 67. Ctffind4. https://grigoriefflab.umassmed.edu/ctffind4
- 68. Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46 (2007). 10.1016/j.jsb.2006.05.009
- 69. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017). 10.1038/nmeth.4169
- 70. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018). 10.1002/pro.3235
- 71. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011). 10.14806/ej.17.1.200
- 72. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017). 10.1101/gr.209601.116
- 73. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014). 10.1093/bioinformatics/btt656
- 74. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010). 10.1186/gb-2010-11-10-r106
- 75. Ashuach, T. et al. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol. 20, 183 (2019). 10.1186/s13059-019-1787-z
- 76. Low, J. T. & Weeks, K. M. SHAPE-directed RNA secondary structure prediction. Methods 52, 150–158 (2010). 10.1016/j.ymeth.2010.06.007
- 77. Hajdin, C. E. et al. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl Acad. Sci. USA 110, 5498–5503 (2013). 10.1073/pnas.1219988110
- 78. Palka, C., Forino, N. M., Hentschel, J., Das, R. & Stone, M. D. Folding heterogeneity in the essential human telomerase RNA three-way junction. RNA 26, 1787–1800 (2020). 10.1261/rna.077255.120
- 79. Turner, R., Shefer, K. & Ares, M. Jr. Safer one-pot synthesis of the ‘SHAPE’ reagent 1-methyl-7-nitroisatoic anhydride (1m7). RNA 19, 1857–1863 (2013). 10.1261/rna.042374.113
- 80. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014). 10.1016/j.cell.2014.09.029
- 81. Yoon, S. et al. HiTRACE: high-throughput robust analysis for capillary electrophoresis. Bioinformatics 27, 1798–1805 (2011). 10.1093/bioinformatics/btr277