Abstract
Over recent years, long-range RNA structure has emerged as a factor that is fundamental to alternative splicing regulation. An increasing number of human disorders are now being associated with splicing defects; hence it is essential to develop methods that assess long-range RNA structure experimentally. RNA in situ conformation sequencing (RIC-seq) is a method that recapitulates RNA structure within physiological RNA–protein complexes. In this work, we juxtapose pairs of conserved complementary regions (PCCRs) that were predicted in silico with the results of RIC-seq experiments conducted in seven human cell lines. We show statistically that RIC-seq support of PCCRs correlates with their properties, such as equilibrium free energy, presence of compensatory substitutions, and occurrence of A-to-I RNA editing sites and forked eCLIP peaks. Exons enclosed in PCCRs that are supported by RIC-seq tend to have weaker splice sites and lower inclusion rates, which is indicative of post-transcriptional splicing regulation mediated by RNA structure. Based on these findings, we prioritize PCCRs according to their RIC-seq support and show, using antisense oligonucleotides and minigene mutagenesis, that PCCRs in two disease-associated human genes, PHF20L1 and CASK, and also PCCRs in their murine orthologs, impact alternative splicing. In sum, we demonstrate how RIC-seq experiments can be used to discover functional long-range RNA structures, and particularly those that regulate alternative splicing.
Keywords: long-range, RNA interaction, RIC-seq, PCCR, PHF20L1, CASK, splicing
INTRODUCTION
Double-stranded RNA structures, many of which are long-range and span thousands of nucleotides, provide an important layer of pre-mRNA splicing regulation (Baralle et al. 2019). Understanding the structure-mediated mechanisms that underlie pre-mRNA splicing is not only a fundamental problem in molecular biology, but also a key element in the development of effective therapies for splicing-associated human diseases (Garcia-Lopez et al. 2018; Hale et al. 2019; Singh and Singh 2019). Yet, systematic experimental assessment of long-range RNA structures was not possible until the recent development of high-throughput sequencing protocols based on RNA proximity ligation (Kudla et al. 2020).
RNA base pairings can be assessed in vivo and in vitro by a number of strategies, which can be broadly subdivided into classical RNA structure probing and conformation capture types of methods (Wang et al. 2021; Xu et al. 2022). The use of classical RNA structure probing for long-range RNA structures is limited because it reveals whether an RNA base is paired but not which other base it is paired with. In contrast, conformation capture methods use digestion and religation of RNAs crosslinked in macromolecular complexes to assess their spatial proximity and, therefore, are able to identify base-paired nucleotide partners. For instance, PARIS (Lu et al. 2016), LIGR-seq (Sharma et al. 2016), SPLASH (Aw et al. 2016), and COMRADES (Ziv et al. 2018) specifically target double-stranded regions using psoralen derivatives, which intercalate into RNA duplexes and induce reversible crosslinking. However, a new class of RNA in situ conformation sequencing (RIC-seq) methods, in which RNAs are crosslinked through RNA-binding proteins (RBPs), is increasingly being used to identify RNA structures that occur within physiological RNA–RBP complexes (Cai et al. 2020; Cao et al. 2021; Ye et al. 2023).
While computational prediction of long-range RNA structures at the scale of full eukaryotic transcripts is problematic, it still can be approached by comparative methods that measure the rate of compensatory substitutions in evolutionarily conserved regions (Pervouchine 2018). Our recent study cataloged 916,360 pairs of conserved complementary regions (PCCRs) in human protein-coding genes that have a remarkable association with splicing and strong support by RNA editing sites and the so-called forked eCLIP peaks, that is, simultaneous RBP footprints near complementary sequences (Kalmykova et al. 2021). However, only a fraction of structural alignments were convincingly supported by covariations, while the majority of RNA structures with established biological function lacked evolutionary support because the amount of sequence variation was insufficient to estimate their statistical significance through the rate of compensatory substitutions (Kalmykova et al. 2021). This naturally leads to a question of whether computational long-range RNA structure prediction can be improved by taking into account RNA proximity ligation data.
In this work, we study the problem of assessment and prioritization of PCCRs using a panel of RIC-seq experiments. Toward this goal, we correlate the local RIC-seq support with important PCCR properties such as equilibrium free energy, significance of compensatory substitutions, and the presence of A-to-I RNA editing sites and forked eCLIP peaks. Furthermore, we characterize exons enclosed in PCCRs and their flanking introns, with a focus on exons that are involved in gene expression regulation via unproductive splicing (Lareau et al. 2007). Based on RIC-seq evidence, we identified PCCRs in two disease-associated genes, PHF20L1 and CASK, and demonstrated that they strongly impact alternative splicing in human and murine cell lines. To validate the role of RNA structure, we used an experimental strategy in which we first probe a PCCR using antisense oligonucleotides (AONs) obstructing one or the other complementary sequence, and then use site-directed mutagenesis in minigenes to test whether long-range RNA structure indeed controls alternative splicing (Kalinina et al. 2021).
RESULTS
Concordance of PCCRs and RNA contacts
To evaluate the support of PCCRs, we analyzed a panel of RIC-seq experiments conducted in human cell lines including GM12878, H1, HeLa, HepG2, IMR90, K562, and hNPC to extract RNA contacts. An RNA contact is characterized by a pair of genomic coordinates, where proximal RNA strands were religated after digestion, and by the number of supporting split reads (see Materials and Methods). In total, approximately 25 million RNA contacts and 46 million supporting split reads were obtained (Supplemental Table S1). We hypothesized that religation of RNA strands near PCCRs should result in RNA contacts shown in Figure 1A, that is, ones that support the PCCR from the outer or from the inner side, corresponding to split reads in the collinear (1–4) or chimeric (2–3) orientation, respectively.
FIGURE 1.
RNA contacts. (A) Digestion and religation of RNA strands adjacent to a PCCR result in RNA contacts (in the form of split reads) supporting the PCCR from the inner (2–3) or the outer (1–4) side. (B) The inner (I) and the outer (O) contacts correspond to the inner and outer arcs with respect to the RNA structure. The outer and the inner contacts are represented by neo-junctions and chimeric split reads, respectively.
Accordingly, we selected 802,536 out of 916,360 PCCRs documented in Kalmykova et al. (2021) with a distance between complementary regions of at least 200 nt, and surrounded them by windows with a radius of 100 nt centered at the middle of each complementary region (Fig. 1B). This way of centering allows for a uniform assessment of RNA contacts near PCCRs, which also accounts for contacts that occur within the complementary region itself. As a result, PCCRs were subdivided into four mutually exclusive groups: ones supported by at least one RNA contact in a 100-nt window inside but not outside (I), ones supported by at least one RNA contact outside but not inside (O), ones supported by at least one RNA contact both inside and outside (IO), and the rest of the PCCRs (N).
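The grouping described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the representation of a PCCR as two (start, end) intervals, and the rule that an inner contact lies on the inner half of both windows (and an outer contact on the outer half of both) are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Region = Tuple[int, int]   # (start, end) of one complementary region
Contact = Tuple[int, int]  # pair of religated coordinates

RADIUS = 100  # window radius in nt, as stated in the text


def center(region: Region) -> float:
    """Midpoint of a complementary region."""
    start, end = region
    return (start + end) / 2


def classify_pccr(left: Region, right: Region,
                  contacts: Iterable[Contact],
                  radius: int = RADIUS) -> str:
    """Return 'IO', 'I', 'O', or 'N' for one PCCR (presence/absence call)."""
    cl, cr = center(left), center(right)
    inner = outer = False
    for a, b in contacts:
        a, b = min(a, b), max(a, b)
        # Both ends must fall into the windows around the two regions.
        if abs(a - cl) > radius or abs(b - cr) > radius:
            continue
        if a >= cl and b <= cr:
            inner = True   # assumed: both ends on the inner side
        elif a < cl and b > cr:
            outer = True   # assumed: both ends on the outer side
        # Mixed configurations are ignored in this sketch.
    if inner and outer:
        return "IO"
    if inner:
        return "I"
    if outer:
        return "O"
    return "N"
```

For example, with complementary regions at (1000, 1020) and (2000, 2020), a contact (1050, 1980) would be counted as inner and a contact (950, 2060) as outer under these assumptions.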
Since each individual RIC-seq experiment yields sparse RNA contacts (Margasyuk et al. 2023), we first chose to analyze PCCR support by RNA contacts using a presence/absence call, that is, a PCCR was assigned to the I, O, IO, and N groups regardless of the number of supporting RIC-seq reads. This subdivision was done at the level of each individual RIC-seq experiment, as well as for the union of bioreplicates in each cell line, and also for the pooled set of all RIC-seq experiments in all cell lines (Table 1). Remarkably, the number of PCCRs in the pooled IO group was much larger than the expected additive contribution from individual cell lines, that is, many PCCRs were supported inside in one cell line and outside in another. This observation indicates that collective evidence from multiple data sets may be more appropriate than one limited to a particular biological condition.
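The effect of pooling can be sketched as follows, assuming that each cell line contributes one of the labels "I", "O", "IO", or "N" for a given PCCR; the function name is hypothetical and the code only illustrates why the pooled IO group can exceed the sum over individual cell lines.

```python
from typing import Iterable


def pooled_group(labels: Iterable[str]) -> str:
    """Combine per-cell-line labels ('I', 'O', 'IO', 'N') into a pooled label."""
    labels = list(labels)
    inner = any("I" in g for g in labels)  # 'I' or 'IO' in any cell line
    outer = any("O" in g for g in labels)  # 'O' or 'IO' in any cell line
    if inner and outer:
        return "IO"
    if inner:
        return "I"
    if outer:
        return "O"
    return "N"
```

A PCCR labeled I in one cell line and O in another is not IO in either of them, yet it becomes IO in the pooled set.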
TABLE 1.
The number of PCCRs in the IO, I, O, and N groups (see text)
To assess the concordance between bioreplicates, we estimated the overlap between PCCR groups in two bioreplicates of each RIC-seq experiment. As expected, the largest overlap was observed for the cognate classes in the IO, I, and O groups (Supplemental Fig. S1). Next, we compared IO, I, and O classes between RIC-seq replicates in different cell lines. The groups were most similar between bioreplicates, and, as expected, the degree of similarity was lower when comparing cell lines with each other (Supplemental Fig. S2). The level of consistency (9%–15%) between bioreplicates indicated once again that each individual RIC-seq experiment yields relatively sparse data, and that aggregating results from several independent RIC-seq experiments could provide a better strategy for further analysis.
Properties of PCCRs supported by RIC-seq
A PCCR is characterized by the equilibrium free energy (ΔG), which is estimated from the parameters of thermodynamic RNA folding (Mathews et al. 1999), the E-value, which scores independent compensatory substitutions on a phylogenetic tree (Rivas et al. 2017), the overlap with A-to-I RNA editing sites, which are indicative of a double-stranded context (Ramaswami and Li 2014; Picardi et al. 2017), and the occurrence of forked eCLIP peaks, which reflect RBP crosslinking near complementary RNA strands (Supplemental Fig. S3). To evaluate the relationship between these metrics and the RIC-seq-derived RNA contacts, we first identified PCCRs that are supported by at least one RNA contact in the IO, I, and O groups in at least k cell lines and compared them to PCCRs that were not supported by RNA contacts (i.e., k = 0).
The stability of PCCRs (i.e., the absolute value of ΔG), the proportion of PCCRs with A-to-I RNA editing sites, and the proportion of PCCRs with forked eCLIP peaks consistently increased with increasing k, while the compensatory substitution support (E-value) did not show a strong dependence on k (Fig. 2 for the IO group, Supplemental Figs. S4, S5 for the I and O groups, respectively). To further characterize PCCRs with different levels of RIC-seq support, we counted PCCRs jointly by the number of cell lines in which they are supported by inner contacts and the number of cell lines in which they are supported by outer contacts, and constructed 95% confidence intervals for their equilibrium free energy (Supplemental Tables S2, S3). We observed a consistent trend of increasing ΔG (by absolute value) with increasing RIC-seq support from either side, with sufficiently many PCCRs having very strong support, for example, more than 300 PCCRs being supported in at least five cell lines by both inside and outside contacts.
FIGURE 2.
Properties of PCCRs in the IO category. The equilibrium free energy ΔG (A), the R-scape E-value (Rivas et al. 2017) (B), the frequency of A-to-I RNA editing sites (C), and the frequency of forked eCLIP peaks (D) near PCCRs supported inside and outside (IO category) in at least k cell lines. See Supplemental Figures S4 and S5 for the I and O categories, respectively. Statistically discernible differences at the 0.01% significance level and nonsignificant differences are denoted by (****) and “ns,” respectively (two-tailed Mann–Whitney test). Whiskers indicate 95% confidence intervals for proportions.
PCCRs were assigned to I, O, IO, and N groups based on the presence of RNA contacts in a 100-nt window. To better characterize the relative position of RNA contacts with respect to PCCRs, we constructed a random forest (RF) classifier that predicts the occurrence of forked eCLIP peaks near PCCR from a more detailed distance information. Toward this goal, we surrounded each complementary region, left (L) and right (R), by three 50-nt bins extending upstream (−3, −2, −1) and downstream (1, 2, 3) from the center of each region (Fig. 3A). Since the number of RNA contacts is higher at shorter distances, we included the distance between the complementary parts of the PCCR, referred to as spread, as a variable in the model to account for its confounding effect (Margasyuk et al. 2023). Spread alone was able to predict eCLIP peaks near PCCR with AUC = 0.66, while the addition of RNA contacts increased the quality of the predictions to AUC = 0.74 (Fig. 3B). The presence of immediately adjacent inner and outer contacts, that is, contacts between bins 1L and −1R, and contacts between bins −1L and 1R, were the two most important features for the classifier (Fig. 3C). This result demonstrates that the overlap between PCCRs and RNA contacts located in their close proximity is nonrandom and consistent with the nested organization of complementary regions and RNA contacts inferred from RIC-seq.
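The construction of the 36 bin-pair features can be sketched as below. This is an illustration under stated assumptions rather than the code used for Figure 3: coordinates are integers, a position at the exact center falls into bin 1, contacts are given as (left coordinate, right coordinate, split-read count), and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Optional, Tuple

BINS = (-3, -2, -1, 1, 2, 3)  # bin labels as in the text


def bin_label(pos: int, center: int,
              bin_size: int = 50, n_bins: int = 3) -> Optional[int]:
    """Signed bin index of pos relative to center, or None if out of range."""
    offset = pos - center
    if offset >= 0:
        idx = offset // bin_size + 1          # 0..49 -> 1, 50..99 -> 2, ...
    else:
        idx = -((-offset - 1) // bin_size + 1)  # -1..-50 -> -1, ...
    return idx if abs(idx) <= n_bins else None


def contact_features(center_left: int, center_right: int,
                     contacts: Iterable[Tuple[int, int, int]]) -> List[int]:
    """Split-read counts for the 6 x 6 = 36 (left bin, right bin) pairs."""
    counts: Counter = Counter()
    for a, b, reads in contacts:
        bl = bin_label(a, center_left)
        br = bin_label(b, center_right)
        if bl is not None and br is not None:
            counts[(bl, br)] += reads
    return [counts[(bl, br)] for bl, br in product(BINS, BINS)]
```

In this layout, the entry for bins (1L, −1R) corresponds to the immediately adjacent inner contact and the entry for (−1L, 1R) to the immediately adjacent outer contact. The resulting vector, together with spread, could then be passed to any random forest implementation.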
FIGURE 3.
Importance of RNA contacts near PCCRs. (A) For each PCCR, six 50-nt bins centering in the middle of the complementary sequence were chosen, and split read counts of RNA contacts from all RIC-seq experiments were computed for 6 × 6 = 36 combinations. (B) The performance (TPR, true positive rate, vs. FPR, false positive rate) of the random forest classifier predicting the presence of forked eCLIP peaks as a function of spread alone (green), RIC-seq support alone (orange), and spread and RIC-seq support together (blue). AUC is the area under the curve. (C) Feature importance (color map) for 36 bin combinations. The two most important features were the contacts between 1L and −1R and between −1L and 1R, which correspond to the inner and outer contacts immediately adjacent to the PCCR.
Properties of exon loop-outs supported by RIC-seq
Next, we focused on exons looped out by PCCRs (exon loop-outs) and estimated their median inclusion rates (percent-spliced-in, PSI, or Ψ) in RNA-seq experiments conducted in the same cell lines. As expected, the Ψ distribution consistently shifted toward lower values with increasing level of RIC-seq support k (Fig. 4A; Supplemental Fig. S6), in agreement with the finding that exon inclusion drops with increasing stability of the surrounding PCCR (Kalmykova et al. 2021). To dissect differential alternative splicing, we used two approaches. On the one hand, we assessed the correlation between the inclusion rate of exons looped out by PCCRs and the number of RIC-seq reads supporting the PCCR. On the other hand, we selected PCCRs that were supported by at least five reads in at least three cell lines and compared the inclusion rates of exons looped out by these PCCRs with the inclusion rates of exons looped out by PCCRs with lower support level. The distribution of Pearson correlation coefficients and the distribution of differences of exon inclusion rates in cell lines with versus without RIC-seq support (ΔΨ) were both shifted toward negative values (Wilcoxon signed-rank test, P-value < 10⁻⁹), indicating that alternative splicing of an exon can be modulated by PCCRs assembling and disassembling in different cell lines (Fig. 4B). In the interest of future studies, we listed PCCRs with differential RIC-seq support in Supplemental Data File 1 along with the respective Ψ values.
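The two per-exon statistics can be sketched as follows. The inputs are parallel lists with one value per cell line; treating a cell line as supporting the loop-out when it has at least five reads is an assumption of this sketch, as are the function names.

```python
from math import sqrt
from statistics import mean
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def delta_psi(psi: Sequence[float], reads: Sequence[int],
              min_reads: int = 5) -> Optional[float]:
    """Mean psi in supported cell lines minus mean psi in the others."""
    high = [p for p, r in zip(psi, reads) if r >= min_reads]
    low = [p for p, r in zip(psi, reads) if r < min_reads]
    if not high or not low:
        return None  # the difference is undefined without both groups
    return mean(high) - mean(low)
```

For an exon with Ψ = 0.9, 0.8, 0.3, 0.2 and supporting read counts 0, 1, 6, 9 across four cell lines, both statistics are negative, which is the direction of the shift described above.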
FIGURE 4.
Exons enclosed in PCCRs. (A) The average exon inclusion rate (Ψ) of exons in PCCRs that are supported inside and outside (IO category) in at least k cell lines. CDF represents cumulative distribution function. See also Supplemental Figure S6 for the I and O categories. (B) The distribution of r(Ψ, support), the Pearson correlation coefficient between exon inclusion rate Ψ and the number of reads supporting exon loop-out by PCCR (top), and the distribution of ΔΨ = Ψh − Ψl, where Ψh and Ψl are the average Ψ values in cell lines with and without RIC-seq support, respectively (bottom). Both distributions significantly depart from zero in the negative direction (Wilcoxon test, P-value < 10⁻⁹). (C) The proportion of RIC-seq-supported cases among exons looped out by a PCCR, for PTC-containing (PTC+) exons and cassette exons without PTC (PTC−). Whiskers indicate 95% confidence intervals for proportions. (D) The equilibrium free energy (ΔG) of PCCRs looping-out PTC+ and PTC− exons in the supported and unsupported groups. Statistically discernible differences at the 1% significance level and nonsignificant differences are denoted by ** and “ns,” respectively (one-tailed Mann–Whitney test).
Next, we subdivided exons enclosed in PCCRs into two groups, those looped out by a PCCR supported by at least one RNA contact IO in at least one cell line (k ≥ 1), and those looped out by PCCRs without support (k = 0). Since RIC-seq tends to identify RNA contacts at shorter distances, we matched each exon from the first group with a randomly chosen exon from the second group and required additionally that it be looped out by the PCCR with the same spread. Thus, we generated two groups of exons looped out by PCCRs, with and without RIC-seq support, and with the same spread distribution. In what follows, these groups are referred to as supported and unsupported, respectively.
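The spread-matched sampling can be sketched as below. Matching on exactly equal spread and sampling without replacement are assumptions of this sketch (the matching could equally be done on binned spread), and the function name and the (exon id, spread) input format are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Exon = Tuple[Hashable, int]  # (exon identifier, spread of its PCCR in nt)


def match_by_spread(supported: Iterable[Exon],
                    unsupported: Iterable[Exon],
                    seed: int = 0) -> List[Tuple[Hashable, Hashable]]:
    """Pair each supported exon with a random unsupported exon of equal spread."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    pool: Dict[int, list] = defaultdict(list)
    for exon_id, spread in unsupported:
        pool[spread].append(exon_id)
    pairs = []
    for exon_id, spread in supported:
        candidates = pool.get(spread)
        if candidates:
            # Remove the pick so that it cannot be matched twice.
            pick = candidates.pop(rng.randrange(len(candidates)))
            pairs.append((exon_id, pick))
    return pairs
```

Supported exons without a match are dropped here, so that the two resulting groups have identical spread distributions by construction.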
Introns flanking exons in the supported group were on average shorter than those in the unsupported group (Supplemental Fig. S7A). At the same time, supported exons were looped out by PCCRs located closer to their boundaries than unsupported exons (Supplemental Fig. S7B). Lengths of supported and unsupported exons were not significantly different (Supplemental Fig. S7C). When examining splice site strengths, we found that donor and acceptor splice sites of introns flanking supported exons tend to be weaker than those of introns flanking unsupported exons (Supplemental Fig. S8A,B). We hypothesized that weaker splice sites could allow the PCCR to form before flanking introns are spliced out. To address this, we estimated splice site strengths in post-transcriptional and cotranscriptional introns (see Materials and Methods) and found that, indeed, the former tend to have weaker splice sites compared to the latter (Supplemental Fig. S8C).
Next, we focused on a group of exons related to unproductive splicing (Lareau et al. 2007; Pervouchine et al. 2019). These exons, named poison exons, contain a premature termination codon (PTC), and their inclusion triggers the degradation of the mRNA by the nonsense-mediated decay (NMD) pathway (Kurosaki et al. 2019). A significantly larger fraction of PTC-containing exons were supported as compared to cassette exons that did not contain a PTC (Fig. 4C). Furthermore, the equilibrium free energies of PCCRs mediating these loop-outs were significantly larger by absolute value for PTC-containing exons in comparison to exons without PTCs within the supported group, while no significant difference was detected in the unsupported group (Fig. 4D). These results indicate that RNA structure is actively involved in the control of unproductive splicing, possibly providing a capacity for regulated skipping of poison exons.
Taken together, these observations suggest that RIC-seq support is correlated with important PCCR properties. Thus, we created a table listing 97,398 PCCRs supported by RIC-seq in at least one cell line (Supplemental Data File 2). From this list, we selected two candidates, a PCCR with ΔG = −23.6 kcal/mol in the PHF20L1 gene (id838701) and a PCCR with ΔG = −21.6 kcal/mol in the CASK gene (id902118), on the basis of RIC-seq support (in at least two cell lines, both IO), evolutionary conservation, presence of a cassette exon inside PCCR, feasibility of cloning, gene expression level, exon inclusion rate, and other considerations (Kalmykova et al. 2021). In the next section, we present experimental validation of the impact of these PCCRs on splicing using AONs and site-directed minigene mutagenesis in human and murine cell lines.
Case studies
PHF20L1
The plant homeodomain finger protein 20-like 1 (PHF20L1) is a histone methylation reader that interacts with mono- and dimethylated lysines in H3K4me1, H4K20me1, H3K27me2, as well as with epigenetic factors, for example, DNMT1 (Kim et al. 2006; Estève et al. 2014; Hou et al. 2020). It is also involved in maintaining the stability of methylated SOX2 and pRb proteins (Carr et al. 2017; Zhang et al. 2018). PHF20L1 is essential for epigenetic inheritance in mammals, cell pluripotency and differentiation, and the maintenance of the G1-S phase checkpoint, while aberrations of its expression are common for breast, colorectal, and ovarian cancers (Wrzeszczynski et al. 2011; Yu et al. 2017; Zhang et al. 2019).
The PHF20L1 gene produces two transcript isoforms, PHF20L1-a and -b, which differ by the inclusion of the alternative cassette exon 6 (Estève et al. 2014). Introns flanking exon 6 contain a pair of conserved complementary regions, R1 and R2, which can form a stable RNA structure (ΔG = −23.6 kcal/mol) and contribute to the regulation of exon 6 alternative splicing (Fig. 5A). This PCCR was identified as supported by RNA contacts observed in RIC-seq experiments in HeLa and HepG2 cell lines (Supplemental Fig. S9).
FIGURE 5.
Case study of PHF20L1. (A) Genomic organization of the cassette exon 6, the complementary sequences R1 and R2, and their respective AONs (AON1 and AON2). (B) The treatment with AON1 or AON2 almost completely suppresses exon 6 skipping; NT (nontreated control); C (control AON); neg (negative control). (C) The scheme of a minigene expressing a fragment of the PHF20L1 gene. Primer locations are indicated by the arrows. (D) Mutagenesis in the minigene. In m1 and m2 mutants, the base pairing between R1 and R2 is disrupted by sequence reversal; in the compensatory double mutant m1m2, it is restored. (E) Exon 6 skipping in m1 and m2 mutants is suppressed, but in m1m2 it returns to that of the WT. Statistically discernible differences at the 1% and 0.1% significance levels are denoted by ** and ***, respectively. Nonsignificant differences are denoted by “ns.”
To evaluate the role of R1/R2 base pairing in alternative splicing of the endogenous transcripts, we designed AONs complementary to R1 and R2, called AON1 and AON2, respectively, and measured the rate of exon 6 inclusion in response to increasing AON concentrations. Both qualitative and quantitative RT-PCR analyses (Fig. 5B) confirmed that a concentration of 5 nM or higher of either of the two AONs was sufficient to substantially increase exon 6 usage in the endogenous transcript.
Next, we constructed a minigene that contains a part of the PHF20L1 gene spanning from exon 5 to exon 7 (Fig. 5C) and introduced mutations that disrupt and restore the base pairing between R1 and R2 (Fig. 5D). Mutations disrupting R1/R2 base pairing, called m1 and m2, promoted exon 6 inclusion, while the compensatory double mutant m1m2, which restores R1/R2 base pairing, also reverted the splicing pattern to that of the wild type (WT) qualitatively and quantitatively (Fig. 5E). The response of exon 6 inclusion to AONs and the restoration of the WT splicing in the compensatory double mutant strongly indicate that exon 6 inclusion is controlled by R1/R2 base pairing.
CASK
CASK encodes a human calcium/calmodulin-dependent serine protein kinase; however, the protein does not possess kinase activity (Cohen et al. 1998) and functions as a scaffolding protein involved in presynaptic and postsynaptic transmission (Li et al. 2002; Tabuchi et al. 2002). In mice, the deletion of CASK is lethal (Wilson et al. 1993), while its inactivation in pancreatic β cells affects glucose homeostasis and insulin sensitivity (Liu et al. 2021). CASK interacts with Tbr-1, a T-box transcription factor involved in forebrain development (Wang et al. 2004) and possibly playing a role in epithelial cell polarity establishment in mammals (Caruana 2002).
Alternative splice isoforms of CASK differ by the inclusion or skipping of cassette exon 19, which was suggested to modulate CASK binding to other proteins across developmental stages and in cell populations with different neuronal activity (Dembowski et al. 2012; Tibbe et al. 2021). Introns flanking exon 19 contain a PCCR formed by R3 and R4, which are located more than 3000 nt apart from each other and potentially loop out exon 19 (Fig. 6A). The R3/R4 interaction is supported by RIC-seq contacts in IMR90, hNPC, and GM12878 cell lines (Supplemental Fig. S10).
FIGURE 6.
Case study of CASK. (A) Genomic organization of the cassette exon 19, complementary sequences R3 and R4, and their respective AONs (AON3 and AON4). (B) The treatment with AON3 or AON4 significantly reduces exon 19 skipping; NT (nontreated control); C (control AON); neg (negative control). (C) The scheme of a minigene expressing a fragment of the CASK gene. (D) In m3 and m4 mutants, the base pairing between R3 and R4 is disrupted by sequence reversal; in the compensatory double mutant m3m4, it is restored. (E) Exon 19 skipping in m3 and m4 is suppressed, but in m3m4 it returns to that of the WT. Statistically discernible differences at the 5%, 1%, and 0.1% significance level are denoted by *, **, and ***, respectively. Nonsignificant differences are denoted by “ns.”
The exon 19 inclusion rate substantially increased in response to the treatment by AON3 and AON4, locked nucleic acids (LNAs) complementary to R3 and R4, respectively (Fig. 6B). Next, we assembled a minigene that contains a fragment of CASK spanning between exons 18 and 20 (Fig. 6C) and introduced disruptive and compensatory mutations as before (Fig. 6D). As expected, single mutations (m3 or m4) led to the loss of exon skipping, while the compensatory mutant (m3m4) restored the exon inclusion rate of the WT (Fig. 6E).
mPHF20L1 and mCASK
Since the nucleotide sequences of R1/R2 and R3/R4 are highly evolutionarily conserved, we chose to interrogate mPHF20L1 and mCASK, the murine orthologs of PHF20L1 and CASK, using the same AONs. The treatment of NIH 3T3 mouse fibroblasts with AON1 and AON2 led to a significant increase in mPHF20L1 exon 6 inclusion rate (Supplemental Fig. S11). Similarly, the rate of exon 19 inclusion in mCASK increased by 40% on average with increasing AON3 and AON4 concentration (Supplemental Fig. S12).
Furthermore, we constructed minigenes carrying the respective fragments of mPHF20L1 and mCASK and applied the same mutagenesis strategy as in PHF20L1 and CASK. In mPHF20L1, single mutants (m1 or m2) promoted exon 6 inclusion, while the compensatory double mutant (m1m2) recovered the WT splicing ratio, although not completely (Supplemental Fig. S13). Similarly, single mutants (m3 or m4) promoted exon 19 inclusion in mCASK, and the introduction of the compensatory double mutation (m3m4) led to only a partial recovery of the WT splicing (Supplemental Fig. S14). The incomplete recovery of the WT splicing in murine cells could be attributed to functional differences between splicing regulatory circuits in humans and in mice, which could be affected by the mutations we introduced. Nevertheless, the response to the AON treatment and the restoration of splicing in the double mutants together indicate that complementary base pairings in R1/R2 and R3/R4 represent an evolutionarily and functionally conserved mechanism of alternative splicing control.
DISCUSSION
The structure of eukaryotic transcripts has a remarkably complex multilevel organization, in which local secondary structure elements are hierarchically assembled into a tertiary structure that is stabilized by RBPs and long-range intramolecular base pairings (Pervouchine 2018). RIC-seq technology has made it possible to take snapshots of this complex and dynamic picture by detecting RNA contacts at single-base resolution (Cai et al. 2020; Cao et al. 2021). These contacts, however, do not necessarily correspond to RNA structure but rather confirm that two RNA strands are located proximally in 3D, possibly due to intra- and intermolecular RNA–RNA interactions, or possibly due to interaction mediated by proteins. For instance, hnRNPA1 and PTB splicing factors bound to the pre-mRNA tend to form dimers, thus creating contacts that are not mediated by RNA base pairings (Blanchette and Chabot 1999; Oberstrass et al. 2005).
The analysis presented here interrogates the relationship between intramolecular RNA contacts from a panel of RIC-seq experiments and PCCRs obtained through bioinformatic predictions, presumably capturing RNA proximity mediated by long, almost perfect stretches of complementary nucleotides. While some of the functional PCCRs, including ones described here in PHF20L1 and CASK genes, are strongly supported by inner and outer RNA contacts, many others lack RIC-seq support because RNA contacts are highly sparse and variable. In an attempt to identify RIC-seq contacts in structured RNA classes such as miRNA precursors, we observed that most of them are supported only by inner contacts corresponding to the apical loop of the hairpin, but not outer contacts (data not shown). This observation underscores a fundamental distinction between the analysis of RIC-seq and Hi-C experiment results (Lieberman-Aiden et al. 2009). In the latter it is a common practice to average chromatin contacts at kilobase or megabase scale, while the assessment of RNA contacts by RIC-seq intrinsically targets the single-nucleotide level. Currently, it appears that a reliable inference from RIC-seq requires aggregation of different biological conditions to yield a consolidated RNA structure shared by these conditions. However, the correlated changes between RNA structure and alternative splicing also suggest that the same RNA fragment may be folded and, consequently, spliced differently in different conditions, which remains a matter of future investigations.
The relationship between RNA contacts and complementarity manifests itself in different PCCR features such as higher stability, significance of compensatory substitutions, high abundance of A-to-I editing sites or forked eCLIP peaks. All these features are hallmarks of functional RNA structures, which correspond to PCCRs with lower false discovery rates (Kalmykova et al. 2021). Long-range RNA structures were proposed to bring the 5′- and 3′-splice sites closer, thereby facilitating splicing (Warf and Berglund 2010). To the contrary, we found that exons enclosed in PCCRs supported by RIC-seq are flanked by shorter introns, in which the PCCRs are positioned closer to the skipped exon, and their splice sites resemble those of introns that are spliced post-transcriptionally. This finding, and the decrease of the exon inclusion rate with increasing RIC-seq support, together suggest that exons looped out by PCCRs tend to be spliced post-transcriptionally, or else their flanking introns would have been spliced out on a “first-come-first-served” basis before the RNA structure had a chance to assemble (Dujardin et al. 2013). Another remarkable finding is that poison exons tend to be looped out by more stable PCCRs, suggesting that the regulatory mechanisms controlling gene expression through unproductive splicing could be largely mediated by RNA structure.
In regard to the identification of RNA structures that impact pre-mRNA splicing, our analysis had no a priori connection to splicing except the requirement that PCCR loop out a cassette exon. Hence, PCCRs that regulate alternative splicing in PHF20L1 and CASK genes were identified by serendipity. Several other PCCRs with RIC-seq support were also probed by AONs without an apparent impact on splicing, raising the question of whether they were false negatives of the AON screening method or RNA structures truly unrelated to splicing. The dynamic range of Ψ values in response to AON treatment targeting the endogenous transcript and in mutagenesis of the minigene expressing a gene fragment were drastically different, for example, 0%–100% for the AON and 80%–100% for m1/m2 mutants in the case of PHF20L1 (Fig. 5). These differences are due to the elevated baseline level of exon inclusion in the minigene, which lacks the rest of the gene sequence that is important for splicing—an effect seen in earlier studies as well as the incomplete recovery of the WT splicing in the compensatory double mutant (Kalinina et al. 2021). Changes in the amount of transfected minigene did not affect this discrepancy (Supplemental Fig. S15). The AON analysis nonetheless uncovered remarkable differences between human and murine PHF20L1 and CASK genes, which indicate that alternative splicing is not regulated by RNA structure alone, and that other critical players expressed in one or the other species may be involved.
The functional relevance of the exon skipping events in PHF20L1 and CASK and their possible regulation deserve further investigation. Here we can only speculate that exon 6 skipping in PHF20L1 could be related to modulating the function of its TUDOR domain, which is encoded within exons 4 and 5. In CASK, the inclusion of exon 19 (69 bp) or exon 20 (36 bp) in murine neurons can be induced by KCl treatment, which mimics neuronal excitation (Dembowski et al. 2012). It was hypothesized that different splice isoforms, including the exon 19 skipping isoform, have altered binding properties to CASK partners during developmental stages, as well as in different cell populations with distinct neuronal activity (Tibbe et al. 2021). These examples demonstrate the importance of the resource provided in this work, which has many practical applications to uncovering new regulatory roles of RNA structures in human disease (Wang et al. 2021).
Conclusion
This study matches for the first time the bioinformatics predictions of long-range RNA structure, in the form of PCCRs, and RNA contacts inferred from RIC-seq experiments conducted in seven human cell lines. Based on their characterization, two long-range RNA structures in human disease-associated genes were identified and experimentally validated. This work outlines a plan for future discovery of functional long-range RNA structures, for which it provides a track hub visualizing PCCRs and their RIC-seq support through the UCSC Genome Browser.
MATERIALS AND METHODS
Genomes and gene annotations
The February 2009 (hg19) assembly of the human genome and GENCODE annotation v41lift37 were downloaded from the Genome Reference Consortium (Church et al. 2011) and GENCODE website (Harrow et al. 2012), respectively. For each annotated exon, the shortest flanking upstream and downstream introns were considered. A PCCR was considered as looping out an exon if its constituent complementary regions were fully included in the introns flanking that exon. MaxEntScan software (Yeo and Burge 2004) was applied to the nucleotide sequences around splice sites (−3…+6 nt for the donor and −15…+3 nt for the acceptor splice site) to predict splice site strengths. The list of 4705 poison exons and the list of 15,528 cassette exons from protein-coding genes were obtained as described earlier (Pervouchine et al. 2019).
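The loop-out criterion can be illustrated with a short sketch. It assumes half-open genomic intervals on a single chromosome and strand, and that one complementary region falls in the upstream intron and the other in the downstream intron; the `Interval` type, the function names, and the toy coordinates are hypothetical and are not taken from the published pipeline.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) on one chromosome and strand (assumed convention)."""
    start: int
    end: int


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def loops_out(left_region: Interval, right_region: Interval,
              upstream_intron: Interval, downstream_intron: Interval) -> bool:
    """A PCCR loops out an exon if both complementary regions are fully
    included in the exon's (shortest) flanking introns."""
    return (contains(upstream_intron, left_region)
            and contains(downstream_intron, right_region))


# Toy coordinates, made up for illustration only.
upstream = Interval(1000, 2000)
downstream = Interval(2150, 3500)
inside = loops_out(Interval(1800, 1815), Interval(2400, 2415), upstream, downstream)
overlapping_exon = loops_out(Interval(1990, 2010), Interval(2400, 2415), upstream, downstream)
print(inside, overlapping_exon)
```

In the second call the left region extends past the end of the upstream intron, so the PCCR is not counted as looping out the exon.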
RIC-seq and RNA-seq experiments
The results of RIC-seq experiments conducted in human cell lines including GM12878, H1, HeLa, HepG2, IMR90, K562, and hNPC (two bioreplicates each) were downloaded from the Gene Expression Omnibus under the accession numbers GSE127188 and GSE190214 in FASTQ format. The matched control RNA-seq experiments were downloaded from the ENCODE consortium under the accession numbers listed in Supplemental Table S4. RIC-seq and the matched RNA-seq data were processed by the RNAcontacts pipeline with the default settings (Margasyuk et al. 2023). The pipeline generated a track hub for the UCSC Genome Browser (Raney et al. 2014; Lee et al. 2020) that is available at https://github.com/smargasyuk/PHRIC-hub.
Splicing quantification and analysis
RNA-seq experiments were processed by the IPSA pipeline with the default settings to obtain split read counts supporting splice junctions (Pervouchine et al. 2013). The exon inclusion rate (Ψ, PSI, or percent-spliced-in) was calculated according to the equation:
$$\Psi = \frac{\mathrm{inc}}{\mathrm{inc} + 2\,\mathrm{exc}},$$
where inc is the number of reads supporting exon inclusion and exc is the number of reads supporting exon exclusion. Ψ values with the denominator below 10 were considered unreliable and discarded.
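As a minimal sketch of this computation, the function below assumes the definition used by IPSA, Ψ = inc / (inc + 2·exc), in which the exclusion count is doubled because inclusion is supported by two splice junctions and exclusion by one. The function name and the example counts are illustrative assumptions.

```python
from typing import Optional


def psi(inc: int, exc: int, min_denominator: int = 10) -> Optional[float]:
    """Exon inclusion rate, or None when the denominator is below the threshold.

    Assumes the IPSA-style definition psi = inc / (inc + 2 * exc).
    """
    denominator = inc + 2 * exc
    if denominator < min_denominator:
        return None  # too few supporting reads: treated as unreliable
    return inc / denominator


well_covered = psi(30, 5)   # denominator 40
low_coverage = psi(4, 2)    # denominator 8, below the threshold of 10
print(well_covered, low_coverage)
```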
Each PCCR containing at least one exon was characterized by the average Ψ value of all exons inside it in each cell line i, denoted by Ψi. PCCRs with alternatively spliced exons were defined as those having at least two distinct Ψi values for different i. We computed the average Ψi across all i to characterize the general tendency for exon skipping as a function of the number of cell lines with RIC-seq support. Similarly, we selected PCCRs that are supported by at least five reads in at least three cell lines in which Ψi is also defined, and computed Ψh, the average Ψi across cell lines with RIC-seq support, Ψl, the average Ψi across cell lines without RIC-seq support, and their difference, ΔΨ = Ψh − Ψl.
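The ΔΨ statistic can be sketched as follows. The sketch assumes that a cell line counts as supporting a PCCR when it has at least five RIC-seq reads and as not supporting it otherwise; the text does not state how cell lines with one to four reads were treated, so this cutoff, the function name, and the toy values are assumptions.

```python
from statistics import mean
from typing import Dict, Optional


def delta_psi(psi_by_line: Dict[str, float], reads_by_line: Dict[str, int],
              min_reads: int = 5, min_lines: int = 3) -> Optional[float]:
    """Return psi_h - psi_l for one PCCR, or None if it does not qualify.

    psi_by_line holds only cell lines in which the average psi is defined.
    """
    supported = [c for c in psi_by_line if reads_by_line.get(c, 0) >= min_reads]
    unsupported = [c for c in psi_by_line if reads_by_line.get(c, 0) < min_reads]
    if len(supported) < min_lines or not unsupported:
        return None  # too few supported lines, or nothing to compare against
    psi_h = mean(psi_by_line[c] for c in supported)
    psi_l = mean(psi_by_line[c] for c in unsupported)
    return psi_h - psi_l


# Toy values, made up for illustration only.
psi_values = {"HeLa": 0.4, "K562": 0.5, "HepG2": 0.6, "H1": 0.9, "IMR90": 0.8}
ric_reads = {"HeLa": 7, "K562": 5, "HepG2": 12, "H1": 0, "IMR90": 1}
example_delta = delta_psi(psi_values, ric_reads)
print(round(example_delta, 2))
```

A negative value indicates lower exon inclusion in the cell lines where the structure is supported by RIC-seq.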
Co- and post-transcriptional introns
Two bioreplicates of RNA-seq experiments performed by the ENCODE Consortium (Djebali et al. 2012) on the nuclear and cytoplasmic poly(A)+ RNA fractions in the HepG2 cell line (accession numbers ENCFF148RNW, ENCFF064IOO, ENCFF887ZOX, and ENCFF280XOG) were used to categorize introns as cotranscriptional and post-transcriptional. The completeness of the splicing (coSI) metric was computed for each intron using IPSA software (Tilgner et al. 2012; Pervouchine et al. 2013). An intron was categorized as cotranscriptional if the coSI metric in both nuclear and cytoplasmic fractions was greater than 0.9. An intron was categorized as post-transcriptional if the nuclear coSI was below 0.3 and the cytoplasmic coSI was greater than 0.9.
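The two threshold rules translate directly into a small classifier. This is a sketch under the assumption that a single coSI value per fraction is available for each intron; how the two bioreplicates were combined is not specified here, and the function name and the "unclassified" label are hypothetical.

```python
def classify_intron(cosi_nuclear: float, cosi_cytoplasmic: float) -> str:
    """Categorize an intron from its coSI values in the two RNA fractions."""
    if cosi_nuclear > 0.9 and cosi_cytoplasmic > 0.9:
        return "cotranscriptional"      # already spliced in the nucleus
    if cosi_nuclear < 0.3 and cosi_cytoplasmic > 0.9:
        return "post-transcriptional"   # mostly unspliced in the nucleus, spliced in the cytoplasm
    return "unclassified"               # intermediate values fall into neither category


print(classify_intron(0.95, 0.98))
print(classify_intron(0.20, 0.97))
print(classify_intron(0.60, 0.97))
```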
Random forest classifier
To construct a random forest model, we computed the number of RIC-seq split reads supporting contacts between 50-nt bins, −3L, −2L, −1L, 1L, 2L, 3L and −3R, −2R, −1R, 1R, 2R, 3R for each PCCR as shown in Figure 3A. The model was implemented using the RandomForestClassifier routine from the scikit-learn Python package v1.1.1, using all PCCRs with at least one contact. The response variable (forked eCLIP peak) was modeled as a function of the number of reads supporting each pair of bins and the spread of the PCCR. The data set was split into a training set (66%) and a test set (33%) to estimate the true positive rate (TPR) and the false positive rate (FPR). The random forest classifier was used at the default settings: a forest of 100 trees optimizing Gini impurity, with tree depths selected dynamically (leaves are expanded until they are pure or contain only one sample).
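Evaluation on the held-out test set reduces to counting the four outcomes of a binary classifier. The sketch below is independent of any particular library; the function name `tpr_fpr` and the toy labels are illustrative assumptions, with 1 denoting a PCCR that overlaps a forked eCLIP peak.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """True positive rate and false positive rate for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Toy test-set labels and predictions, made up for illustration only.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
rates = tpr_fpr(observed, predicted)
print(rates)
```

Sweeping a threshold over the predicted class probabilities and recomputing these two rates at each threshold yields a receiver operating characteristic curve.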
Statistical tests
The data were analyzed using Python version 3.8.2 and R statistics software version 3.6.3. Nonparametric tests were performed using normal approximation with continuity correction. In all figures, the significance levels 0.05, 0.01, and 0.001 are denoted by (*), (**), and (***), respectively.
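To illustrate what a normal approximation with continuity correction involves, the sketch below applies it to the Mann–Whitney U test and maps the resulting P-value to the star notation. The choice of this particular test is an assumption made for the example, since the specific nonparametric tests are not named here; the variance formula omits the tie correction, and strict inequalities at the thresholds are also assumed.

```python
import math
from typing import Sequence


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided P-value for the Mann-Whitney U test using the normal
    approximation with continuity correction (no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs with x > y; tied pairs contribute one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mu) - 0.5, 0.0) / sigma  # continuity correction of 0.5
    return math.erfc(z / math.sqrt(2))       # equals 2 * (1 - Phi(z))


def significance_stars(p_value: float) -> str:
    """Map a P-value to the (*), (**), (***) notation."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


p_example = mann_whitney_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(round(p_example, 4), significance_stars(p_example))
```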
Cell culture and transfection
Human A549 lung adenocarcinoma cells and NIH 3T3 mouse fibroblasts were cultured in Dulbecco's modified Eagle's medium/nutrient mixture F-12 supplemented with 10% fetal bovine serum and 1% GlutaMAX (Thermo Fisher Scientific). 1.5 × 10⁵ cells were seeded in a 12-well plate. One thousand or five hundred nanograms of WT or mutated minigene plasmids were transfected into NIH 3T3 or A549 cells using Lipofectamine 3000 (Invitrogen) for 24 h. AON (13-mer) transfection was performed with Lipofectamine RNAiMAX (Invitrogen) in OptiMEM serum-reduced media (Gibco) for 48 h.
Antisense oligonucleotides
All AONs were designed as LNA-based with a DNA substitution at every second nucleotide (Touznik et al. 2017). Synthesis of LNA/DNA mixmers was carried out by Syntol JSC. Sequences of AONs are listed in Supplemental Table S5.
Minigene construction and mutagenesis
Minigenes encoding CASK exons 18–20 and PHF20L1 exons 5–7 were cloned into a pRK5 expression vector containing CMV promoter for the evaluation of target exon inclusion rates. The minigene sequence was amplified from A549 or NIH 3T3 genomic DNA using Q5 High-Fidelity DNA polymerase (New England Biolabs). The human PHF20L1 minigene was assembled using restriction-free cloning. The murine PHF20L1 minigene was assembled using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). The human CASK fragment of DNA was cloned using blunt end cloning protocol. The murine CASK was cloned into the ClaI and SalI sites of the pRK5 vector. All mutations in minigenes were introduced by PCR-based site-directed mutagenesis. All primers for cloning and mutagenesis are listed in Supplemental Tables S6 and S7, respectively. All constructs were confirmed by sequencing.
RNA extraction
Total RNA was isolated by a guanidinium thiocyanate phenol–chloroform method using ExtractRNA Reagent (Evrogene) or the PureLink RNA Mini Kit (Invitrogen) (Chomczynski and Sacchi 1987). One thousand nanograms of total RNA was first subjected to RNase-Free DNase I digestion (Thermo Fisher Scientific) at 37°C for 30 min to remove contaminating gDNA. Next, 500 ng of total RNA was used for cDNA synthesis using a Maxima First Strand cDNA Synthesis Kit for RT-qPCR (Thermo Fisher Scientific) to a final volume of 10 µL. cDNA was diluted 1:10 with nuclease-free water for qPCR and RT-PCR analysis.
RT-PCR
RT-PCR analysis was used for all human CASK experiments to assess the ratio of splice isoforms. Reactions were carried out using PCR Master Mix (2×) (Thermo Scientific) under the following conditions: 95°C for 3 min, 35 cycles at 95°C for 30 sec, 54°C for 30 sec, 72°C for 1 min, ending at 72°C for 5 min. The resulting products were analyzed on a 3% agarose gel stained with ethidium bromide and visualized using ChemiDoc XRS+ (Bio-Rad). The relative amounts of the resulting products were analyzed by scanning the gels and determining the intensities of ethidium bromide stained bands using Image Lab software version 1.36b (Bio-Rad). Endpoint PCR for all mouse CASK experiments was performed using 0.25 U recombinant Taq DNA Polymerase (Thermo Fisher Scientific) with 2 mM MgCl2, 1× KCl buffer, 2 µL of cDNA (same as for qPCR), 0.2 mM of each dNTP, and 420 nM of each forward and reverse primer (Supplemental Tables S8, S9). Cycling conditions were as follows: 95°C for 5 min, followed by 35 cycles at 95°C for 30 sec, 61°C for 30 sec, and 72°C for 30 sec, ending at 72°C for 5 min. PCR products were visualized on the agarose gel.
qPCR
qPCR reactions were performed in triplicate in a final volume of 12 µL in 96-well plates with 420 nM gene-specific primers and 2 µL of cDNA using Maxima SYBR Green/ROX qPCR Master Mix (2×) (Thermo Fisher Scientific). A sample without reverse transcriptase enzyme was included as a control to verify the absence of genomic DNA contamination. Amplification of the targets was carried out on the CFX96 Real-Time system (Bio-Rad); cycle parameters were as follows: 95°C for 2 min, followed by 45 cycles at 95°C for 30 sec, 61°C for 30 sec, and 72°C for 30 sec, ending at 72°C for 5 min. The expression of genes and gene isoforms was calculated using the assay-specific PCR efficiency. The result of a replicate measurement was considered an outlier and rejected if its Cq value was not within the range of 0.5 cycles (De Ronde et al. 2017).
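A possible form of this calculation is sketched below. It assumes that the 0.5-cycle range is measured from the median of the technical replicates and that the efficiency is expressed as the amplification factor per cycle (2.0 for perfect doubling), so that the starting quantity is proportional to E^(−Cq). Both conventions, the function names, and the toy Cq values are assumptions rather than details of the protocol.

```python
from statistics import mean, median
from typing import List


def filter_replicates(cq_values: List[float], max_dev: float = 0.5) -> List[float]:
    """Keep replicates whose Cq lies within max_dev cycles of the replicate median."""
    center = median(cq_values)
    return [cq for cq in cq_values if abs(cq - center) <= max_dev]


def relative_quantity(cq: float, efficiency: float) -> float:
    """Efficiency-corrected starting quantity, proportional to efficiency ** (-Cq)."""
    return efficiency ** (-cq)


# Toy triplicate for one assay, made up for illustration only.
kept = filter_replicates([24.1, 24.3, 25.2])
print(kept, round(mean(kept), 2))

# Ratio of two isoforms measured with assays of equal efficiency 1.9.
isoform_ratio = relative_quantity(24.0, 1.9) / relative_quantity(26.0, 1.9)
print(round(isoform_ratio, 2))
```

With unequal efficiencies, each isoform would use the efficiency of its own assay in `relative_quantity`.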
DATA DEPOSITION
The data obtained in this work (Supplemental Data Files 1, 2) are available through the Zenodo repository (https://zenodo.org/record/7945058). The code is available at the GitHub repository https://github.com/smargasyuk/PHRIC-hub.
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
ACKNOWLEDGMENTS
The authors thank Margarita Vorobiova for feedback on the manuscript. All authors acknowledge Professor O.A. Dontsova and Professor A.A. Mironov for insightful discussions. S.M., M.P., D.S., and D.P. acknowledge the research grant of the Ministry of Science and Higher Education of the Russian Federation (075-10-2021-116, IGK0752RGO0002). C.C. acknowledges the research grant from the National Key Research and Development Program of China (2021YFE0114900).
Author contributions: D.P. designed and supervised the study; D.P. and S.M. performed data analysis; C.C. provided RIC-seq data; M.K. and D.S. performed the experiments in PHF20L1 and CASK; M.P. and D.S. performed the experiments in mPHF20L1 and mCASK; D.P., S.M., and M.P. wrote the first draft of the manuscript. All authors edited the final version of the manuscript.
Footnotes
Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.079508.122.
Freely available online through the RNA Open Access option.
MEET THE FIRST AUTHOR
Sergei Margasyuk.
Meet the First Author(s) is an editorial feature within RNA, in which the first author(s) of research-based papers in each issue have the opportunity to introduce themselves and their work to readers of RNA and the RNA research community. Sergei Margasyuk is the first author of this paper, “RNA in situ conformation sequencing reveals novel long-range RNA structures with impact on splicing.” Sergei is a second-year PhD student in the group of Professor Dmitri D. Pervouchine at the Skolkovo Institute of Science and Technology in Moscow, Russia. Sergei's background is in bioinformatics and computational biology, and his PhD thesis is dedicated to studying regulation of alternative splicing by RNA-binding proteins and the role of RNA secondary structures in this regulation.
What are the major results described in your paper and how do they impact this branch of the field?
In this paper, we analyze together the in silico predicted long-range RNA structures and the experimental evidence for the formation of RNA contacts derived from RIC-seq data. RNA structures supported by contacts turned out to have remarkable properties, for example, they tend to suppress exon inclusion in the conditions when the structures are present. We characterize these exon loop-outs statistically and show that they tend to have weaker splice sites, so we conjecture that they are spliced post-transcriptionally and their splicing is regulated by pre-mRNA folding. My colleagues in the wet lab showed experimentally that RNA structure in two disease-associated genes, PHF20L1 and CASK, indeed modulates exon inclusion in human (Marina Kalinina) and murine (Marina Petrova) cell lines. Our study adds to the knowledge on eukaryotic RNA structures that regulate alternative splicing.
What led you to study RNA or this aspect of RNA science?
RNA is my favorite molecule, first and foremost. Second, in recent years a lot of new technologies have emerged to study RNA, which changed the focus of genomic research from static features such as regulatory sequence elements to processes such as dynamic expression regulation. Further research in this field brings us closer to an understanding of cellular dynamics patterns, such as in differentiation or stimulus response. Splicing regulation is a particular case of expression regulation that allows cells to express different protein isoforms. The regulation of splicing is mediated by RNA-binding proteins, and the process is poorly understood because of the lack of clear sequence binding preference for many of these proteins. However, for some proteins, their binding and regulatory effect is mediated by RNA secondary structure, so research of the relation between RNA-binding proteins, RNA secondary structure and splicing regulation is particularly exciting to help understand all these mechanisms.
During the course of these experiments, were there any surprising results or particular difficulties that altered your thinking and subsequent focus?
When we started analyzing the RIC-seq experimental data in seven different cell lines, we hoped to see or come up with a number of high fidelity RNA contacts that behave differently in different conditions. However, because of the nature of RIC-seq data, the support levels for each individual RNA structure turned out to be low, so that we had to pool the experiments together and use the number of supporting experiments as the proxy for structure fidelity. Despite this difficulty, we were able to identify a handful of cases with different splicing with and without the looping-out RNA structure.
What are some of the landmark moments that provoked your interest in science or your development as a scientist?
One of the first events that directed me toward the life sciences was the biological competition organized by the Faculty of Biology of Moscow State University, in which I participated in the sixth grade. During the second stage of this event I had an opportunity to talk to biologists from different fields, and I was impressed by the diversity of objects and scientific paradigms in this area. At that time I did not develop a deep interest in studying the descriptive fields of biology and focused on the math and computers in high school instead. But as I was preparing for university, I realized that modern biology is tightly connected to information processing, so I was able to enter the Faculty of Bioinformatics and study the biology using the methods I was interested in.
What are your subsequent near- or long-term career plans?
Well, my next immediate plan is to defend my PhD thesis. Most likely, this publication will be the core of it. The universe of transcriptomic data is large and expanding, so I view myself as working in it for some time in the future.
REFERENCES
- Aw JG, Shen Y, Wilm A, Sun M, Lim XN, Boon KL, Tapsin S, Chan YS, Tan CP, Sim AY, et al. 2016. In vivo mapping of eukaryotic RNA interactomes reveals principles of higher-order organization and regulation. Mol Cell 62: 603–617. 10.1016/j.molcel.2016.04.028
- Baralle FE, Singh RN, Stamm S. 2019. RNA structure and splicing regulation. Biochim Biophys Acta Gene Regul Mech 1862: 194448. 10.1016/j.bbagrm.2019.194448
- Blanchette M, Chabot B. 1999. Modulation of exon skipping by high-affinity hnRNP A1-binding sites and by intron elements that repress splice site utilization. EMBO J 18: 1939–1952. 10.1093/emboj/18.7.1939
- Cai Z, Cao C, Ji L, Ye R, Wang D, Xia C, Wang S, Du Z, Hu N, Yu X, et al. 2020. RIC-seq for global in situ profiling of RNA–RNA spatial interactions. Nature 582: 432–437. 10.1038/s41586-020-2249-1
- Cao C, Cai Z, Ye R, Su R, Hu N, Zhao H, Xue Y. 2021. Global in situ profiling of RNA–RNA spatial interactions with RIC-seq. Nat Protoc 16: 2916–2946. 10.1038/s41596-021-00524-2
- Carr SM, Munro S, Sagum CA, Fedorov O, Bedford MT, La Thangue NB. 2017. Tudor-domain protein PHF20L1 reads lysine methylated retinoblastoma tumour suppressor protein. Cell Death Differ 24: 2139–2149. 10.1038/cdd.2017.135
- Caruana G. 2002. Genetic studies define MAGUK proteins as regulators of epithelial cell polarity. Int J Dev Biol 46: 511–518.
- Chomczynski P, Sacchi N. 1987. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162: 156–159. 10.1016/0003-2697(87)90021-2
- Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen HC, Agarwala R, McLaren WM, Ritchie GR, et al. 2011. Modernizing reference genome assemblies. PLoS Biol 9: e1001091. 10.1371/journal.pbio.1001091
- Cohen AR, Woods DF, Marfatia SM, Walther Z, Chishti AH, Anderson JM, Wood DF. 1998. Human CASK/LIN-2 binds syndecan-2 and protein 4.1 and localizes to the basolateral membrane of epithelial cells. J Cell Biol 142: 129–138. 10.1083/jcb.142.1.129
- Dembowski JA, An P, Scoulos-Hanson M, Yeo G, Han J, Fu XD, Grabowski PJ. 2012. Alternative splicing of a novel inducible exon diversifies the CASK guanylate kinase domain. J Nucleic Acids 2012: 816237. 10.1155/2012/816237
- De Ronde MWJ, Ruijter JM, Lanfear D, Bayes-Genis A, Kok MGM, Creemers EE, Pinto YM, Pinto-Sietsma SJ. 2017. Practical data handling pipeline improves performance of qPCR-based circulating miRNA measurements. RNA 23: 811–821. 10.1261/rna.059063.116
- Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al. 2012. Landscape of transcription in human cells. Nature 489: 101–108. 10.1038/nature11233
- Dujardin G, Lafaille C, Petrillo E, Buggiano V, Gómez Acuña LI, Fiszbein A, Godoy Herz MA, Nieto Moreno N, Muñoz MJ, Alló M, et al. 2013. Transcriptional elongation and alternative splicing. Biochim Biophys Acta 1829: 134–140. 10.1016/j.bbagrm.2012.08.005
- Estève PO, Terragni J, Deepti K, Chin HG, Dai N, Espejo A, Corrêa IR, Bedford MT, Pradhan S. 2014. Methyllysine reader plant homeodomain (PHD) finger protein 20-like 1 (PHF20L1) antagonizes DNA (cytosine-5) methyltransferase 1 (DNMT1) proteasomal degradation. J Biol Chem 289: 8277–8287. 10.1074/jbc.M113.525279
- Garcia-Lopez A, Tessaro F, Jonker HRA, Wacker A, Richter C, Comte A, Berntenis N, Schmucki R, Hatje K, Petermann O, et al. 2018. Targeting RNA structure in SMN2 reverses spinal muscular atrophy molecular phenotypes. Nat Commun 9: 2032. 10.1038/s41467-018-04110-1
- Hale MA, Johnson NE, Berglund JA. 2019. Repeat-associated RNA structure and aberrant splicing. Biochim Biophys Acta Gene Regul Mech 1862: 194405. 10.1016/j.bbagrm.2019.07.006
- Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 22: 1760–1774. 10.1101/gr.135350.111
- Hou Y, Liu W, Yi X, Yang Y, Su D, Huang W, Yu H, Teng X, Yang Y, Feng W, et al. 2020. PHF20L1 as a H3K27me2 reader coordinates with transcriptional repressors to promote breast tumorigenesis. Sci Adv 6: eaaz0356. 10.1126/sciadv.aaz0356
- Kalinina M, Skvortsov D, Kalmykova S, Ivanov T, Dontsova O, Pervouchine DD. 2021. Multiple competing RNA structures dynamically control alternative splicing in the human ATE1 gene. Nucleic Acids Res 49: 479–490. 10.1093/nar/gkaa1208
- Kalmykova S, Kalinina M, Denisov S, Mironov A, Skvortsov D, Guigó R, Pervouchine D. 2021. Conserved long-range base pairings are associated with pre-mRNA processing of human genes. Nat Commun 12: 2300. 10.1038/s41467-021-22549-7
- Kim J, Daniel J, Espejo A, Lake A, Krishna M, Xia L, Zhang Y, Bedford MT. 2006. Tudor, MBT and chromo domains gauge the degree of lysine methylation. EMBO Rep 7: 397–403. 10.1038/sj.embor.7400625
- Kudla G, Wan Y, Helwak A. 2020. RNA conformation capture by proximity ligation. Annu Rev Genomics Hum Genet 21: 81–100. 10.1146/annurev-genom-120219-073756
- Kurosaki T, Popp MW, Maquat LE. 2019. Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat Rev Mol Cell Biol 20: 406–420. 10.1038/s41580-019-0126-2
- Lareau LF, Inada M, Green RE, Wengrod JC, Brenner SE. 2007. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446: 926–929. 10.1038/nature05676
- Lee CM, Barber GP, Casper J, Clawson H, Diekhans M, Gonzalez JN, Hinrichs AS, Lee BT, Nassar LR, Powell CC, et al. 2020. UCSC Genome Browser enters 20th year. Nucleic Acids Res 48: D756–D761.
- Li Y, Spangenberg O, Paarmann I, Konrad M, Lavie A. 2002. Structural basis for nucleotide-dependent regulation of membrane-associated guanylate kinase-like domains. J Biol Chem 277: 4159–4165. 10.1074/jbc.M110792200
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. 2009. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293. 10.1126/science.1181369
- Liu X, Sun P, Yuan Q, Xie J, Xiao T, Zhang K, Chen X, Wang Y, Yuan L, Han X. 2021. Specific deletion of CASK in pancreatic β cells affects glucose homeostasis and improves insulin sensitivity in obese mice by reducing hyperinsulinemia. Diabetes db201208. 10.2337/db20-1208
- Lu Z, Zhang QC, Lee B, Flynn RA, Smith MA, Robinson JT, Davidovich C, Gooding AR, Goodrich KJ, Mattick JS, et al. 2016. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165: 1267–1279. 10.1016/j.cell.2016.04.028
- Margasyuk SD, Vlasenok MA, Li G, Cao C, Pervouchine DD. 2023. RNAcontacts: a pipeline for predicting contacts from RNA proximity ligation assays. Acta Naturae 15: 51–57. 10.32607/actanaturae.11893
- Mathews DH, Sabina J, Zuker M, Turner DH. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911–940. 10.1006/jmbi.1999.2700
- Oberstrass FC, Auweter SD, Erat M, Hargous Y, Henning A, Wenter P, Reymond L, Amir-Ahmady B, Pitsch S, Black DL, et al. 2005. Structure of PTB bound to RNA: specific binding and implications for splicing regulation. Science 309: 2054–2057. 10.1126/science.1114066
- Pervouchine DD. 2018. Towards long-range RNA structure prediction in eukaryotic genes. Genes (Basel) 9: 302. 10.3390/genes9060302
- Pervouchine DD, Knowles DG, Guigó R. 2013. Intron-centric estimation of alternative splicing from RNA-seq data. Bioinformatics 29: 273–274. 10.1093/bioinformatics/bts678
- Pervouchine D, Popov Y, Berry A, Borsari B, Frankish A, Guigó R. 2019. Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay. Nucleic Acids Res 47: 5293–5306. 10.1093/nar/gkz193
- Picardi E, D'Erchia AM, Lo Giudice C, Pesole G. 2017. REDIportal: a comprehensive database of A-to-I RNA editing events in humans. Nucleic Acids Res 45: D750–D757. 10.1093/nar/gkw767
- Ramaswami G, Li JB. 2014. RADAR: a rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res 42: D109–D113. 10.1093/nar/gkt996
- Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, et al. 2014. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics 30: 1003–1005. 10.1093/bioinformatics/btt637
- Rivas E, Clements J, Eddy SR. 2017. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods 14: 45–48. 10.1038/nmeth.4066
- Sharma E, Sterne-Weiler T, O'Hanlon D, Blencowe BJ. 2016. Global mapping of human RNA-RNA interactions. Mol Cell 62: 618–626. 10.1016/j.molcel.2016.04.030
- Singh NN, Singh RN. 2019. How RNA structure dictates the usage of a critical exon of spinal muscular atrophy gene. Biochim Biophys Acta Gene Regul Mech 1862: 194403. 10.1016/j.bbagrm.2019.07.004
- Tabuchi K, Biederer T, Butz S, Sudhof TC. 2002. CASK participates in alternative tripartite complexes in which Mint 1 competes for binding with Caskin 1, a novel CASK-binding protein. J Neurosci 22: 4264–4273. 10.1523/JNEUROSCI.22-11-04264.2002
- Tibbe D, Pan YE, Reißner C, Harms FL, Kreienkamp HJ. 2021. Functional analysis of CASK transcript variants expressed in human brain. PLoS ONE 16: e0253223. 10.1371/journal.pone.0253223
- Tilgner H, Knowles DG, Johnson R, Davis CA, Chakrabortty S, Djebali S, Curado J, Snyder M, Gingeras TR, Guigó R. 2012. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22: 1616–1625. 10.1101/gr.134445.111
- Touznik A, Maruyama R, Hosoki K, Echigoya Y, Yokota T. 2017. LNA/DNA mixmer-based antisense oligonucleotides correct alternative splicing of the SMN2 gene and restore SMN protein expression in type 1 SMA fibroblasts. Sci Rep 7: 3672. 10.1038/s41598-017-03850-2
- Wang TF, Ding CN, Wang GS, Luo SC, Lin YL, Ruan Y, Hevner R, Rubenstein JL, Hsueh YP. 2004. Identification of Tbr-1/CASK complex target genes in neurons. J Neurochem 91: 1483–1492. 10.1111/j.1471-4159.2004.02845.x
- Wang XW, Liu CX, Chen LL, Zhang QC. 2021. RNA structure probing uncovers RNA structure-dependent biological functions. Nat Chem Biol 17: 755–766. 10.1038/s41589-021-00805-7
- Warf MB, Berglund JA. 2010. Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem Sci 35: 169–178. 10.1016/j.tibs.2009.10.004
- Wilson JB, Ferguson MW, Jenkins NA, Lock LF, Copeland NG, Levine AJ. 1993. Transgenic mouse model of X-linked cleft palate. Cell Growth Differ 4: 67–76.
- Wrzeszczynski KO, Varadan V, Byrnes J, Lum E, Kamalakaran S, Levine DA, Dimitrova N, Zhang MQ, Lucito R. 2011. Identification of tumor suppressors and oncogenes from genomic and epigenetic features in ovarian cancer. PLoS ONE 6: e28503. 10.1371/journal.pone.0028503
- Xu B, Zhu Y, Cao C, Chen H, Jin Q, Li G, Ma J, Yang SL, Zhao J, Zhu J, et al. 2022. Recent advances in RNA structurome. Sci China Life Sci 65: 1285–1324. 10.1007/s11427-021-2116-2
- Ye R, Hu N, Cao C, Su R, Xu S, Yang C, Zhou X, Xue Y. 2023. Capture RIC-seq reveals positional rules of PTBP1-associated RNA loops in splicing regulation. Mol Cell 83: 1311–1327. 10.1016/j.molcel.2023.03.001
- Yeo G, Burge CB. 2004. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11: 377–394. 10.1089/1066527041410418
- Yu H, Jiang Y, Liu L, Shan W, Chu X, Yang Z, Yang ZQ. 2017. Integrative genomic and transcriptomic analysis for pinpointing recurrent alterations of plant homeodomain genes and their clinical significance in breast cancer. Oncotarget 8: 13099–13115. 10.18632/oncotarget.14402
- Zhang C, Hoang N, Leng F, Saxena L, Lee L, Alejo S, Qi D, Khal A, Sun H, Lu F, et al. 2018. LSD1 demethylase and the methyl-binding protein PHF20L1 prevent SET7 methyltransferase-dependent proteolysis of the stem-cell protein SOX2. J Biol Chem 293: 3663–3674. 10.1074/jbc.RA117.000342
- Zhang Z, He T, Huang L, Ouyang Y, Li J, Huang Y, Wang P, Ding J. 2019. Two precision medicine predictive tools for six malignant solid tumors: from gene-based research to clinical application. J Transl Med 17: 405. 10.1186/s12967-019-02151-8
- Ziv O, Gabryelska MM, Lun ATL, Gebert LFR, Sheu-Gruttadauria J, Meredith LW, Liu ZY, Kwok CK, Qin CF, MacRae IJ, et al. 2018. COMRADES determines in vivo RNA structures and interactions. Nat Methods 15: 785–788. 10.1038/s41592-018-0121-0