Abstract
Alternative splicing is central to metazoan gene regulation, but the regulatory mechanisms are incompletely understood. Here, we show that G-quadruplex (G4) motifs are enriched ~3-fold near splice junctions. The importance of G4s in RNA is emphasised by a higher enrichment for the non-template strand. RNA-seq data from mouse and human neurons reveal an enrichment of G4s at exons that were skipped following depolarisation induced by potassium chloride. We validate the formation of stable RNA G4s for three candidate splice sites by circular dichroism spectroscopy, UV-melting and fluorescence measurements. Moreover, we find that sQTLs are enriched at G4s, and a minigene experiment provides further support for their role in promoting exon inclusion. Analysis of >1,800 high-throughput experiments reveals multiple RNA binding proteins associated with G4s. Finally, exploration of G4 motifs across eleven species shows strong enrichment at splice sites in mammals and birds, suggesting an evolutionarily conserved splice regulatory mechanism.
Subject terms: Transcriptomics
Here the authors show that G-quadruplexes, non-canonical DNA/RNA structures, can have a direct impact on alternative splicing and that binding of splicing regulators is affected by their presence.
Introduction
In eukaryotes, pre-mRNA processing is key to gene regulation and the generation of isoform diversity. Alternative splicing is arguably the most pivotal mRNA processing mechanism in higher eukaryotes, and in humans it contributes substantially to protein diversity, affecting 95% of mRNA transcripts1–4. Moreover, alternative splicing is essential for normal cell growth, cell death, differentiation, development, sex, circadian rhythms, responses to environmental changes and pathogen responses5–7.
The accuracy of pre-mRNA splicing relies on the recognition of three core signals: the 5′ splice site (5′ss), the 3′ splice site (3′ss), and the branch point. Despite the high fidelity observed during the splicing process, computational analyses have reported that human splice site core signals contain only half of the information required to accurately define exon/intron boundaries, implying the involvement of additional sequence features in splice site selection8,9. Some of the additional information necessary for splice site definition is found in a complex combination of cis-regulatory elements. These splice regulatory elements are short nucleotide sequences that are often bound by RNA-binding proteins (RBPs) that can either facilitate or inhibit splice site recognition. The role of RBPs and splicing enhancers has been extensively studied, and the current understanding goes a long way towards a quantitative, predictive model of alternative splicing10,11.
In addition to RBPs, secondary RNA structures are known to modulate alternative splicing12, yet little is known about the impact of DNA secondary structures on alternative splicing. More than 20 non-canonical secondary structures have been previously reported for DNA13, including G-quadruplexes (G4s), hairpins, cruciforms and triplexes. Sequences that predispose the DNA to non-canonical conformations are known as non-B DNA motifs, and they have been characterised with respect to their roles in gene regulation. It has been demonstrated that non-B DNA motifs can influence several aspects, including transcription initiation, transcription termination, and translation initiation14–20. Among the non-canonical secondary structures, G4s are the most widely studied class as they have been reported to have an important role in the transcriptional regulation of clinically relevant genes. For example, a DNA G4 in the promoter of the oncogene MYC acts as a repressor21–23. Similarly, a DNA G4 in the promoter of the proto-oncogene KRAS has a negative effect on expression levels24. Moreover, DNA G4s are also implicated in genomic instability in cancer and neurodegenerative diseases21,25–28.
Many non-B DNA motifs will result in similar secondary structures at the RNA level29–31. In particular, abundant RNA G4 structure formation in the transcriptome has been demonstrated recently32. Importantly, the impact of RNA secondary structures on alternative splicing remains only partially understood33,34, and although a role of G4s in splicing has been suggested35–41, the extent of G4 impact on alternative splicing remains to be explored. Here we provide a genome-wide characterisation across multiple species of the role of non-B DNA motifs in alternative splicing.
Results
Sequence analysis and experimental data show that DNA G4s are enriched near splice sites
To investigate the contribution of non-canonical secondary structures to splice site definition, we systematically explored the distribution of seven known non-B DNA motifs. Since the secondary structures can form both at the DNA and the RNA level29,31,42, we initially considered both DNA strands. These motifs can be identified from the primary sequence, and we focused on the regions flanking human splice sites (Methods). The enrichment profiles varied substantially across the different non-B DNA motif categories (Fig. 1A), with exon-intron junctions displaying an acute enrichment for G4s, short tandem repeats and H-DNA motifs. The high enrichment of short tandem repeats was expected since a subset of them overlap with intronic polypyrimidine tracts, which are known to be part of the core splicing signal43,44. By contrast, the enrichment patterns for G4s or H-DNA motifs cannot be explained by the distribution of known splicing signals.
The highest enrichment was for G4 motifs, both at the 3′ss (2.44-fold) and the 5′ss (4.06-fold), and this prompted us to further investigate if they have a role in the regulation of splicing. It has been shown previously that GC content is higher in exonic regions45,46, so to control for the effect of the nucleotide composition of splice sites on the distribution of the GC-rich G4 motifs, we shuffled the 100 nt window on each side of the splice site while controlling for dinucleotide content. Comparing the observed frequency to the median from 1000 permutations, we observed a corrected 2.53-fold and 2.73-fold enrichment for the frequency of G4 motifs at the 3′ss and 5′ss, respectively (p value < 0.001 in both 3′ss and 5′ss), indicating that the G4 patterns are not driven by the sequence composition of splice sites. Moreover, the enrichment was consistent between human and mouse splice sites (Fig. 1A and Supplementary Fig. 1a), and the colocalization of G4 motifs and splice sites is not driven by a small number of loci. Within 100 nt of each splice junction, we identified 19,987 and 20,088 G4 motifs at the 3′ss and 5′ss, respectively. In total, 31% of human genes contain a G4 motif within 100 bp of at least one splice site. G4 motifs were found within 100 nt for 8.79% and 8.83% of the 3′ss and 5′ss, respectively. The reported G4 motif frequencies are likely a conservative estimate since we do not take into account intermolecular G4s or G4s that do not adhere to the consensus motif (G≥3N1–7G≥3N1–7G≥3N1–7G≥3)47–49.
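As a minimal sketch of how the consensus motif can be located from the primary sequence, the following scan uses a plain regular expression and reports non-overlapping matches. The function name `find_g4_motifs`, the strand labels and the choice to report non-overlapping matches are assumptions made for illustration and are not taken from the Methods. A C-rich match on the scanned strand is reported as a G4 motif on the opposite strand.

```python
import re

# Consensus intramolecular G4 motif: four runs of >=3 G separated by
# loops of 1-7 nt. The C-rich pattern is its reverse complement and
# marks a G4 motif on the opposite strand.
LOOP = r"[ACGTN]{1,7}"
G_PATTERN = re.compile(LOOP.join([r"G{3,}"] * 4))
C_PATTERN = re.compile(LOOP.join([r"C{3,}"] * 4))


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples, 0-based and end-exclusive.

    Matches are non-overlapping within each strand (an assumption of
    this sketch, not necessarily the convention used in the paper).
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G_PATTERN), ("-", C_PATTERN)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


# Toy sequences (hypothetical, for illustration only).
print(find_g4_motifs("TTGGGAGGGTGGGCGGGTT"))
print(find_g4_motifs("AACCCGCCCACCCTCCCAA"))
```

Applying such a scan to the 100 nt windows on each side of a splice site, and to shuffled versions of the same windows, would give the observed and expected counts from which a fold enrichment can be derived.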
To evaluate if the G4 motifs that were enriched near splice sites lead to the formation of DNA G4 structures in vitro, we analysed previously published G4-seq data50,51. G4-seq utilises the fact that stable G4s can stall the DNA polymerase in vitro, thereby allowing high-throughput sequencing to be used to detect DNA G4s at high resolution. Chambers et al. provided the first method that enabled genome-wide detection of sites with DNA G4 formation potential, and they identified non-canonical structural features of DNA G4 formation as well as regions in the genome that are more likely to harbour DNA G4s, such as 5′ untranslated regions and splicing sites. We first measured the distribution of DNA G4s relative to the splice sites for HEK-293T cells in Pyridostatin (PDS) and K+ treatments from Marsico et al.51. PDS is a highly potent small molecule that binds and stabilises G4s. PDS, K+ and Na+ molecules selectively interact with G4s and stabilise them52. Compared to Na+, K+ stabilises G4 assemblies to a larger extent. In both conditions, we observed an enrichment of G4-seq peaks relative to the 3′ss and 5′ss, but with a more pronounced DNA G4 enrichment in PDS treatment compared to K+ treatment (Fig. 1B). The majority of DNA G4 positions derived from G4-seq peaks in K+ and PDS treatments did not overlap consensus G4 motifs (Fig. 1C). Since the G4-seq assay can also identify DNA G4s with non-canonical motifs, it is to be expected that the overlap with the consensus G4 motifs would be limited. We replicated our results in primary human B lymphocytes (NA18507) in Na+-K+ and Na+-PDS conditions50, both of which promote G4 formation. In both conditions, only ~25% of G4-seq derived peaks are captured by the consensus G4 motif. Nevertheless, at splice sites we found an enrichment comparable to that obtained from the motif analysis, directly implicating G4 formation at splice sites (Supplementary Fig. 1b–d). 
Differences between the G4-seq datasets are likely the result of the differences in the experimental settings and treatments between the two studies50,51. For both the PDS and K+ treatments, we find that a substantial fraction of the genome is affected, with 31.72 and 10.25% of splice junctions having a G4 within 100 bp. In addition, 67 and 35% of human genes contain a G4-seq peak from PDS and K+ treatments within 100 bp of a splice junction, supporting our earlier observations using the consensus G4 motif. As a result of these findings, we conclude that DNA G4s are a pervasive feature across splicing junctions.
DNA G4 distribution patterns are found across splice site categories
We extracted five different types of splicing sites (exon skipping, intron retention, alternative donor, alternative acceptor and mutually coordinated events) from VastDB53, a curated alternative splicing database. We analysed the enrichment profile of G4 motifs across the different types of alternative splicing events, and we found distinct differences (Fig. 1D and Supplementary Fig. 1e–g). Cassette exons are the most common type of alternative splicing event54, and we found an enrichment profile for G4s consistent with our previous results (Fig. 1A, D). Interestingly, for both alternative acceptor and alternative donor events, we found that the enrichment was higher for the proximal than for the distal sites (Fig. 1D). These results provide evidence that G4s are associated with multiple splice site categories.
DNA G4s are preferentially found on the non-template strand
Thus far, we have analysed G4 sequences at the DNA level, but following transcription some of these could lead to RNA G4 formation. Since G4 motifs are strand-specific, we oriented each instance relative to the direction of transcription. Thus, we considered DNA G4s found at the template (non-coding) and non-template (coding) strands separately and found them significantly enriched on the non-template strand (Binomial tests, p value < 0.001 at 3′ss and 5′ss). DNA G4s were enriched at both strands. At the 3′ss the enrichment was 3.01-fold and 2.78-fold at the non-template and template strands, respectively (Supplementary Fig. 2). At the 5′ss the difference between the strands was larger, with 5.56-fold and 2.38-fold at the non-template and template strands, respectively (Supplementary Fig. 2). Therefore, there was an asymmetric enrichment between the template and non-template strands at the 5′ss, but only a weak asymmetry at the 3′ss.
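The strand comparison can be illustrated with an exact two-sided binomial test against the null hypothesis that a G4 motif is equally likely to fall on either strand. This is a sketch under that assumption; the null proportion of 0.5, the function names and the counts in the usage example are hypothetical and do not come from the paper.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p value: sum of all outcomes no more likely than k."""
    observed = binomial_pmf(k, n, p)
    total = sum(
        prob
        for prob in (binomial_pmf(i, n, p) for i in range(n + 1))
        if prob <= observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical counts: 700 of 1000 motifs on the non-template strand.
p_value = binomial_two_sided_p(700, 1000)
print(p_value < 0.001)
```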
DNA G4s are enriched at weak splice sites
Weak splice sites are highly involved in alternative splicing and often contain additional regulatory elements55–57. To explore the distribution of G4s across weak and strong splice sites, we calculated a splicing strength score for all internal exons based on splice site position weight matrices55,58. We grouped splice sites into quartiles based on the splicing strength scores and explored the enrichment levels of G4 motifs for each quartile separately. We found an inverse relationship between the calculated splicing strength score and G4 enrichment, with the weakest splice sites having the highest enrichment of G4s both at the 3′ss and the 5′ss with 2.77-fold and 4.95-fold enrichment, respectively (Supplementary Fig. 3). For both mouse and human, the splicing strength scores for splice junctions with a G4 are significantly lower than for splice junctions without a G4 (Mann–Whitney U, p value <0.001). The same inverse relationship between the splicing strength score at splice sites and the DNA G4 enrichment is found in the G4-seq data for human in PDS and K+ treatment51 and for PDS-Na+ and K+-Na+ treatment50 (Supplementary Fig. 4a–d), (Mann–Whitney U, p values < 0.001).
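A position weight matrix score of this kind can be sketched as a sum of per-position log-odds values, followed by a rank-based split into four equally sized groups. The matrix below is a toy example with invented probabilities, the uniform background of 0.25 is an assumption, and the function names are hypothetical; the matrices actually used are those of refs. 55,58.

```python
from math import log2

# Hypothetical position probability matrix for a 4-nt site
# (illustrative values only, not a real splice site model).
TOY_PPM = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def pwm_score(site, ppm=TOY_PPM):
    """Sum of log2 odds (matrix probability over background) across the site."""
    if len(site) != len(ppm):
        raise ValueError("site length must match the matrix length")
    return sum(log2(col[base] / BACKGROUND) for base, col in zip(site.upper(), ppm))


def quartile_labels(scores):
    """Rank-based groups: 1 is the weakest quarter, 4 the strongest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, index in enumerate(order):
        labels[index] = 1 + (4 * rank) // len(scores)
    return labels


print(pwm_score("GTAA") > pwm_score("CCCC"))
print(quartile_labels([0.5, 3.0, -1.0, 2.0]))
```

G4 enrichment can then be computed separately for the splice sites carrying each label.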
We also investigated if there was a strand asymmetry when considering the splicing strength scores. Indeed, we found a bias in the splicing strength scores dependent on the strand orientation of G4s (Mann–Whitney U, 3′ss p value <0.05, 5′ss p value < 0.001). At the 3′ss the enrichment for splice junctions with the weakest splicing strength scores at the template and non-template strand was 3.90-fold and 3.66-fold, respectively. By contrast, we observed a 6.76-fold enrichment for G4s at the 5′ss at the non-template strand, but only a 3.66-fold enrichment on the template strand at the splice junctions with the weakest splicing strength scores (Fig. 2A). Taken together, these results indicate a preference for the non-template strand that is inversely proportional to the splicing strength score (Fig. 2A). They also suggest that G4s are more prevalent at the non-template strand, which is the sequence found in the transcribed mRNA. We validated the differences in the distribution of DNA G4 motifs at the template and non-template strands and the associated differences in splicing strength score using G4-seq from the two available datasets50,51. Consistent with the results derived from the consensus G4 motif, we found that DNA G4 formation has a preference for the non-template strand at the 5′ss of weak splice sites using the G4-seq datasets (Supplementary Fig. 4e–h).
Longer G-runs are more highly enriched at splice sites
An intramolecular G4 is usually formed from four or more consecutive G-runs. Yet, intermolecular G4s can form with fewer G-runs since multiple molecules can contribute to G4 structure formation59. We found minimal to no enrichment for single G-runs at both 5′ss and 3′ss (Fig. 2B). However, for two and three G-runs we observed a 1.39-fold and a 2.10-fold enrichment at the 3′ss and a 1.67-fold and a 2.47-fold enrichment at the 5′ss, which may implicate intermolecular G4s in splice sites. The highest enrichment was observed for four to six G-runs, indicating that intramolecular G4 motifs are more enriched at splice sites than their intermolecular counterparts.
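Counting G-runs in a window can be sketched as follows. The definition of a G-run as at least three consecutive guanines is an assumption borrowed from the consensus motif, and the function name is hypothetical; a long stretch of guanines is counted as a single run in this simplified version.

```python
import re


def count_g_runs(window, min_len=3):
    """Count maximal runs of at least min_len consecutive G in the window."""
    pattern = r"G{%d,}" % min_len
    return len(re.findall(pattern, window.upper()))


# Toy windows (hypothetical): two qualifying runs, then none.
print(count_g_runs("AGGGTTGGGGACGGA"))
print(count_g_runs("ACGT"))
```

Windows can then be binned by their G-run count before computing the enrichment for each bin.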
DNA G4s are enriched for short introns
The length of introns in metazoans can vary across four orders of magnitude60. We hypothesised that the enrichment patterns of G4s at introns proximal to splicing sites would be associated with intron length. We compared the intron length of splice sites that had a G4 motif within 100 bp in the direction of the intron to the ones that did not have this motif. Consistent with our hypothesis, we found that introns with a G4 at the 3′ss had a median length of 701 nt while introns without a G4 had a median length of 1618 nt (Supplementary Fig. 5a), (Mann–Whitney U, p value < 0.001). Similarly, at the 5′ss, introns with a G4 had a median length of 379 nt, whereas introns without a G4 had a median length of 1629 nt (Supplementary Fig. 5a), (Mann–Whitney U, p value < 0.001). Interestingly, introns in the range of ~45–85 bp were the most enriched for G4s for both the 3′ss and the 5′ss. Moreover, the enrichment of G4s in introns declined rapidly with increased intron length, indicating that they are preferentially found in short introns (Fig. 3A, B, Kolmogorov–Smirnov test p value < 0.001). We also investigated the association between splicing strength score and intron length at sites with G4s in the 3′ss and 5′ss and found that the highest enrichment for G4s was in short introns with weak splice site strength (Fig. 3C). However, when comparing G4 enrichment at splice sites of short introns with a selection of long introns that have the same GC-content distribution, we found no difference or even higher enrichment of G4 at long intron splice sites (Fig. 3D and Supplementary Fig. 5b), indicating that the association of G4 to splice sites located at short introns could be driven by GC-content.
To further investigate the relationship regarding the intron length, we separated G4s identified using the consensus motif into non-template and template for both the 5′ss and the 3′ss. In this case, the GC-content is not a covariate since the template and non-template strands have the same GC-content. The 3′ss introns showed small but significant differences in length if a G4 was at the non-template or the template strand with medians of 736 nt and 621 nt, respectively (Mann–Whitney U, p value < 1e−16) (Supplementary Fig. 5c). However, if a G4 was at the non-template strand at the 5′ss the median intron length was 267 nt, whereas if the G4 was at the template strand the median intron length was 539 nt (Mann–Whitney U, p value < 1e−16) (Supplementary Fig. 5c), displaying more pronounced differences in intron length. Therefore, we conclude that the highest enrichment is found for short introns, on the non-template strand, downstream of the 5′ss.
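The intron length comparisons rely on the Mann–Whitney U test. A minimal sketch of the statistic and an approximate two-sided p value is given below; it uses a brute-force pairwise count and a normal approximation without tie or continuity correction, both of which are simplifications assumed here and not a description of the software used in the paper. The function names and the toy lengths are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mwu_two_sided_p(x, y):
    """Two-sided p value from the normal approximation (no tie correction)."""
    n, m = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n * m / 2
    sd = sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean) / sd
    return erfc(abs(z) / sqrt(2))


# Toy intron lengths in nt (hypothetical values, for illustration only).
with_g4 = [80, 120, 300, 450, 700]
without_g4 = [900, 1500, 1600, 2400, 5000]
print(median(with_g4), median(without_g4))
print(mann_whitney_u(with_g4, without_g4), mwu_two_sided_p(with_g4, without_g4))
```

The pairwise count is quadratic in the sample sizes, so a rank-based implementation would be preferable for genome-wide sets of introns.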
We also investigated if there is an association between G4s near splice sites and exon length. We do not find a significant association between G4s and exon length at the 3′ss (median exon length without G4s: 124 bp, median exon length with G4s: 123 bp, p value >0.05, Mann–Whitney U), but we find a significant association for smaller exons near the 5′ss, albeit with a very small magnitude (median exon length without G4s: 127 bp, median exon length with G4: 123 bp, p value <0.001, Mann–Whitney U). Furthermore, we explored if microexons, defined as exons <30 nt long61–63 had an enrichment for G4s at their splice sites relative to other exons. However, we could not find a higher density of G4s at the introns flanking microexons than other exons.
In addition to exploring the relationship between intron and exon lengths and G4s, we also considered the position across the gene body. For each gene with nine or more exons, we separated the exons of its longest transcript into nine groups: the first four exons, the last four exons and the remaining middle exons. We find an attenuation of the enrichment around the last intron for the 5′ss and at the first couple of introns for the 3′ss, indicating clear differences at both ends of the transcript in comparison to other introns (Supplementary Fig. 5d). This result indicates that the role of G4s in splicing regulation is pervasive across the gene body.
Importantly, we also separately measured the G4 enrichment at the template and non-template strands of splice junctions across the gene length (Fig. 3E). We found that the enrichment of G4s was consistently higher in the non-template strand across exons. The largest difference between enrichment scores for the two strands was found at the 3′ss, exceeding 3-fold, while the differences between the two strands at the 5′ss were smaller. These findings provide evidence for widespread variation in the topography of G4s in splice junctions; these include the frequency of G4s in the exons and introns flanking the splice site, biases regarding the strand preference, the distance from the splice site and the positioning across the gene body.
Dynamic splicing responses to stimuli are associated with proximal RNA G4s
To gain further insights regarding the biological roles of G4s in the modulation of alternative splicing events, we investigated if G4s are associated with dynamic splicing changes in response to depolarisation stimuli. To that end we analysed published data from mouse and human embryonic stem cell (ESC)-derived neurons and mouse primary neurons subjected to a depolarisation solution including 170 mM of KCl and an L-type Ca2+ channel agonist, resulting in an influx of Ca2+ and followed by RNA-seq 4 h post-treatment64. The rise of intracellular Ca2+ has been shown to have an impact on alternative splicing mediated by calmodulin-dependent kinase IV (CaMK IV)65.
We used Whippet66 to quantify alternative splicing events after depolarisation and compared them to the unstimulated controls (Methods) (Supplementary Material). The change in the inclusion of an exon is quantified using the percent spliced-in index (PSI) which represents the fraction of transcripts that contain the exon. We found a total of 44 G4-flanked exons that are differentially included in at least one human or mouse RNA-seq experiment. As case studies, we considered exons flanked by one or more G4s in the SLC6A17, UNC13A and NAV2 loci that were differentially included after treatment for further experimental validation (Fig. 4A and Supplementary Fig. 6). Firstly, SLC6A17 (NTT4/XT1) is a member of the SLC family of transporters that are involved in Na+-dependent uptake of the majority of neurotransmitters at presynaptic neurons67. SLC6A17 is involved in the transport of neutral amino acids, and mutations in this gene have been associated with autosomal-recessive intellectual disability67,68. We show that exon seven from SLC6A17, which is alternatively skipped after KCl treatment (Delta PSI = −0.177), has a G4 50 nt downstream of the 5′ss on the non-template strand. As the domains of SLC6A17 include an intracellular loop, two transmembrane regions and part of extracellular domains, the KCl-induced alternative skipping of this exon may lead to functional structural changes (Fig. 4A). Similarly, UNC13A encodes another presynaptic protein involved in glutamatergic transmission, and it has been associated with amyotrophic lateral sclerosis69. We identify a G4 downstream of exon 38, which results in dramatic exon skipping (Delta PSI = −0.369), (Supplementary Fig. 6a). Finally, the third target was a G4 located downstream of exon 16 in NAV2 (navigator protein 2), which is required for retinoic acid-induced neurite outgrowth in human neuroblastoma cells70. 
Again, KCl treatment resulted in exon skipping (Delta PSI = −0.271), which affects a NAV2 serine-rich sequence region (Supplementary Fig. 6b).
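The PSI and Delta PSI values quoted above can be illustrated with a simple read-count ratio. This is a deliberately simplified sketch: Whippet estimates PSI with its own probabilistic model, so the count-based formula, the function names and the read counts below are assumptions for illustration only.

```python
def psi(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting exon inclusion, or None without coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def delta_psi(psi_treated, psi_control):
    """Change in inclusion after treatment; negative values mean more skipping."""
    return psi_treated - psi_control


# Hypothetical read counts for one exon before and after KCl treatment.
control = psi(80, 20)
treated = psi(60, 40)
print(control, treated, delta_psi(treated, control))
```

Under this convention, the negative Delta PSI values reported for SLC6A17, UNC13A and NAV2 correspond to reduced exon inclusion after depolarisation.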
For each of the three candidates, we performed multiple assays to provide additional support for the formation of G4s at the RNA level30,71,72. First, we performed circular dichroism spectroscopy and UV-melting measurements of the G4-containing RNA oligonucleotides, in the presence of lithium ions (non-G4 stabilising) or potassium ions (G4 stabilising), to examine the formation potential and stability of RNA G4s. Supporting the case that RNA G4s are present in the transcripts, we found that all three candidates folded into stable RNA G4 structures preferentially under K+ conditions (Fig. 4B–F and Supplementary Fig. 7a). To confirm the results from the circular dichroism and UV-melting experiments, we further used fluorescence-based assays such as N-methyl mesoporphyrin IX (NMM) ligand-enhanced fluorescence, Thioflavin T (ThT)-enhanced fluorescence, and intrinsic fluorescence experiments (Fig. 4G–I and Supplementary Fig. 7b, c). Indeed, we observed increased fluorescence intensity under conditions that promoted RNA G4 formation for all three candidates, confirming the formation of RNA G4s in these examples.
Having validated the formation of RNA G4s around three exons that are differentially included after KCl-induced depolarisation, we examined their impact on splicing genome-wide (Fig. 5A). We find that exon skipping at core exons is associated with G4s (chi-squared test multiple testing corrected, p value <5e−12, Fig. 5A). We also found G4s associated with the inclusion of alternative first exons following KCl treatment, indicating alternative promoter usage (chi-squared test, multiple testing corrected, p value <0.05, Fig. 5A). The analysis identified a total of 22,344 splicing events where the absolute value of Delta PSI was greater than or equal to 0.1 and the probability was greater than or equal to 0.9. We focused our analysis on the 2633 events that involved cassette exons, and of these 2346 (89.1%) corresponded to increased skipping (Fig. 5B, binomial test, p value < 0.001). These results are consistent with previous studies which have demonstrated exon skipping following depolarisation in individual examples73–77 and genome-wide analyses of RNA-seq experiments78. Interestingly, we found enrichment for differential splicing events with G4 motifs at the associated splicing sites, exceeding that which was expected by the background distribution (Fig. 5A–C, chi-squared test with multiple testing correction, p value <0.001, odds ratio = 1.57). To provide further support for the findings obtained from the consensus G4 motif, we examined the distribution of G4-seq derived peaks in PDS and K+ conditions around splice sites of differentially and non-differentially included exons. As expected, we found a consistent enrichment at the differentially included exons (Fig. 5A and Supplementary Fig. 8).
Moreover, the effect size was larger for the G4 motifs and the G4-seq derived DNA G4 sites in the non-template strand at the 5′ss in comparison to those found at the template strand (chi-squared test multiple testing correction, p value <0.001 when using the consensus G4 motif and for both PDS and K+ G4-seq conditions in human neurons). In addition to human cells, we performed the same analysis in similar experiments in mice across four different conditions (Supplementary Figs. 8–11). We recapitulate the widespread exon skipping phenomenon observed after depolarisation in human neuronal cells (Fig. 5A and Supplementary Fig. 9), but we found a significant association of alternatively included exons only with PDS G4-seq derived peaks in two conditions (Supplementary Fig. 11). Importantly, we report alternative promoter usage associated with G4s and alternative first exon inclusion in both mouse and human neurons following KCl treatment (Fig. 5A and Supplementary Fig. 10). Taken together, our results suggest that the presence of G4s at the splice junction of cassette exons is associated with dynamic changes of alternative splicing in response to KCl-induced depolarisation.
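The association between G4 presence and differential inclusion can be sketched as a 2x2 contingency analysis, combined with the event filter on Delta PSI and probability described above. The sketch uses the Pearson chi-squared statistic without continuity correction and omits the multiple testing correction; those choices, the function names and all counts are assumptions for illustration and are not the values behind the reported odds ratio of 1.57.

```python
from math import erfc, sqrt


def is_differential(delta_psi, probability, min_dpsi=0.1, min_prob=0.9):
    """Event filter: |Delta PSI| >= 0.1 and probability >= 0.9."""
    return abs(delta_psi) >= min_dpsi and probability >= min_prob


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 d.f., no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_p_value_1df(stat):
    """Upper-tail p value of a chi-squared statistic with one degree of freedom."""
    return erfc(sqrt(stat / 2))


# Hypothetical table: rows are differential / non-differential exons,
# columns are exons with / without a G4 motif near the splice site.
a, b, c, d = 30, 70, 20, 80
stat = chi2_2x2(a, b, c, d)
print(odds_ratio(a, b, c, d), stat, chi2_p_value_1df(stat))
```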
Mutations at G4s affect splicing
To provide direct evidence for the role of G4s in the modulation of alternative splicing events, we designed two minigene constructs that contain wild-type and mutant G4 motifs, which we previously validated to lead to RNA-G4 structure formation in the SLC6A17 and NAV2 genes (Fig. 4). Within these constructs, we included the whole sequence of exons that were differentially included after KCl-induced depolarisation and corresponding flanking regions (Supplementary Fig. 12). In the case of the SLC6A17 minigene, we inserted a wild-type or mutated G4 motif flanked by two exons, one of them corresponding to a KCl-responding alternative exon (Fig. 5D). In the SLC6A17 minigene containing the wild-type G4 motif, we observed two main splicing products corresponding to isoforms where either both exons are included or excluded, and a minority product where only one alternative exon is included (Fig. 5E). After the introduction of mutations in the G4 motif, we observed strong exclusion of both exons. Similarly, for the NAV2 minigene experiments, we also observed that mutations in the flanking G4 motifs favoured exon skipping (Supplementary Fig. 12). These results indicate that G4 motifs present in flanking intronic regions can have a direct effect on alternative splicing outcomes, favouring exon inclusion, which is in agreement with previous observations39.
To investigate if G4 motifs have a transcriptome-wide effect on alternative splicing, we took advantage of splicing quantitative trait loci (sQTL) data from the GTEx consortium79. Since DNA G4s are reported to be enriched in both germline and somatic mutations28,80, we adjusted the enrichment of sQTLs for differences in SNP distribution across the splice junction. We found that sQTLs were more likely to overlap G4 motifs that are in close proximity to splice sites (Fig. 5F). The highest sQTL adjusted enrichment values were found in exonic regions and the most proximate flanking intronic regions. Therefore, we conclude that G4 motifs are enriched for sQTLs relative to the expected population variant frequency, suggesting a functional role.
Systematic characterisation of the interplay between RBPs and G4s at splice sites
During gene expression, RNA-binding proteins (RBPs) enable different catalytic steps of RNA processing and serve as key regulatory factors of alternative splicing. We processed data from 1345 eCLIP experiments that were performed on K562 and HepG2 cell lines81 to calculate the differential binding of RBPs between exons that are flanked and exons that are not flanked by G4 motifs within a 200 bp window. We performed unsupervised hierarchical clustering of RBPs based on their G4 motif enrichment profile, taking into consideration G4 motifs at the template and non-template orientations in proximity to splice sites. This identified a total of ten clusters with distinct differential binding patterns (Fig. 6A), of which all except cluster 6 showed substantial and significant enrichment differences between exons that are flanked by G4 motifs and exons that are not flanked by G4 motifs. The remaining clusters exhibited clearly distinguishable patterns; for instance, in cluster 7 we observed an enrichment difference only for G4s found at the non-template orientation, whereas for cluster 10 we observed an enrichment difference specific to the template orientation. These results show that several RBPs differentially bind to splice sites flanked by G4 motifs. To complement our observations from eCLIP data, we analysed 506 loss of function (LoF) experiments followed by RNA-seq that targeted a total of 269 RBPs in HepG2 and K562 cell lines, from which 143 RBP factors overlapped between the eCLIP and LoF experiments. We performed quantitative analyses to determine the number of differentially included exons induced by the LoF of target RBPs. We found that for 36 RBPs, differential inclusion following the LoF of the RBP is associated with the presence of a G4 motif in proximity to the splice site in at least one of the analysed LoF experiments (chi-squared test, adjusted p value < 0.05) (Fig. 6B).
Integrating eCLIP with LoF RNA-seq experiments, we obtained a high-confidence list of 15 RBPs that are differentially bound to G4 motifs and show a significant and consistent association with alternative splicing (Fig. 6B). Interestingly, we found examples such as HNRNPK and HNRNPU, which exhibit higher binding to G4 motifs around splice sites and are positively associated with differentially included exons (p value < 0.05 and log2(OR) > 0), suggesting that these factors could have a direct impact on G4-mediated AS regulation (Fig. 6C). Conversely, we also found examples such as AQR and RBM15, which are depleted around splice sites flanked by G4 motifs and are negatively associated with differentially included exons (p value <0.05 and log2(OR) < 0), suggesting that binding and impact of these factors on AS could be prevented by G4 formation (Fig. 6C). We also found cases such as RBFOX2 that exhibit remarkably different binding profiles across splice sites flanked by G4 motifs, although they were not positively or negatively associated with differentially included exons.
Enrichment of DNA G4s at splice sites does not extend beyond vertebrates
Alternative splicing is a pivotal step of eukaryotic mRNA processing. To understand to what extent splice site regulation by G4s is conserved, we considered eleven eukaryotes: Homo sapiens (human), Mus musculus (mouse), Sus scrofa (pig), Gallus gallus (chicken), Danio rerio (zebrafish), Caenorhabditis elegans (nematode), Drosophila melanogaster (fruit fly), Xenopus tropicalis (frog), Anolis carolinensis (lizard), Saccharomyces cerevisiae (yeast) and Arabidopsis thaliana (flowering plant). S. cerevisiae was excluded from further analysis since we could not find any DNA G4s at splice sites and DNA G4s were rare, with only 39 occurrences genome-wide. Interestingly, we found that the enrichment pattern of DNA G4s at splice sites was restricted to a subset of vertebrate species, with minimal or no enrichment in fruit fly, Arabidopsis and C. elegans (Fig. 7A, B and Supplementary Fig. 13a). We observed strong enrichment in chicken, pig, human and mouse, while lizard displayed limited enrichment levels. Surprisingly, frog and zebrafish displayed relative depletion. This suggests that alternative splicing regulation by G4s is restricted to mammals and birds, and is absent in plants, other tetrapods and fish.
Additional support for this conclusion comes from our analysis of G4-seq derived DNA G4 maps generated in PDS and K+ conditions. These maps are available for multiple model organisms, including three vertebrates (human, mouse and zebrafish) and four non-vertebrate species (nematode, fruit fly, Arabidopsis and yeast). Consistent with the analysis based on the primary sequence, we find an acute enrichment of DNA G4s at the 5′ss and 3′ss only in human and mouse. In particular, we could not find any DNA G4s in the vicinity of splice junctions for S. cerevisiae, there was no enrichment for D. melanogaster and D. rerio, and we observed a depletion in A. thaliana (Fig. 7C, D).
Discussion
The identification of splicing regulators remains an active area of research, as the information content at splice sites is insufficient for predicting alternative splicing events in higher eukaryotes8,82. Here, we provide evidence for the widespread role of G4s in splicing regulation. The enrichment of DNA G4s at splice junctions is comparable to what is observed10 at promoters in humans (Supplementary Fig. 13b), even though it is primarily the importance of G4s for transcriptional and translational regulation that has previously been recognised48,83. We provide several lines of evidence, including a high enrichment at splice site regions, a preference for the non-template strand and in vitro experiments, to suggest that G4s form in the pre-mRNA and can modulate alternative splicing events. In addition, RNA G4s are more stable than DNA G4s, suggesting that they could have a greater influence on the transcriptome than on the genome83,84. However, co-transcriptional splicing has been previously demonstrated to be the norm85,86, and we cannot rule out the possibility that the nascent transcripts form DNA-RNA hybrids, implying more complex interactions87. In fact, enrichment of G4s at the template strand suggests G4 formation and potential roles at the DNA level as well, likely during co-transcriptional splicing when the DNA is single-stranded, or alternatively the formation of i-motifs88.
Weaker splice sites lead to suboptimal exon recognition, which enables alternative splicing to be modulated by additional cis-regulatory elements or epigenetic factors89. Here we show a pronounced enrichment of G4s at weak splice sites and provide evidence for a widespread contribution of G4 structures to alternative splicing. We also find that G4s appear near splice sites in only a subset of species (Fig. 7), suggesting that they have emerged as splicing modulators during vertebrate evolution. The presence of additional regulatory mechanisms is in accordance with higher frequencies of alternative splicing events in vertebrates compared to invertebrates90. Moreover, DNA G4s display a higher likelihood of DNA mutations91 and as a result they are likely plastic in nature, enabling rapid splicing changes during evolution and the establishment of new functions through alternative splicing and the generation of isoform diversity.
We observed widespread exon skipping following potassium depolarisation of neurons (Fig. 5A and Supplementary Fig. 9), a phenomenon that, to our knowledge, has only been documented for a handful of cases73,74,76,77. These alternative splicing changes are likely induced by Ca2+ influx after depolarisation, which is known to affect splicing via CaMK IV65. Here, we show that the changes in splicing patterns are associated with the presence of G4s at the splice junctions. Given the relatively short interval between the time points for the RNA-seq samples, we find it most likely that the changes in splicing are due to post-translational effects rather than altered expression of splicing regulators. In fact, part of the alternative splicing changes observed in response to depolarisation has been shown to be dependent on hnRNP L phosphorylation by CaMK IV76.
Our results also show that G4 motifs present at flanking intronic regions can have a direct effect on alternative splicing, as evidenced by sQTL enrichment analyses at G4s near splice sites and by our minigene experiments (Fig. 5 and Supplementary Fig. 12). These results corroborate the importance of G4s as splicing modulators, and our findings are consistent with previous work39. We also provide evidence that the binding preference of some RBPs for splice sites depends on the presence of G4s (Fig. 6). We find 15 RBPs whose binding and expression perturbation profiles are significantly associated with the presence of G4s at splice sites, such as HNRNPU, HNRNPK, RBM15 and PCBP2, suggesting that they could be involved in the mechanism by which G4 formation modulates alternative splicing (Fig. 6). However, it remains unclear by which mechanism G4s influence alternative splicing. G4 formation has the potential to mask otherwise accessible RBP sites, which could explain the binding profiles of factors such as AQR, SF3B4, SF3A3 and RBM15, which we found to be depleted at G4-flanked splice sites, and their negative association with alternative splicing changes across the LoF RNA-seq experiments. On the other hand, experimental evidence suggests that G4 stabilisation favours the binding of proteins such as HNRNP H/F39,92 and HNRNPU93. Our results show that HNRNPU has stronger binding profiles across G4-flanked exons, which are also enriched amongst detected alternative splicing events after HNRNPU knockdown, supporting the observations from Izumi and Funa93. Moreover, similar results were observed for HNRNPC, MATR3 and PCBP1/2, suggesting additional RBPs that can mediate the effect of RNA G4s on alternative splicing. Finally, we also found significant differential RBP binding profiles for other factors such as DDX3X, PRPF4, GTF2F1 and CSTF2T (Supplementary Fig. 14), which were previously linked to RNA G4s94,95 but were not highlighted by our RNA-seq analyses. This suggests that LoF effects of some of these factors could not be detected due to the strict criteria that we implemented to assess the consistency across all experiments and cell lines.
The fact that G4 formation at the RNA level can be modulated by helicases, monovalent ions or small molecules41,96–98 opens up new avenues for modulating splicing for therapeutic purposes. It is plausible that, by perturbing the stability of RNA G4s or the lifetime of the folded state, other RBP binding sites can become either exposed or masked, modulating alternative splicing events. Drugs targeting splicing modulation are already clinically approved, such as Spinraza for spinal muscular atrophy99. There are already multiple compounds available with varying specificities for G4 binding. One example is Quarfloxin, which previously reached phase II clinical trials targeting a DNA G4 in the CMYC promoter, but its evaluation was halted due to interference with pol I in rRNA100. Our work suggests that G4s in splice sites could be used as pharmacological targets.
Methods
Genome and gene annotations processing
We obtained genome assemblies from the UCSC Genome Browser FTP server for eleven organisms, including the GRCh37 (hg19) reference assembly of the human genome, the mouse reference genome (mm10), the Saccharomyces cerevisiae reference genome (sacCer3), the chicken reference genome (galGal5), the Drosophila melanogaster reference genome (dm6), the zebrafish reference genome (danRer11), the Xenopus tropicalis reference genome (xenTro9), the Anolis carolinensis reference genome (anoCar2), the Arabidopsis thaliana reference genome (TAIR10) and the Caenorhabditis elegans reference genome (ce10).
We downloaded the Ensembl gene annotation files for the associated genomes from UCSC Table Browser as BED files for each species101. Using in-house Python scripts we extracted the coordinates of internal exons flanked by canonical splice sites (GT-AG introns) for every species. To calculate the splicing strength scores, we used publicly available positional frequency matrices from the SpliceRack database58 and previously developed scripts55. Splice sites were grouped into quartiles based on their splicing strength score for the downstream analyses of the distribution of non-B DNA motifs, and in particular G4 motifs (Fig. 2A and Supplementary Figs. 3 and 4). For Fig. 2A, the confidence intervals were calculated using the “binconf” command from the “Hmisc” package in R with default parameters. Mann–Whitney U tests were performed to compare the splicing strength scores of splice sites with and without G4s within 100 nt on each side, separately for the upstream and the downstream splice site.
We used in-house scripts to process a BED file containing all annotated alternative splicing events, which was obtained from VastDB’s UCSC Genome Browser Track Hub53. We extracted the splice site coordinates from exon skipping (HsaEX), alternative acceptor (HsaALTA), alternative donor (HsaALTD), intron retention (HsaINT) and mutually coordinated events (MULTI) to analyse each category separately for G4 enrichment.
Genomic datasets
Non-B DNA motifs
Identification of each non-B DNA motif was performed using the genome-wide maps in humans and mice provided by102 and processed as described in28. We focused on seven non-B DNA motifs; inverted repeats, mirror repeats, H-DNA which forms at a subset of mirror repeats with high AG content, G4s, Z-DNA which forms at non-AT alternating purine pyrimidine stretches, short tandem repeats and direct repeats (Fig. 1A and Supplementary Fig. 3).
Regular expressions were employed to identify consecutive G-runs, interspersed with loops of up to 7 bp, across the human genome. In total, one to six consecutive G-runs were searched (Fig. 2B). For each species, we generated the genome-wide DNA G4 maps using a regular expression of the consensus G4 motif ($G_{\geq 3}N_{1-7}G_{\geq 3}N_{1-7}G_{\geq 3}N_{1-7}G_{\geq 3}$) (Fig. 7A, B). The orientation of G4s and G-runs was determined with respect to the template and non-template strands to calculate strand asymmetries at genic regions, as previously described for polyN motifs (N being Gs, Cs, Ts and As)103,104 (Fig. 2A and Supplementary Figs. 2 and 4e–h).
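As an illustration of the consensus-motif search, the following minimal Python sketch scans a sequence for the G4 motif on the given strand (G-rich) and on the opposite strand (C-rich). It is not the in-house code used in this study; the function name `find_g4_motifs` and the "+"/"-" strand labels are hypothetical, matches are non-overlapping, and mapping "+"/"-" to template or non-template would additionally require the gene orientation.

```python
import re

# Consensus motif from the text: four runs of >=3 G separated by loops of 1-7 nt.
# The C-rich pattern detects the same motif on the opposite strand.
G_PATTERN = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}", re.IGNORECASE)
C_PATTERN = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}", re.IGNORECASE)


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples, 0-based and end-exclusive.

    "+" means the G-rich motif lies on the strand given in `seq`,
    "-" means it lies on the reverse-complement strand (hypothetical labels).
    """
    hits = [(m.start(), m.end(), "+") for m in G_PATTERN.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C_PATTERN.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # SLC6A17 oligonucleotide sequence listed in the Methods
    print(find_g4_motifs("GGGAGTGGGCAGGGGTGGGGG"))
```

Because the quantifiers are greedy and `finditer` reports non-overlapping matches, long G-rich tracts are returned as a single hit; a different convention (for example, overlapping matches) would change the counts.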
Permuted windows of 100 nt on each side of each splice junction were generated using ushuffle105 correcting for dinucleotide content. The fold enrichment for G4s was calculated as the ratio of the number of motifs found in the real sequences and the median of 1000 permutations of the set of all real sequences. The corrected enrichment of G4s at 3′ss and 5′ss was calculated as the ratio of the real enrichment of G4s over the background enrichment of G4s at shuffled splice site windows.
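The fold-enrichment calculation can be sketched as follows. This is only an approximation of the procedure under a stated assumption: the helper `shuffle_seq` performs a plain mononucleotide shuffle, whereas the analysis described above used uShuffle to preserve dinucleotide content. All function names are hypothetical.

```python
import random
import re
from statistics import median

G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def count_g4(seqs):
    """Total number of non-overlapping consensus G4 motifs in a set of windows."""
    return sum(1 for s in seqs for _ in G4.finditer(s))


def shuffle_seq(seq, rng):
    # ASSUMPTION: mononucleotide shuffle as a stand-in; the study used uShuffle
    # with dinucleotide-content correction, which this does not reproduce.
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def permuted_counts(seqs, n_perm=1000, seed=0):
    """Motif counts for n_perm shuffled versions of the whole set of windows."""
    rng = random.Random(seed)
    return [count_g4([shuffle_seq(s, rng) for s in seqs]) for _ in range(n_perm)]


def fold_enrichment(real_count, perm_counts):
    """Ratio of the real motif count to the median permuted count."""
    background = median(perm_counts)
    if background == 0:
        return float("inf") if real_count > 0 else float("nan")
    return real_count / background
```

For example, `fold_enrichment(count_g4(windows), permuted_counts(windows))` would give the enrichment for a list of splice-site windows; the zero-background handling is a choice made here for illustration only.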
To investigate the relationship between non-B DNA motifs or G4-seq peaks and splice sites, we generated local windows around the splice sites and measured the distribution of each non-B DNA motif (Fig. 1A, D and Supplementary Fig. 3) or of G4-seq peak base pairs across the window (Fig. 1B and Supplementary Fig. 1a, c). The enrichment was calculated as the number of occurrences at a position over the median number of occurrences across the window. Regardless of the window size shown in the figures, the enrichment was calculated over a window of 1 kb. The same approach was used to calculate the enrichment of G4s at splice sites across different species (Fig. 7B).
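The positional enrichment described here (occurrences at a position divided by the median across the window) can be written as a short sketch. The interval convention (0-based, end-exclusive, relative to the window start) and the function names are assumptions made for illustration.

```python
from statistics import median


def position_counts(hits, window=1000):
    """Per-position coverage of motif intervals inside a window.

    hits: iterable of (start, end) intervals, 0-based and end-exclusive,
    expressed relative to the start of the window (assumed convention).
    """
    counts = [0] * window
    for start, end in hits:
        for pos in range(max(0, start), min(window, end)):
            counts[pos] += 1
    return counts


def positional_enrichment(counts):
    """Occurrences at each position over the median occurrences in the window."""
    background = median(counts)
    if background == 0:
        raise ValueError("median coverage is zero; enrichment is undefined")
    return [c / background for c in counts]
```

In practice the counts would be pooled over all splice sites before the ratio is taken, so that the median across the 1 kb window is well above zero.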
The density of G4 consensus motifs or G4-seq derived peaks at local windows was calculated as the number of occurrences of the motif or the peak over the total number of base pairs examined (Fig. 7A, C).
G4-seq data
G4-seq BedGraph data were obtained from GEO accession code GSE6387450 for the human genome and analysed with bedtools closest command to identify the closest DNA G4 to splice sites and calculate the distance. The analysis was performed separately for Na+-K+ and Na+-PDS conditions and it was compared to the distribution obtained from the G4 consensus motif. G4-seq BedGraph data for six species, human, mouse, D. melanogaster, C. elegans, A. thaliana and yeast, were obtained from GEO accession code GSE11058251 and analysed using the same genome annotations as those used for the generation of each G4-seq dataset.
Coordinates for internal exons flanked by canonical splice sites (GT-AG introns) were extracted for each species using the Ensembl annotation versions described in51 using custom Python scripts.
Relationship between G4s and exon/intron length
Introns and exons were grouped based on the presence or absence of G4s within 100 nt on each side of the 5′ss and 3′ss and further subdivided into those containing a G4 on the template or on the non-template strand, separately for the 3′ss and the 5′ss. For each of the eight groups, we calculated the median length of the intron or exon in a group and performed Mann–Whitney U tests to calculate the significance of the association between the length of exons/introns and G4 presence (Supplementary Fig. 5a). The R function stat_density was used to plot the length distribution of introns with and without G4s as modelled by a kernel density estimate (Fig. 3A). Abundance enrichment of intron length at the 3′ss/5′ss in relation to the presence of G4s was plotted in R using the function geom_smooth with an eighth-degree polynomial model (Fig. 3B). Correction for GC content in introns of different lengths was performed by grouping introns into small introns (<500 nt) and large introns (>500 nt). We then calculated the GC content for both groups and, for each short intron, selected a long intron with a close GC content value, such that the GC distributions of the short and long intron groups were nearly identical (Supplementary Fig. 5b).
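One way to implement the GC-content matching between short and long introns is a greedy nearest-neighbour pairing without replacement, sketched below. The exact matching rule used in the study is not specified, so the greedy strategy, the input format of `(name, gc)` tuples and the function names are all assumptions.

```python
import bisect


def gc_content(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def match_by_gc(short_introns, long_introns):
    """Pair each short intron with the unused long intron of closest GC content.

    Both arguments are lists of (name, gc) tuples. Returns {short_name: long_name}.
    ASSUMPTION: greedy matching in input order, each long intron used at most once.
    """
    pool = sorted((gc, name) for name, gc in long_introns)
    pairs = {}
    for name, gc in short_introns:
        if not pool:
            break
        i = bisect.bisect_left(pool, (gc, ""))
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        best = min(candidates, key=lambda j: abs(pool[j][0] - gc))
        pairs[name] = pool.pop(best)[1]
    return pairs
```

A greedy pairing is order-dependent; an optimal assignment would give a slightly tighter match at a higher computational cost.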
G4s and relationship to exon number
For the longest transcript of each gene with nine or more exons, we separated exons into nine groups: the first four exons, the last four exons and the remaining middle exons. To compare the frequency of G4s in splice junctions across the gene body, we calculated the distribution of G4s in each exon group relative to the 5′ss/3′ss (Supplementary Fig. 5d). We also calculated the distribution of G4s in each exon group relative to the 5′ss/3′ss separately for the template and non-template strands as described previously104 (Fig. 3E).
Relationship between G4s, splicing strength score and intron length
We calculated the splicing strength score and intron length for the upstream and downstream intron of each exon. We separated introns and splicing strength scores into deciles and calculated the G4 density at each decile, from which we produced two heatmaps displaying the density of G4s as a function of splicing strength score and intron length for the upstream and downstream introns (Fig. 3C).
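The decile-by-decile grid underlying such a heatmap can be sketched as below. Here the G4 density in each cell is taken to be the fraction of introns in that cell with a G4 nearby, which is an assumption, as are the rank-based decile assignment and the function names.

```python
def decile_index(values):
    """Assign each value a decile (0-9) by rank; ties are broken by input order."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank * 10 // len(values)
    return out


def g4_density_grid(strength, length, has_g4):
    """10 x 10 grid of G4 density indexed by [strength decile][length decile].

    ASSUMPTION: density is the fraction of introns in a cell flagged in has_g4.
    Empty cells are reported as None.
    """
    s_bin, l_bin = decile_index(strength), decile_index(length)
    total = [[0] * 10 for _ in range(10)]
    with_g4 = [[0] * 10 for _ in range(10)]
    for s, l, g in zip(s_bin, l_bin, has_g4):
        total[s][l] += 1
        with_g4[s][l] += 1 if g else 0
    return [
        [with_g4[s][l] / total[s][l] if total[s][l] else None for l in range(10)]
        for s in range(10)
    ]
```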
Comparative analysis of RNA-seq experiment
Differential exon inclusion following depolarisation
We analysed available data (BioProject Accession: PRJEB19451, ENA link: ERP021488) for mouse and human ESC-derived cortical neurons and for mouse primary cortical neurons from wild-type and Tc1 mice, which were either stimulated with KCl or left untreated, followed by RNA-seq at 4 h post-treatment64. These corresponded to two experiments in human cells and 12 in mouse cells. We used Whippet66 to quantify splicing nodes and assess their alternative inclusion between KCl-treated samples and controls (Fig. 5A, B and Supplementary Figs. 9 and 10). We used an absolute value of Delta PSI greater than 0.1 and a probability greater than 0.9 to define a splicing node as differentially included between treatments and controls.
We calculated the distance from each splicing node to the midpoint of G4 motifs and G4-seq peaks to determine their association with G4s. Splicing nodes whose splice sites were within 100 bp of a G4 motif or 45 bp of a G4-seq peak were classified as G4-associated splicing nodes. Next, we assessed the influence of G4s on splicing changes following KCl depolarisation of human and mouse neurons by calculating the odds ratio for each splicing node type. To determine the statistical significance of the effect, we performed a chi-squared test with Yates’ correction and adjusted for multiple testing with the Bonferroni correction (Fig. 5A and Supplementary Fig. 10). The distribution of G4 motifs and G4-seq peaks was profiled around differentially included and non-differentially included core exon splicing nodes (CE) (Supplementary Figs. 8 and 11). The confidence intervals were calculated using the “binconf” command from the “Hmisc” package in R with default parameters. Sashimi plots were generated using the “ggsashimi” package106. Inclusion and exclusion path ratios were calculated using the total number of spliced reads supporting each splice junction, where inclusion paths were calculated using the average read count for the splice junctions flanking each side of the exon.
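The thresholds and statistics described in this subsection can be illustrated with a small self-contained sketch. The 2 × 2 table layout (a: G4-associated and differentially included, b: G4-associated and unchanged, c: not G4-associated and differentially included, d: not G4-associated and unchanged) and the function names are assumptions; the p value uses the closed form for a chi-squared distribution with one degree of freedom.

```python
import math


def is_differential(delta_psi, probability):
    """Differential inclusion call: |Delta PSI| > 0.1 and probability > 0.9."""
    return abs(delta_psi) > 0.1 and probability > 0.9


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2 x 2 table; raises ZeroDivisionError if b or c is 0."""
    return (a * d) / (b * c)


def chi2_yates(a, b, c, d):
    """Chi-squared statistic with Yates' continuity correction and its p value."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Survival function of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_values):
    """Bonferroni-adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For instance, `chi2_yates(20, 80, 10, 90)` corresponds to 20% of G4-associated nodes and 10% of other nodes being differentially included, with `odds_ratio(20, 80, 10, 90)` equal to 2.25.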
The differential gene expression analysis was performed using a Snakemake pipeline107 that was adapted from a publicly available repository108. Reads were mapped with HISAT2 (v2.1.0)109 and later quantified using featureCounts110. Finally, differentially expressed genes were identified using DESeq2 (v1.18.1)111, based on the adjusted p values.
Three putative non-template G4s, found in proximity to splice junctions of exons that were differentially included following depolarisation in human ESC-derived neurons and in at least one condition in mice, were selected for validation experiments. These were: (1) a G4 downstream of exon 7 in SLC6A17 (chr1:110734886-110734906), (2) a G4 downstream of exon 38 in UNC13A (chr19:17731307-17731346) and (3) a G4 upstream of exon 16 in NAV2 (chr11:20072958-20072979), for which RNA oligonucleotides at the G4 locations were ordered.
The RNA oligonucleotides used were:
- SLC6A17 oligonucleotide: GGGAGTGGGCAGGGGTGGGGG
- UNC13A oligonucleotide: GGGGGGTGGTGGGTGGGGGGTTGGTGGGTAGGGCAGAGGG
- NAV2 oligonucleotide: GGGGGTTTGGGCTGGGCTGGGG
Integration of eCLIP and LoF followed by RNA-seq experiments
eCLIP data analysis
eCLIP data for K562 and HepG2 cell lines were derived from the ENCODE consortium81, including 1346 experiments for 150 RBPs. Among them, 722 experiments were performed in the K562 cell line and the remaining 624 experiments were performed in the HepG2 cell line, for 120 and 103 RBPs, respectively. To investigate the relationship between RBP binding and G4 sites, we extracted splice sites flanked by a G4 within 100-bp intronic windows, with either template or non-template orientation. Splice site regions were then separated into 10 bp bins. For each bin, we calculated the factor binding enrichment across the different groups of splice sites (flanked by template or non-template G4s, or not flanked by G4s), and we then calculated the differential enrichment values between splice sites flanked and not flanked by G4s. We assessed the statistical significance of these differences using chi-squared testing with Bonferroni correction. For clustering based on differential enrichment, non-significant differences were set to 0. Differential enrichment values of template and non-template G4 bins were clustered by unsupervised hierarchical clustering with Ward’s method, using the hclust function in R, to classify RBP enrichment profiles into ten clusters.
Analysis of LoF experiments followed by RNA-seq
We analysed 506 LoF experiments followed by RNA-seq that targeted a total of 269 RBPs in HepG2 and K562 cell lines, all of which were derived from the ENCODE consortium81. To quantify alternative splicing changes after knockout or knockdown of target RBPs, we used Whippet66, which led us to obtain sets of differentially included exons associated with each LoF condition. Similarly to the KCl-induced depolarisation experiment analyses, we used the absolute value of Delta PSI >0.1 and probability >0.9 to define a splicing node as differentially included between treatments and controls. For each LoF experiment, we calculated the association between G4s and differentially included exons estimated as log odds ratio and assessed the statistical significance using chi-squared test with Bonferroni correction. To get a list of high-confidence factors that have a role in G4-mediated alternative splicing, we considered factors that were found to have significant eCLIP and RNA-seq analyses performed for the same cell line (K562 or HepG2). Finally, to check for experimental consistency, only factors with at least two eCLIP replicates that clustered in the same group were considered to be high confidence and they were labelled in Fig. 6B.
Evaluation of sQTL data and G4s
sQTL data
sQTL data were derived from the GTEx consortium79 from the link https://storage.googleapis.com/gtex_analysis_v8/multi_tissue_qtl_data/GTEx_Analysis_v8.metasoft.txt.gz. Population variants were derived from dbSNP153. The analysis was done within 25 bp windows from −500 bp to +500 bp relative to the splice site. The enrichment for sQTLs and SNPs was calculated as the density of their occurrences within G4s in each bin relative to the density across all bins. The adjusted enrichment for sQTLs overlapping G4s was calculated as the sQTL enrichment divided by the SNP enrichment. The confidence intervals were calculated in R with geom_smooth using method = “loess”.
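The binning and the adjusted enrichment can be sketched as follows, assuming equal-width bins so that density is proportional to the count per bin. Variant positions are given relative to the splice site, and the function names are hypothetical; restricting the variants to those overlapping G4s is assumed to have been done upstream.

```python
def bin_counts(positions, start=-500, end=500, width=25):
    """Count variant positions (relative to the splice site) in fixed-width bins."""
    n_bins = (end - start) // width
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < end:
            counts[(pos - start) // width] += 1
    return counts


def enrichment(counts):
    """Per-bin count relative to the mean count across all bins."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("no occurrences in any bin")
    return [c / mean for c in counts]


def adjusted_enrichment(sqtl_counts, snp_counts):
    """sQTL enrichment divided by SNP enrichment; None where the SNP term is 0."""
    sqtl, snp = enrichment(sqtl_counts), enrichment(snp_counts)
    return [s / n if n > 0 else None for s, n in zip(sqtl, snp)]
```

Dividing by the SNP enrichment controls for the background variant density in each bin, so values above 1 indicate more sQTLs than expected from the local SNP density alone.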
Experimental validation of RNA G4 candidates
NMM ligand enhanced fluorescence
This assay was performed similarly to our previous work71. Briefly, sample solutions containing 1 µM RNA were prepared in 150 mM LiCl/KCl, 10 mM LiCac buffer (pH 7.0) and 1 µM NMM ligand. Fluorescence spectroscopy was performed using a HORIBA FluoroMax-4 and a 1-cm path length quartz cuvette (Wuxi Jinghe Optical Instrument Co.) with a sample volume of 100 µl. Before the measurement, the samples (ligand not added) were denatured at 95 °C for 3 min and allowed to cool down at room temperature for 15 min. The samples were excited at 394 nm and the emission spectra were acquired from 550 to 750 nm. Data were collected every 2 nm at 25 °C with 5 nm entrance and exit slit widths. Raw ligand-enhanced fluorescence spectra were first blanked with the spectra of corresponding samples prepared under identical chemical conditions but without the ligand. All calculations mentioned were performed in Microsoft Excel.
ThT ligand enhanced fluorescence
This assay was performed similarly to our previous work71. Briefly, sample solutions containing 1 µM RNA were prepared in 150 mM LiCl/KCl, 10 mM LiCac buffer (pH 7.0) and 1 µM ThT ligand. Fluorescence spectroscopy was performed using a HORIBA FluoroMax-4 and a 1-cm path length quartz cuvette (Wuxi Jinghe Optical Instrument Co.) with a sample volume of 100 µl. Before the measurement, the samples (ligand not added) were denatured at 95 °C for 3 min and allowed to cool down at room temperature for 15 min. The samples were excited at 425 nm and the emission spectra were acquired from 440 to 700 nm. Data were collected every 2 nm at 25 °C with 5 nm entrance and exit slit widths. Raw ligand-enhanced fluorescence spectra were first blanked with the spectra of corresponding samples prepared under identical chemical conditions but without the ligand. All calculations mentioned were performed in Microsoft Excel.
Circular dichroism (CD) spectroscopy
This assay was performed similarly to our previous work72. Briefly, the CD spectroscopy was performed using a Jasco J-1500 CD spectrophotometer and a 1-cm path length quartz cuvette (Hellma Analytics) was employed in a volume of 2 ml. Samples with 5 μM RNA (final concentration) were prepared in 10 mM LiCac (pH 7.0) and 150 mM KCl/LiCl. Each of the RNA samples was then thoroughly mixed and denatured by heating at 95 °C for 5 min and cooled to room temperature for 15 min for renaturation. The RNA samples were scanned from 220–310 nm at 25 °C and spectra were acquired every 1 nm. All spectra reported were an average of 2 scans with a response time of 0.5 s/nm. They were then normalised to molar residue ellipticity and smoothed over 5 nm. All data were analysed with Spectra Manager™ Suite (Jasco Software).
Thermal melting monitored by UV spectroscopy
This assay was performed similarly to our previous work72. Briefly, samples were prepared to a concentration of 10 mM LiCac buffer, 150 mM salt (KCl/LiCl) and 5 µM RNA, with a total volume of 2 ml. Each of the samples was mixed thoroughly and heated at 95 °C for 5 min so as to denature the RNA. It was then cooled for 15 min at room temperature for renaturation. All UV-melting experiments were performed on an Agilent Cary 100 UV-Vis Spectrophotometer, using 1-cm path length quartz cuvette. Before the experiment started, the sample block was first flushed with dry N2 gas and cooled down to 5 °C for 5 min. After the sample solutions were loaded to the cuvettes, they were sealed with 3 layers of Teflon® tape to prevent vaporisation at high temperatures. The samples were scanned from 5 to 95 °C with a temperature incremental rate of 0.5 °C/min. The temperature was held at 95 °C for 5-min before a reversed scan was performed, scanning from 95 to 5 °C with a rate of 0.5 °C/min. The unfolding and folding transitions in both scans were monitored at 295 nm.
The signal of the blank solutions, which contained only the same concentrations of LiCac buffer (pH 7.0) and the corresponding salt, was subtracted from the raw data. The data were then smoothed over 11 nm and the first derivative was plotted in Microsoft Excel. The final melting temperature was obtained by averaging the melting temperatures in the forward and reversed scans.
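As a sketch of the first-derivative approach to the melting temperature, the Python snippet below smooths a blank-subtracted melting curve, takes central-difference derivatives and reports the temperature of the steepest transition, averaged over the forward and reverse scans. The original analysis was done in Microsoft Excel; the moving-average window (given here in data points) and the function names are assumptions.

```python
def moving_average(values, window=5):
    """Centred moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def melting_temperature(temps, absorbance, window=5):
    """Temperature at which |dA/dT| of the smoothed curve is largest."""
    smooth = moving_average(absorbance, window)
    best_t, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        slope = (smooth[i + 1] - smooth[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > abs(best_slope):
            best_t, best_slope = temps[i], slope
    return best_t


def final_tm(forward, reverse, window=5):
    """Average of the melting temperatures from forward and reverse scans.

    forward and reverse are (temps, absorbance) tuples of blank-subtracted data.
    """
    return (melting_temperature(*forward, window) + melting_temperature(*reverse, window)) / 2
```

Using the absolute slope makes the function indifferent to whether the transition at 295 nm appears as a decrease or an increase in absorbance.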
Intrinsic fluorescence spectroscopy
This assay was performed similarly to our previous work72. Briefly, intrinsic fluorescence spectroscopy was performed using a HORIBA FluoroMax-4 and a 1-cm path length quartz cuvette (Hellma Analytics) with a volume of 2 ml. Samples with 5 μM RNA were prepared in 10 mM LiCac (pH 7.0) and 150 mM KCl/LiCl. The samples were then denatured at 95 °C for 5 min and cooled to room temperature for 15 min for renaturation. For the measurement of intrinsic fluorescence of G-quadruplexes, the samples were excited at 260 nm and the emission spectra were acquired from 300–500 nm. Spectra were acquired every 2 nm at 25 °C. The bandwidth of the entrance and exit slits was 5 nm. All data were smoothed over 5 nm. Data were analysed using Microsoft Excel.
Minigene experiment
Minigenes were designed using the pI12 splicing reporter sequence112 and the sequences studied were inserted between the XhoI and XbaI restriction sites. For NAV2, the sequence (hg38 chr11:20,051,051–20,051,629) containing the alternative exon studied, 238 bp upstream of the exon and 296 bp downstream of the exon was selected. For SLC6A17, the sequence (chr1:110,191,667–110,192,950) was selected, which includes 305 bp upstream of exon 7, exon 7, intron 7, exon 8, and 252 bp downstream of exon 8. The minigene sequences were synthesised and cloned into the pTwist-CMV vector from Twist Bioscience using the NotI and NheI sites. For both vectors, we generated a mutant version of the RNA G4: the NAV2 RNA G4 gggggtttgggctgggctgggg was mutated to gagagtttgagctgagctgagg, and the SLC6A17 RNA G4 gggcaggggtggggg was mutated to gagcagaggtgagag. The full sequences incorporated into NAV2 and SLC6A17 minigenes are as follows (with nucleotides in upper case letters corresponding to exons):
>NAV2_minigene
TAATACGACTCACTATAGGGAGACCCAAGCTACGTTGGTACCGAGCTCGGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGAGCTCACTCTCTTCCGCATCGCTGTCTGCGAGGTACCCTACCAGgtgagtatggatccctctaaaagcgggcatgacttctagacaggcctcctggtgacctggggtaaagtatgcgctggtgtcagctcaggctgaggattgggtttcttttgttttccacggatgtggatgtgttgcattgcacgcctagctggataaggcacttcctggtgatgtgcacctctttctccagggcccttcagtcccctccctagctttccctctctctgccttctgtgtgctgctctgaagttcttatttttgttttaactttcctacagTGACCCGCACCTTGATAGGAACACTTTGCCTAAGAAAGGACTCAGgtatctgtgtttcctccttgcatctgtgccatctgttgtggctttggagcttggctgtgtgactccttcatggctggtgggggtttgggctgggctggggtccccgctttgaccaccacagcaggaccttttggatgacggctccccttgcaccctctcgttctcactctccatttgtcagcttatttgcttgagcaggggctgtgctttttcaggcttaatgtggtaaaaccatctcatgaaaaacatccctgggcaagcccaaggagcagtcattactgcttctggggccaatgctcgagggcgtactaactgggccctttcccttttttttcctcagGTCGCGGTTGAGCTGCAGGACAAACTCTTCGCGGTCTCTATGCATCCTCCGAACGCCAAGAGCCTAAGCTTACTAGAGGGCCCTATTCTATAGTGTCACCTAAAT
>SLC6A17_minigene
TAATACGACTCACTATAGGGAGACCCAAGCTACGTTGGTACCGAGCTCGGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGAGCTCACTCTCTTCCGCATCGCTGTCTGCGAGGTACCCTACCAGgtgagtatggatccctctaaaagcgggcatgacttctagagttccttagtatgatggctgcatgggaagccatggtagcagttaaattactatgtaagcactcagcatttatgttgtcttaccaagtcctcacagtcatcctctgaagttgatactcttgtgatctgcgtattggggaaactgaggctcagaggagtagggaaattgcccaagttcacacagcttcttaccactatgggctgctgcctctgaactctgttgaggctgagaaagggggtgtgcacatgaataaaaccatcccctctgtgtctcttttcttcctcctgccttggtcttctgccatagCTGGACAAGATGCTGGACCCCCAGGTGTGGCGGGAGGCAGCTACCCAGGTCTTCTTTGCCTTGGGCCTGGGCTTTGGTGGTGTCATTGCCTTCTCCAGCTACAATAAGCAGGACAACAACTGCCACTTCGATGCCGCCCTGGTGTCCTTCATCAACTTCTTCACGTCAGTGTTGGCCACCCTCGTGGTGTTTGCTGTGCTGGGCTTCAAGGCCAACATCATGAATGAGAAGTGTGTGGTCGAgtaggtggcatctctcctcctgtccctccttctccctgtctaccttacctgggagtgggcaggggtgggggcgcaggtgtgcatggggagagaggtcccctccactcagactgaggaatggagatcagaggagcactctctgtccccagctccgggccacagggacaagctcagagatgcctctgtcagtgacccatgaggttcccacctgggtgcctgggaagagcctccaggatctcacccattgcccacccctgccttcttacctggtcctctcggttttgtgctgcagGAATGCTGAGAAAATCCTAGGGTACCTTAACACCAACGTCCTGAGCCGGGACCTCATCCCACCCCACGTCAACTTCTCCCACCTGACCACAAAGGACTACATGGAGATGTACAATGTCATCATGACCGTGAAGGAGGACCAGTTCTCAGCCCTGGGCCTTGACCCCTGCCTTCTGGAGGACGAGCTGGACAAGgtgcggggacaggctgcccttcccaggacaggcaggaacccagagagcagctgtggccggcgggagcttgggctcaggcctcaggatgctgacaggtagtcattagtttacttggtaagcaaggatctgctgtgtgtccagagggagtgaaagggaagaaaggtattggccaaagtccctgcccagaggtaggcttgagcctagacaagaagtagggcagacacacacctctcagaagtcacagtaagtgtactcgagggcgtactaactgggccctttcccttttttttcctcagGTCGCGGTTGAGCTGCAGGACAAACTCTTCGCGGTCTCTATGCATCCTCCGAACGCCAAGAGCCTAAGCTTACTAGAGGGCCCTATTCTATAGTGTCACCTAAAT
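The design of the mutant minigenes described above can be checked with a short sketch that lists the substitutions between the wild-type and mutant G4 sequences quoted in the text and confirms that no run of three or more guanines remains. The helper names are hypothetical and the snippet is only a consistency check, not part of the cloning workflow.

```python
def substitutions(wild_type, mutant):
    """List (position, wild-type base, mutant base) for equal-length sequences."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    wt, mt = wild_type.upper(), mutant.upper()
    return [(i, a, b) for i, (a, b) in enumerate(zip(wt, mt)) if a != b]


def max_g_run(seq):
    """Length of the longest uninterrupted run of G in a sequence."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "G" else 0
        longest = max(longest, current)
    return longest


# Wild-type and mutant RNA G4 sequences as given in the Methods
NAV2_WT, NAV2_MUT = "gggggtttgggctgggctgggg", "gagagtttgagctgagctgagg"
SLC6A17_WT, SLC6A17_MUT = "gggcaggggtggggg", "gagcagaggtgagag"

if __name__ == "__main__":
    for wt, mut in [(NAV2_WT, NAV2_MUT), (SLC6A17_WT, SLC6A17_MUT)]:
        print(substitutions(wt, mut), max_g_run(mut))
```

Since a G-run of at least three guanines is required by the consensus motif, a maximum run length below three in the mutant indicates that the canonical motif has been disrupted.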
Cell transfection
All plasmids were transfected at 2.5 μg using Fugene HD or Lipofectamine 3000, according to the manufacturer’s recommendations, into the DU145 prostate cancer cell line seeded in a 6-well plate.
Total RNA extraction
Samples were harvested in TRIsure (Bioline, #BIO-38033). Following the aqueous phase collection, total RNA was isolated using RNA Clean & Concentrator-5 kit (Zymo Research, #R1014) according to the manufacturer’s guidelines.
DNAse treatment of RNA samples
Total RNA samples were DNAse-treated using a TURBO DNA-free™ Kit (Ambion #AM1907) for 30 min at 37 °C according to the manufacturer’s instructions.
Reverse transcription
In total, 250 ng of DNAse-treated RNA were reverse transcribed with random hexamers (Invitrogen #N8080127) using SuperScript III™ Reverse Transcriptase (Invitrogen #18080093) in a final volume of 20 μl at 50 °C for 1 h. Control reactions lacking the enzyme were systematically run in parallel as negative controls.
Semi-quantitative PCR
Semi-quantitative PCR reactions were performed on 1 μl of diluted RT samples (1/2) with primers PI12_F: GCTCACTCTCTTCCGCATC and PI12_R: CTTGGCGTTCGGAGGATG using OneTaq® DNA polymerase (New England BioLabs #M0486S) and run on a thermocycler with the following conditions: (1) Initial denaturation 2’ at 94 °C; (2) {26 cycles} D 30” at 94 °C, A 30” at 58 °C, E 30” at 68 °C; (3) Final extension 4’ at 68 °C. PCR products were immediately loaded on a medium-sized 2% agarose gel pre-stained with ethidium bromide at a final concentration of 0.5 μg/ml, and run at 80 V for 1 h. Images were acquired on a BioRad Gel Doc system with exposure optimised for “faint bands”, taking care not to overexpose the signal.
Cloning and Sanger sequencing of PCR products
PCR amplicons were excised from the gel, extracted using a Zymoclean™ Gel Recovery Kit (Zymo Research #D4001), cloned into a TOPO vector (TOPO™ TA Cloning Kit for sequencing #450030) and Sanger-sequenced at Genewiz using a T7 forward primer.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank Christopher W.J. Smith for important discussions and critical feedback on the manuscript as well as Omer Ziv for relevant discussion over the design of minigene experiments. I.G.S., G.E.P. and M.H. are supported by the Wellcome Sanger Institute core grant. M.H. is also supported by startup funding from the Evergrande Center. H.Y.W. and C.K.K. are supported by the Shenzhen Basic Research Project [JCYJ20180507181642811]; Research Grants Council of the Hong Kong SAR, China Projects [CityU 11100421, CityU 11101519, CityU 11100218, N_CityU110/17]; Croucher Foundation Project [9509003]; the State Key Laboratory of Marine Pollution Director Discretionary Fund; City University of Hong Kong projects [7005503, 9667222, 9680261]. Additionally, this work was supported by Cancer Research UK (C13474/A18583, C6946/A14492 to E.A.M.) and the Wellcome Trust (104640/Z/14/Z, 092096/Z/10/Z to E.A.M.).
Author contributions
I.G.S. and G.E.P. conceived the study and carried out the computational analysis, supervised by E.A.M. and M.H. H.Y.W. carried out the G4 validation experiments and data analysis, supervised by C.K.K. The minigene experiment design was conceived by Ro.M. and subsequently performed by Ra.M. and G.F., while supervised by Ro.M., G.E.P. and E.A.M. I.G.S., G.E.P. and M.H. led the writing of the manuscript with input from Ro.M., H.Y.W., C.K.K. and E.A.M.
Peer review
Peer review information
Nature Communications thanks Wei Chen, Qiangfeng Zhang and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The data for this manuscript have been deposited in Zenodo under DOI 10.5281/zenodo.6324564.
Code availability
All the associated code used for the generation of figures and presentation of data throughout the manuscript is deposited on GitHub at the following link: https://github.com/hemberg-lab/Georgakopoulos_Soares_and_Parada_2022.
Competing interests
I.G.S. and M.H. are founders of Neomer Diagnostics. E.A.M. is a founder and director of STORM Therapeutics. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Ilias Georgakopoulos-Soares, Guillermo E. Parada.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41467-022-30071-7.