Abstract
RNA splicing may generate different kinds of splice junctions, such as linear, back-splice and fusion junctions. Only a limited number of programs are available for detection and quantification of splice junctions. Here, we present Assembling Splice Junctions Analysis (ASJA), a software package that identifies and characterizes all splice junctions from high-throughput RNA sequencing (RNA-seq) data. ASJA processes assembled transcripts and chimeric alignments from the STAR aligner and StringTie assembler. ASJA provides the unique position and normalized expression level of each junction. Annotations and integrative analysis of the junctions enable additional filtering. It is also appropriate for the identification of novel junctions. ASJA is available at https://github.com/HuangLab-Fudan/ASJA.
Keywords: RNA splicing, Splice junctions, RNA-seq, Circular RNA
1. Introduction
RNA splicing is a form of RNA processing in which introns are removed from a newly synthesized precursor messenger RNA, producing a mature mRNA containing splice junctions. Human multiexon genes undergo precise alternative splicing, which gives rise to genetic diversity. Aberrations of RNA splicing are associated with a wide range of diseases, such as cancers, neurodegenerative diseases and muscular dystrophies [1,2]. In contrast to canonical, normally spliced RNAs, the products of noncanonical splicing processes, such as circular RNAs (circRNAs) and fusion genes, can be identified from chimeric RNAs. Fusion genes play a vital role in tumor initiation and progression [3,4,5]. Chromosomal translocations are a common cause of oncogenic fusion genes. One of the best-known examples is BCR-ABL1, which is an important factor in adult acute lymphoblastic leukemia and can be used as a biomarker for chronic myeloid leukemia [6,7,8]. Anaplastic lymphoma kinase (ALK) gene fusions are present in approximately 5% of non–small-cell lung cancers (NSCLCs) [9], which indicates a new therapeutic target in this molecularly defined subset of NSCLC. In addition, TMPRSS2–ETS fusions can be detected in approximately 50% of localized and approximately 40% of metastatic prostate cancers [10,11]. CircRNAs with back-splice junctions are characterized by single-stranded, covalently closed loop structures. High-throughput sequencing and computational approaches have demonstrated that circRNAs are widespread in the transcriptome. The functions of most circRNAs remain largely unexplored, but known functions include binding to microRNAs or proteins [12,13,14], regulating their parent genes [15,16], and producing protein products [17,18]. In addition, emerging studies have begun to reveal the critical role that some circRNAs play in the nervous system, cancer development and the innate immune response [19].
Current tools for RNA-seq analysis perform well for predicting splice sites [20], but software for splice junction detection and quantification is limited. Nellore et al. found a large number of linear exon-exon junctions by aligning approximately twenty thousand RNA-seq samples from the Sequence Read Archive (SRA) [21]. However, they extracted the splice junctions directly from mapped reads, so many of these junctions might be false positives caused by incorrect read placement, sample-specific variation or regions where the genome is incorrectly assembled. For circRNA detection, a handful of tools and databases are available [22,23]. However, current computational tools for splice junction detection focus on only one or two types of junctions [12,24,25]. To fully evaluate the whole transcriptome based on RNA-seq data, it is necessary to develop a comprehensive tool to identify and characterize different kinds of splice junctions.
In this study, we present a software package called Assembling Splice Junctions Analysis (ASJA) that detects and annotates all splice junctions from RNA-seq data. It processes assembled transcripts and chimeric alignments from the STAR aligner [26] and StringTie assembler [27]. ASJA provides the unique position and normalized expression level of each junction for comparison. We verified the workflow on a published RNA-seq dataset [28] and showed that ASJA could efficiently detect splice junctions and was also appropriate for the identification of novel junctions.
2. Methods
2.1. The ASJA Algorithm
ASJA identifies three types of splice junctions using the following three steps: 1) alignment and generation of assembled transcripts, 2) junction extraction, 3) junction annotation and integration.
2.2. Alignment and Generation of Assembled Transcripts
2.2.1. STAR Aligner Settings
FastQC software was used for RNA-seq raw read quality control. The STAR aligner (version 2.5.2a) was used to map all filtered reads with a 2-pass mapping process [29]. In the first pass, STAR was provided with genome indexes generated using the default settings. At the end of the first mapping pass, a new index was generated by combining splice junctions. Then, a second pass was performed using the new index generated by the first pass. The chimSegmentMin option was switched on for chimeric alignments.
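The two-pass procedure described above can be sketched as three STAR invocations. The following Python sketch only assembles the command lines and does not run them. The file paths, the thread count, paired-end input and the value 20 for `--chimSegmentMin` are assumptions made for illustration (the text states only that the option was switched on), and the function name is hypothetical.

```python
from pathlib import Path


def star_two_pass_commands(genome_dir, genome_fasta, fastq_1, fastq_2,
                           out_dir, threads=4, chim_segment_min=20):
    """Return the three STAR command lines of a manual two-pass run.

    All paths, `threads` and `chim_segment_min=20` are illustrative
    assumptions, not settings reported for ASJA.
    """
    out = Path(out_dir)
    reads = [str(fastq_1), str(fastq_2)]
    # First pass: map against the default genome index.
    pass1 = ["STAR", "--runThreadN", str(threads),
             "--genomeDir", str(genome_dir),
             "--readFilesIn", *reads,
             "--outFileNamePrefix", str(out / "pass1_")]
    # Rebuild the index, adding the junctions found in the first pass.
    reindex = ["STAR", "--runMode", "genomeGenerate",
               "--runThreadN", str(threads),
               "--genomeDir", str(out / "index_pass2"),
               "--genomeFastaFiles", str(genome_fasta),
               "--sjdbFileChrStartEnd", str(out / "pass1_SJ.out.tab")]
    # Second pass: map against the new index with chimeric detection on.
    pass2 = ["STAR", "--runThreadN", str(threads),
             "--genomeDir", str(out / "index_pass2"),
             "--readFilesIn", *reads,
             "--outSAMtype", "BAM", "SortedByCoordinate",
             "--chimSegmentMin", str(chim_segment_min),
             "--outFileNamePrefix", str(out / "pass2_")]
    return [pass1, reindex, pass2]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order; compressed FASTQ input would additionally need STAR's `--readFilesCommand` option.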
2.2.2. Generation of Assembled Transcripts
The mapped reads obtained from the STAR aligner were used as input for StringTie to generate assembled transcripts for linear junction recognition. Annotations from known transcripts were used as a transcript model reference to guide the assembly process with the “-G” option. The StringTie options were as follows: stringtie <input mapped_bam> -f 0.1 -o <out file> -p 4 -G <GTF>.
2.3. Junction Extraction
2.3.1. Extraction of Linear Junctions
After mapping and assembly, a custom Perl script was employed to extract linear splice junctions from the assembled transcripts and to calculate their expression levels using the formula CPT(AB) = min(∑cov(A), ∑cov(B)) × 10,000,000 / TC, where cov(A) and cov(B) represent the coverage of the exons flanking each side of the junction. We combined the coverages of all flanking exons for each junction and used the minimal expression level (5′ or 3′ flanking exon level). Junction expression was further normalized by the total annotated junction coverage (TC).
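As a worked illustration of the CPT formula above, the following Python sketch computes the value for a single junction. The function and argument names are hypothetical and are not taken from the ASJA scripts; the coverages are assumed to be the per-exon coverage values reported by the assembler.

```python
def junction_cpt(cov_5p_exons, cov_3p_exons, total_coverage,
                 scale=10_000_000):
    """CPT(AB) = min(sum cov(A), sum cov(B)) * 10,000,000 / TC.

    cov_5p_exons / cov_3p_exons: coverages of all exons flanking the
    junction on its 5' and 3' side; total_coverage: TC, the total
    annotated junction coverage of the sample.
    """
    if total_coverage <= 0:
        raise ValueError("total_coverage (TC) must be positive")
    limiting_side = min(sum(cov_5p_exons), sum(cov_3p_exons))
    return limiting_side * scale / total_coverage


# Two 5' flanking exons (12 + 8 = 20) and one 3' flanking exon (15):
# the 3' side is limiting, so CPT = 15 * 10,000,000 / 2,000,000 = 75.
print(junction_cpt([12.0, 8.0], [15.0], 2_000_000))  # 75.0
```

Taking the minimum of the two sides means that a junction is never scored higher than its less-covered flank.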
2.3.2. Extraction of Back-Splice Junctions
A custom Perl script was used to extract back-splice junctions from chimeric alignments. Back-splice junction candidates were required to meet the following criteria: 1. the chimeric reads map to the same chromosome and strand; 2. the distance between the 3′ splice donor site and the 5′ splice acceptor site is <3,000,000 bp; 3. junction type = 1 (GT/AG) or junction type = 2 (CT/AC); 4. the junction is not located on the mitochondrial chromosome or other unannotated chromosomes and does not have unreasonable start and end positions.
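The criteria above can be sketched as a filter over the first seven tab-separated columns of STAR's Chimeric.out.junction file (donor chromosome, position and strand; acceptor chromosome, position and strand; junction type). This is a minimal Python sketch, not the ASJA Perl script. The UCSC-style chromosome names and the orientation test, which stands in for the "unreasonable start and end positions" check, are assumptions.

```python
from typing import NamedTuple

MAX_SPAN = 3_000_000
CANONICAL_TYPES = {1, 2}  # 1 = GT/AG, 2 = CT/AC
# Assumed naming of annotated nuclear chromosomes (UCSC style).
NUCLEAR_CHROMS = {f"chr{c}" for c in [*range(1, 23), "X", "Y"]}


class ChimericJunction(NamedTuple):
    donor_chrom: str
    donor_pos: int
    donor_strand: str
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_strand: str
    junction_type: int


def parse_junction(line):
    """Parse the first seven columns of one Chimeric.out.junction line."""
    f = line.rstrip("\n").split("\t")
    return ChimericJunction(f[0], int(f[1]), f[2],
                            f[3], int(f[4]), f[5], int(f[6]))


def is_back_splice(j):
    # 1. same chromosome and strand
    if j.donor_chrom != j.acceptor_chrom:
        return False
    if j.donor_strand != j.acceptor_strand:
        return False
    # 4. skip mitochondrial and unannotated chromosomes
    if j.donor_chrom not in NUCLEAR_CHROMS:
        return False
    # 3. canonical splice motif only
    if j.junction_type not in CANONICAL_TYPES:
        return False
    # 2. donor and acceptor less than 3,000,000 bp apart
    if abs(j.donor_pos - j.acceptor_pos) >= MAX_SPAN:
        return False
    # Assumed orientation check: in a back-splice the donor lies
    # downstream of the acceptor in transcript orientation.
    if j.donor_strand == "+":
        return j.donor_pos > j.acceptor_pos
    return j.donor_pos < j.acceptor_pos
```

A call such as `is_back_splice(parse_junction(line))` can then be applied to each line of the file before counting supporting reads per junction.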
2.3.3. Extraction of Fusion Junctions
Fusion junctions were also extracted from chimeric alignments. To reduce the false positive rate, the following steps were taken: 1. junction reads mapping to the mitochondrial chromosome, other unannotated chromosomes or unmapped contigs were filtered out; 2. chimeric reads with ‘junction type = 0’ (motifs other than GT/AG and CT/AC) were filtered out; 3. back-spliced reads (circRNAs) were excluded; 4. putative fusion junctions were extracted for further filtering when the supporting read number was >1. ASJA calculates the spanning reads according to the parameter SpanningReads in STARChip [24]. Supporting and spanning reads can be helpful for checking the validity of output fusions.
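The four filtering steps above can be sketched as a single pass over per-read chimeric junction records. This Python sketch is illustrative only: the record layout, the function name, the UCSC-style chromosome names and the `back_splice_keys` argument (junctions already classified as back-splice) are assumptions. It follows the text literally in removing only junction type 0 and does not compute spanning reads.

```python
from collections import Counter

# Assumed naming of annotated nuclear chromosomes (UCSC style).
NUCLEAR_CHROMS = {f"chr{c}" for c in [*range(1, 23), "X", "Y"]}


def fusion_candidates(junction_reads, back_splice_keys=frozenset(),
                      min_reads=2):
    """Count supporting reads per putative fusion junction.

    junction_reads: iterable with one tuple per chimeric read:
    (donor_chrom, donor_pos, donor_strand,
     acceptor_chrom, acceptor_pos, acceptor_strand, junction_type).
    Returns {junction_key: supporting_read_count}.
    """
    counts = Counter()
    for record in junction_reads:
        *key, junction_type = record
        key = tuple(key)
        # Step 1: drop mitochondrial / unannotated chromosomes and contigs.
        if key[0] not in NUCLEAR_CHROMS or key[3] not in NUCLEAR_CHROMS:
            continue
        # Step 2: drop non-canonical motifs (junction type 0).
        if junction_type == 0:
            continue
        # Step 3: drop junctions already called as back-splice.
        if key in back_splice_keys:
            continue
        counts[key] += 1
    # Step 4: keep junctions with more than one supporting read.
    return {k: n for k, n in counts.items() if n >= min_reads}
```

Keeping the junction coordinates as the dictionary key makes it straightforward to intersect the candidates with exon boundaries in a later annotation step.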
2.4. Junction Annotation and Integration
2.4.1. Preparation of the Junction Annotation File
ASJA generates a primary annotation file for linear junctions as well as a filtered annotation file in which read-through and paralog genes have been removed from the reference gene annotation (GTF). To annotate back-splice and fusion junctions, ASJA provides an annotation of exons in BED (Browser Extensible Data) format for straightforward assessment of annotation status.
2.4.2. Calculation of the Splice Ratio
For linear junctions, ASJA calculates the ratio of each splice junction by normalizing its CPT to the maximum CPT of the splice junctions within the same gene: Weight ratio_i = CPT_i / CPT_m, where CPT_m is the maximum CPT of the gene from which junction i originates. ASJA calculates the back-splice ratio of circRNA according to the following formula as previously reported [28]:
Where 5′ back_splicedread is the number of 5′ reads and linearread is the number of reads mapped across the 5′ splice site that are consistent with a linear junction.
ASJA calculates the ratio of fusion to linear junctions with the following formula:
Where linearread is the number of linear junction reads with the same splice site as the donor.
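For the linear-junction weight ratio defined at the start of this section (CPT of a junction divided by the maximum CPT among junctions of the same gene), a minimal Python sketch is given here. It covers only that ratio; the input layout and names are hypothetical.

```python
from collections import defaultdict


def weight_ratios(junction_records):
    """Weight ratio_i = CPT_i / CPT_m for every linear junction.

    junction_records: iterable of (gene, junction_id, cpt).
    Returns {junction_id: weight_ratio}.
    """
    records = list(junction_records)
    # CPT_m: the maximum CPT observed among the junctions of each gene.
    max_cpt = defaultdict(float)
    for gene, _, cpt in records:
        max_cpt[gene] = max(max_cpt[gene], cpt)
    return {
        junction_id: (cpt / max_cpt[gene] if max_cpt[gene] > 0 else 0.0)
        for gene, junction_id, cpt in records
    }


ratios = weight_ratios([
    ("GENE1", "j1", 50.0),
    ("GENE1", "j2", 100.0),
    ("GENE2", "j3", 4.0),
])
print(ratios)  # {'j1': 0.5, 'j2': 1.0, 'j3': 1.0}
```

By construction the most highly expressed junction of each gene has a ratio of 1, so the ratio expresses how heavily a junction is used relative to the dominant junction of its gene.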
2.4.3. Integrative Analysis of the Three Types of Junctions
Based on gene annotation status and splice junction position, ASJA integrates the three types of junctions. The output includes the fields linear_junction_ID, gene_name, circRNA_ID and fusion_ID.
2.4.4. Junction Filtering
To obtain high-confidence junctions, ASJA provides a script to filter junctions based on read count and ratio. We recommend the following settings: linear junctions with a read count >1 and a weight ratio >0.08; back-splice junctions with a circRNA back-splicing read count >1; and fusion junctions with at least one fusion splice site located on the boundary of a known exon.
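The recommended settings can be expressed as a small predicate. This Python sketch assumes each junction is a dictionary; the field names (`type`, `read_count`, `weight_ratio`, `donor_on_exon_boundary`, `acceptor_on_exon_boundary`) are hypothetical and do not reflect the columns of the actual ASJA output.

```python
MIN_READS = 1            # read count must be greater than this
MIN_WEIGHT_RATIO = 0.08  # weight ratio must be greater than this


def passes_filter(junction):
    """Apply the recommended high-confidence thresholds to one junction."""
    kind = junction["type"]
    if kind == "linear":
        return (junction["read_count"] > MIN_READS
                and junction["weight_ratio"] > MIN_WEIGHT_RATIO)
    if kind == "back_splice":
        return junction["read_count"] > MIN_READS
    if kind == "fusion":
        # At least one splice site sits on a known exon boundary.
        return bool(junction["donor_on_exon_boundary"]
                    or junction["acceptor_on_exon_boundary"])
    raise ValueError(f"unknown junction type: {kind!r}")


junctions = [
    {"type": "linear", "read_count": 5, "weight_ratio": 0.30},
    {"type": "linear", "read_count": 5, "weight_ratio": 0.05},
    {"type": "back_splice", "read_count": 1},
    {"type": "fusion", "donor_on_exon_boundary": True,
     "acceptor_on_exon_boundary": False},
]
print([passes_filter(j) for j in junctions])  # [True, False, False, True]
```

Keeping the thresholds as named constants makes it easy to tighten them for datasets with deeper sequencing.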
2.5. Datasets
2.5.1. Validation Datasets for ASJA Performance
The RNA-seq datasets from poly(A)- and poly(A)-/RNaseR RNAs in human PA1 cells were downloaded from GEO (GSE75733) [30] for the comparison of back-splice junction detection. Three RNA-seq datasets from glioma (GBM) were downloaded from NCBI SRA (SRR934794, SRR934744 and SRR934930) [31] for the comparison of fusion junction detection.
2.5.2. Datasets for Application of ASJA
RNA-seq datasets from twelve normal tissues, seven cancerous tissues and seven matched adjacent (NT) tissues from GEO (GSE77661) [28] were downloaded and analysed with ASJA. Normal tissues included brain, colon, heart, liver, lung and stomach, and the seven cancerous tissues comprised bladder urothelial carcinoma (BLCA), breast cancer (BRCA), colorectal cancer (CRC), hepatocellular carcinoma (HCC), gastric cancer (GC), kidney clear cell carcinoma (KCA) and prostate adenocarcinoma (PRAD).
3. Results
3.1. Overview of ASJA
ASJA is a program to obtain, annotate, and integrate three types of splice junctions from reference-based assembled transcripts (linear junctions) and chimeric alignments (back-splice and fusion junctions) (Fig. 1). ASJA works as follows: i) ASJA uses the STAR aligner and the StringTie assembler to generate chimeric alignments and assembled transcripts, respectively, from RNA-seq reads. ii) ASJA detects and quantifies linear junctions based on the assembled transcripts. Back-splice and fusion junctions are extracted from the chimeric alignments. iii) ASJA provides quantification and annotation of the three junction types and produces an integrated file accounting for their relationships.
Fig. 1.
A schematic overview of the ASJA workflow. The ASJA architecture consists of three layers (from top to bottom): chimeric alignment by STAR, junction identification by different models based on the characteristics of the three types of junctions, and finally integration of the splice junction utilization ratio and gene status.
3.2. Performance Evaluation
3.2.1. Linear Junction
The gold standard set of known splice junctions was defined as the annotated junctions (known genes) from the 1-pass alignment with a junction read count >1 and an FPKM >0.1 for the corresponding gene (colon01, total: 20,618) [32]. The sensitivity of ASJA for known linear junctions was 97.3%. For novel linear junctions, the sensitivity was 89.8%, obtained by comparing the known splice junctions from the 2-pass alignment without annotation against the gold standard. Moreover, we downloaded MapSplice2, which is designed to map RNA-seq reads to a reference genome for splice junction discovery [33]. The sensitivity of MapSplice2 for known splice junctions was 91.5% when evaluated against the gold standard.
3.2.2. Back-Splice Junction
For back-splice junction (circRNA) detection, we compared ASJA with two other algorithms, circRNA_finder [12] and ACFS [25]. We used the RNA-seq datasets from 12 normal tissues (GSE77661). Of the three tools, ASJA detected the highest number of circRNAs. A high proportion of identical circRNAs (75.5%) was observed among the three tools (Fig. 2A). To assess the false positive rate, we used the RNase R-digested RNA-seq datasets (RNase R treatment is used for validation of circRNAs) and the corresponding poly(A)- RNA-seq datasets (GSE75733). ASJA had a false positive rate (31.2%) similar to that of circRNA_finder (31.5%) and lower than that of ACFS (43%) (Fig. 2B).
Fig. 2.
Performance of ASJA on the validation datasets. (A) Venn diagram showing the number of circRNAs predicted by the three circRNA prediction tools in 12 normal tissues. (B) Overlap of prediction results between two samples (RNaseR+, ribominus RNA treated with RNase R; RNaseR-, ribominus RNA) for the three tools. (C) Time consumed (minutes) by each tool to analyse each run of the validation samples.
3.2.3. Fusion Junction
For fusion junction detection, we compared ASJA with two other fusion detectors, MapSplice2 and deFuse [34]. We used three validation samples from GBM as positive sets: SRR934794, SRR934744 and SRR934930. A total of 9 fusion genes have been validated in the three RNA-seq datasets [31]. ASJA had higher precision than the other tools, although all three had high recall (Table 1).
Table 1.
The performance of different fusion junction detectors using validated samples.
| Samples | Statistics | ASJA | MapSplice2 | deFuse |
|---|---|---|---|---|
| CGGA_661(1)* | Total | 3 | 24 | 35 |
| | TP | 1 | 1 | 1 |
| | Recall | 100% | 100% | 100% |
| | Precision | 33.30% | 4.16% | 2.85% |
| CGGA_374(3) | Total | 24 | 38 | 73 |
| | TP | 2 | 2 | 3 |
| | Recall | 66.70% | 66.70% | 100% |
| | Precision | 8.30% | 5.20% | 4.10% |
| CGGA_1329(5) | Total | 48 | 71 | 153 |
| | TP | 4 | 5 | 5 |
| | Recall | 80% | 100% | 100% |
| | Precision | 8.33% | 7.04% | 3.26% |
Note: * The value in parentheses is the number of validated fusions.
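The recall and precision rows of Table 1 follow from the Total and TP rows together with the number of validated fusions: recall = TP / validated fusions and precision = TP / total calls. A short Python sketch of this arithmetic, with a hypothetical function name, is:

```python
def precision_recall(total_calls, true_positives, validated):
    """Return (precision, recall) as fractions between 0 and 1.

    total_calls: number of fusions reported by the tool;
    true_positives: reported fusions that are among the validated ones;
    validated: number of experimentally validated fusions in the sample.
    """
    precision = true_positives / total_calls if total_calls else 0.0
    recall = true_positives / validated if validated else 0.0
    return precision, recall


# ASJA on CGGA_1329: 48 calls, 4 true positives, 5 validated fusions.
precision, recall = precision_recall(48, 4, 5)
print(f"precision={precision:.2%}, recall={recall:.0%}")
# precision=8.33%, recall=80%
```

The same call with the other Total and TP values gives the corresponding figures for MapSplice2 and deFuse.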
3.2.4. Running Time
We also compared the running time of ASJA with that of the other tools. ASJA was 2–10 fold faster than the others (Fig. 2C). Timing was measured from the processing of the FASTQ file until the generation of the final candidates, including linear, back-splice and fusion junctions.
3.3. Application to Various Samples
3.3.1. Detection of Splice Junctions in RNA-Seq Data from Normal and Cancerous Tissues by ASJA
We applied ASJA to detect and quantify the splice junctions in our recently published RNA-seq data (GSE77661), which contain 12 normal tissues, 7 cancerous tissues and 7 matched adjacent (NT) tissues. In total, we detected 322,675 linear junctions, 81,484 back-splice junctions and 33 fusion junctions after removing duplicate junctions detected in the data set (Table 2). Each sample contains a median of 165,997 linear junctions, 5668 back-splice junctions, and 1 fusion junction. The number of linear junctions is much higher than the number of back-splice and fusion junctions. The comparison of read counts between back-splice and linear junctions showed that the ratio of back-splice junctions was about 1% except in brain tissue (4%). This result is consistent with the notion that circular RNAs are generally enriched in tissues where mitotic division is not prevalent [12]. In addition, fusion junctions were hardly detected in normal tissues, whereas some fusion junctions were detected in cancers such as BRCA.
Table 2.
The number of junctions in each sample.
| Sample | Linear junction | Back-splice junction | Fusion junction |
|---|---|---|---|
| Brain01 | 187,293 | 14,055 | 0 |
| Brain02 | 165,503 | 9542 | 0 |
| Colon01 | 164,370 | 5365 | 0 |
| Colon02 | 167,407 | 5736 | 1 |
| Stomach01 | 171,263 | 4226 | 1 |
| Stomach02 | 151,269 | 2733 | 1 |
| Liver01 | 152,997 | 3920 | 0 |
| Liver02 | 139,087 | 2894 | 1 |
| Heart01 | 162,130 | 8084 | 0 |
| Heart02 | 151,089 | 6351 | 0 |
| Lung01 | 173,994 | 6304 | 2 |
| Lung02 | 172,911 | 6169 | 1 |
| BLCA_N | 169,543 | 3465 | 0 |
| BLCA_T | 166,492 | 5936 | 2 |
| BRCA_N | 151,849 | 9875 | 0 |
| BRCA_T | 173,950 | 10,887 | 13 |
| CRC_N | 172,163 | 5600 | 3 |
| CRC_T | 172,762 | 4696 | 0 |
| GC_N | 146,109 | 3469 | 2 |
| GC_T | 155,763 | 2278 | 0 |
| HCC_N | 163,502 | 5338 | 0 |
| HCC_T | 171,358 | 5513 | 4 |
| KCA_N | 167,979 | 6101 | 1 |
| KCA_T | 163,515 | 7895 | 1 |
| PRAD_N | 169,059 | 6070 | 1 |
| PRAD_T | 161,242 | 3955 | 1 |
| Sum of unique junctions | 322,675 | 81,484 | 33 |
Note: BLCA: bladder urothelial carcinoma, BRCA: breast cancer, CRC: colorectal cancer, HCC: hepatocellular carcinoma, GC: gastric cancer, KCA: kidney clear cell carcinoma, PRAD: prostate adenocarcinoma.
3.3.2. The Characteristics of ASJA Detected Linear Junctions in Human Cells
Linear junction calls were performed using the default ASJA settings. In total, 322,675 distinct linear junctions were found in all tissues, with 284,287 of these junctions having at least two unique read counts and a weight ratio >0.08 (Fig. 3A). These linear junctions were derived from 100,774 known primary transcripts and categorized into four types according to their genomic origin. More than 78.5% of the linear junctions were located in protein-coding regions, whereas smaller fractions aligned with long noncoding RNAs and pseudogenes with known transcripts (Fig. 3B). Moreover, we observed an average of 10.4 known linear splice junctions per gene, and noted that 10,870 of 240,453 known linear junctions are in the 5′ noncoding regions of mRNAs. Of note, 43,834 (15.6%) of the linear junctions are unannotated. Most of the novel junctions (33,480) overlapped with known genes, while the remaining 10,353 (23.6%) junctions originated in intergenic regions (Fig. 3C). Our results revealed that numerous novel junctions seem to be specifically expressed in various tissues. There were 7107 novel linear junctions in brain tissue, which is much higher than in other tissues (Fig. 3D). We also compared the junction expression profiles of cancer tissues and matched non-cancer tissues (NCTs), and identified 109 downregulated and 765 upregulated junctions in cancer (Fig. 3E).
Fig. 3.
The characteristics of ASJA identified linear junctions in human cells. (A) The pie chart shows the number of raw (grey) and high confidence (red) linear junctions. (B) The distribution of the linear junctions to annotated known genes. The unannotated junctions are shown as novel junctions. (C) The pie chart shows the proportion of gene isoforms and intergenic junctions in novel junctions. (D) Bar chart shows the number of novel junctions in different normal tissues. Gene isoforms are shown in red. Intergenic genes are shown in blue. (E) Clustering analysis of all differentially expressed junctions between cancer tissues and NCTs. The heatmap is based on expression values with log2(fold-change) > 1 and p < .01 (Wilcoxon test).
3.3.3. The Characteristics of ASJA Detected Back-Splice Junctions (circRNA) in Human Cells
Of the 31,346 circRNAs identified in all samples (26 tissues) by ASJA, 20,475 were not yet annotated in circBase (Fig. 4A). The median number of back-splicing events per gene was only 2 (range, 1 to 72) (Fig. 4B). We investigated the genomic origin of these circRNA candidates using GENCODE references. More than 90% of the circRNAs consisted of protein-coding exons, while smaller fractions aligned to long noncoding RNAs and antisense regions of known transcripts (Fig. 4C). ASJA quantified the abundance of each circRNA with respect to its alternative linear isoform by estimating the back-splice ratio at the 5′ end or 3′ end. Although linearly spliced products were absent in some cases, the back-splice ratio for these sites varied considerably. When using a stringent back-splice ratio and read count cut-off (mean of back ratio > 0.15; log2 (average of circRNA supporting read) > −1), we observed 404 high-abundance circRNAs (Fig. 4D).
Fig. 4.
The characteristics of ASJA identified back-splice junctions (circRNA) in human cells. (A) The pie chart shows the number of annotated and unannotated circRNAs. (B) Bar plots showing the number of genes with different numbers of back-splicing events. (C) Genomic origin of circRNAs. The pie chart shows the genomic distribution of all predicted/annotated circRNAs. (D) Multidimensional scaling screen for the identification of highly abundant circRNAs. Red dots and grey dots represent highly abundant and low-abundance circRNAs, respectively.
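The high-abundance cut-off used in Section 3.3.3 (mean back-splice ratio > 0.15 and log2 of the average supporting read count > −1) can be written as a small predicate. This Python sketch is illustrative: the function name and inputs are hypothetical, and it is an assumption that the averages are taken across samples for a single circRNA.

```python
import math
from statistics import mean


def is_high_abundance(back_ratios, supporting_reads,
                      min_mean_ratio=0.15, min_log2_reads=-1.0):
    """Check one circRNA against the stringent abundance cut-off.

    back_ratios: back-splice ratio of the circRNA in each sample;
    supporting_reads: supporting read count (or normalized level)
    of the circRNA in each sample.
    """
    average_reads = mean(supporting_reads)
    if average_reads <= 0:
        return False  # log2 is undefined; treat as low abundance
    return (mean(back_ratios) > min_mean_ratio
            and math.log2(average_reads) > min_log2_reads)


print(is_high_abundance([0.2, 0.3, 0.1], [4, 0, 2]))  # True
print(is_high_abundance([0.2, 0.3, 0.1], [1, 0, 0]))  # False
```

Note that log2(x) > −1 is equivalent to an average of more than 0.5 supporting reads, so the second example fails on read support even though its mean ratio of 0.2 passes.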
4. Discussion
High-throughput RNA-seq technologies have given rise to large amounts of data, which pose an unprecedented challenge for the development of computational tools. ASJA was developed to detect and quantify different splice junctions from RNA-seq data. ASJA is useful not only for cataloguing linear, back-splice and fusion junctions in samples but also for the identification of novel junctions. ASJA can quickly and accurately identify the different kinds of junctions and evaluate their expression levels, allowing for direct comparisons across different samples.
We compared the accuracy and speed of ASJA with those of several other splice junction detection programs, including MapSplice2, circRNA_finder and ACFS, using validation datasets. We showed that ASJA has better precision, especially for fusion genes. Most fusion gene detection tools have higher recall than ASJA; however, they produce many false positives, which requires further complex screening to obtain the real fusion genes. Moreover, ASJA is much faster than the other tools. In brief, ASJA can obtain more accurate junction information in a shorter time.
Through comprehensive analysis of real data, we demonstrate at least three major improvements of our software: (i) compared with existing common methods for junction identification based on RNA-seq data, the proposed ASJA procedure can simultaneously identify three kinds of junctions for large sample sizes, whereas previous software could identify a maximum of two kinds of junctions; (ii) ASJA extracts linear splice junctions from reference-based assembled transcripts to control the false discovery rate (FDR) at desired levels, whereas competing methods have highly inflated FDR levels when obtaining linear junctions from mapped reads; (iii) our method can integrate multiple types of information about a sample, including splice junctions, gene annotation, read count, normalized expression level and splice ratio of each junction, for downstream analysis.
The ASJA pipeline and its default parameters are designed for simultaneously detecting three kinds of junctions. However, this procedure may miss some splice junctions because it does not consider certain special circumstances. Because ASJA relies on the GT-AG dinucleotide rule, junctions with non-canonical donor and acceptor sites are ignored. Although ASJA can discover novel transcripts based on junction signals, it is limited to pre-mRNAs containing multiple exons and is not applicable to single-exon transcripts. Further investigation is needed to understand how the filters should be adjusted to reduce false positives and how to account for other factors such as sequencing depth, coverage ratio and variability in junction profiles between samples.
5. Conclusions
In conclusion, ASJA is a powerful tool for the detection and characterization of different kinds of splice junctions, including novel junctions. This method greatly facilitates the identification and cataloguing of splice junctions from RNA-seq data and offers new possibilities for exploring transcriptome complexity. We have made the package available for the research community to use and will update it regularly.
Authors' Contributions
SH, JZ and XH proposed the initial idea and designed the methodology. JZ, QL and YL implemented the concept and processed the results. SH, JZ, QL and QZ wrote the manuscript. All authors read and approved the final manuscript.
Declarations of Competing Interest
None.
Availability
The current version of ASJA can be downloaded from https://github.com/HuangLab-Fudan/ASJA.
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China (81702786, 8187229, 81672779) and the Shanghai Sailing Program (17YF1402600).
Contributor Information
Qiupeng Zheng, Email: zhengqiupeng@gmail.com.
Shenglin Huang, Email: slhuang@fudan.edu.cn.
References
- 1. Montes M., Sanford B.L., Comiskey D.F., Chandler D.S. RNA splicing and disease: animal models to therapies. Trends Genet. 2019;35:68–87. doi: 10.1016/j.tig.2018.10.002.
- 2. Scotti M.M., Swanson M.S. RNA mis-splicing in disease. Nat Rev Genet. 2016;17:19–32. doi: 10.1038/nrg.2015.3.
- 3. Mertens F., Johansson B., Fioretos T., Mitelman F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer. 2015;15:371–381. doi: 10.1038/nrc3947.
- 4. Mitelman F., Johansson B., Mertens F. Fusion genes and rearranged genes as a linear function of chromosome aberrations in cancer. Nat Genet. 2004;36:331–334. doi: 10.1038/ng1335.
- 5. Mitelman F., Johansson B., Mertens F. The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer. 2007;7:233–245. doi: 10.1038/nrc2091.
- 6. Westbrook C.A., Hooberman A.L., Spino C., Dodge R.K., Larson R.A., Davey F. Clinical significance of the BCR-ABL fusion gene in adult acute lymphoblastic leukemia: a Cancer and leukemia group B study (8762). Blood. 1992;80:2983–2990.
- 7. Druker B.J., Guilhot F., O'Brien S.G., Gathmann I., Kantarjian H., Gattermann N. Five-year follow-up of patients receiving imatinib for chronic myeloid leukemia. N Engl J Med. 2006;355:2408–2417. doi: 10.1056/NEJMoa062867.
- 8. Tkachuk D.C., Westbrook C.A., Andreeff M., Donlon T.A., Cleary M.L., Suryanarayan K. Detection of bcr-abl fusion in chronic myelogeneous leukemia by in situ hybridization. Science. 1990;250:559–562. doi: 10.1126/science.2237408.
- 9. Solomon B., Varella-Garcia M., Camidge D.R. ALK gene rearrangements: a new therapeutic target in a molecularly defined subset of non-small cell lung cancer. J Thorac Oncol. 2009;4:1450–1454. doi: 10.1097/JTO.0b013e3181c4dedb.
- 10. Mehra R., Tomlins S.A., Shen R., Nadeem O., Wang L., Wei J.T. Comprehensive assessment of TMPRSS2 and ETS family gene aberrations in clinically localized prostate cancer. Mod Pathol. 2007;20:538–544. doi: 10.1038/modpathol.3800769.
- 11. Tomlins S.A., Rhodes D.R., Perner S., Dhanasekaran S.M., Mehra R., Sun X.W. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310:644–648. doi: 10.1126/science.1117679.
- 12. Memczak S., Jens M., Elefsinioti A., Torti F., Krueger J., Rybak A. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495:333–338. doi: 10.1038/nature11928.
- 13. Hansen T.B., Jensen T.I., Clausen B.H., Bramsen J.B., Finsen B., Damgaard C.K. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495:384–388. doi: 10.1038/nature11993.
- 14. Wilusz J.E., Sharp P.A. Molecular biology. A circuitous route to noncoding RNA. Science. 2013;340:440–441. doi: 10.1126/science.1238522.
- 15. Ashwal-Fluss R., Meyer M., Pamudurti N.R., Ivanov A., Bartok O., Hanan M. circRNA biogenesis competes with pre-mRNA splicing. Mol Cell. 2014;56:55–66. doi: 10.1016/j.molcel.2014.08.019.
- 16. Li Z., Huang C., Bao C., Chen L., Lin M., Wang X. Exon-intron circular RNAs regulate transcription in the nucleus. Nat Struct Mol Biol. 2015;22:256–264. doi: 10.1038/nsmb.2959.
- 17. Legnini I., Di Timoteo G., Rossi F., Morlando M., Briganti F., Sthandier O. Circ-ZNF609 is a circular RNA that can be translated and functions in Myogenesis. Mol Cell. 2017;66:22–37.e9. doi: 10.1016/j.molcel.2017.02.017.
- 18. Yang Y., Gao X., Zhang M., Yan S., Sun C., Xiao F. Novel role of FBXW7 circular RNA in repressing glioma tumorigenesis. J Natl Cancer Inst. 2018;110. doi: 10.1093/jnci/djx166.
- 19. Li X., Yang L., Chen L.L. The biogenesis, functions, and challenges of circular RNAs. Mol Cell. 2018;71:428–442. doi: 10.1016/j.molcel.2018.06.034.
- 20. Engstrom P.G., Steijger T., Sipos B., Grant G.R., Kahles A., Ratsch G. Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods. 2013;10:1185–1191. doi: 10.1038/nmeth.2722.
- 21. Nellore A., Jaffe A.E., Fortin J.P., Alquicira-Hernandez J., Collado-Torres L., Wang S. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the sequence read archive. Genome Biol. 2016;17:266. doi: 10.1186/s13059-016-1118-6.
- 22. Hansen T.B., Veno M.T., Damgaard C.K., Kjems J. Comparison of circular RNA prediction tools. Nucleic Acids Res. 2016;44:e58. doi: 10.1093/nar/gkv1458.
- 23. Chen B., Huang S. Circular RNA: an emerging non-coding RNA as a regulator and biomarker in cancer. Cancer Lett. 2018;418:41–50. doi: 10.1016/j.canlet.2018.01.011.
- 24. Akers N.K., Schadt E.E., Losic B. STAR chimeric post for rapid detection of circular RNA and fusion transcripts. Bioinformatics. 2018;34:2364–2370. doi: 10.1093/bioinformatics/bty091.
- 25. You X., Conrad T.O. Acfs: accurate circRNA identification and quantification from RNA-Seq data. Sci Rep. 2016;6. doi: 10.1038/srep38820.
- 26. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 27. Pertea M., Pertea G.M., Antonescu C.M., Chang T.C., Mendell J.T., Salzberg S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
- 28. Zheng Q., Bao C., Guo W., Li S., Chen J., Chen B. Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs. Nat Commun. 2016;7. doi: 10.1038/ncomms11215.
- 29. Li S., Hu Z., Zhao Y., Huang S., He X. Transcriptome-wide analysis reveals the landscape of aberrant alternative splicing events in liver cancer. Hepatology. 2019;69:359–375. doi: 10.1002/hep.30158.
- 30. Zhang X.O., Dong R., Zhang Y., Zhang J.L., Luo Z., Zhang J. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res. 2016;26:1277–1287. doi: 10.1101/gr.202895.115.
- 31. Bao Z.S., Chen H.M., Yang M.Y., Zhang C.B., Yu K., Ye W.L. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas. Genome Res. 2014;24:1765–1773. doi: 10.1101/gr.165126.113.
- 32. Veeneman B.A., Shukla S., Dhanasekaran S.M., Chinnaiyan A.M., Nesvizhskii A.I. Two-pass alignment improves novel splice junction quantification. Bioinformatics. 2016;32:43–49. doi: 10.1093/bioinformatics/btv642.
- 33. Wang K., Singh D., Zeng Z., Coleman S.J., Huang Y., Savich G.L. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38:e178. doi: 10.1093/nar/gkq622.
- 34. McPherson A., Hormozdiari F., Zayed A., Giuliany R., Ha G., Sun M.G. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Comput Biol. 2011;7. doi: 10.1371/journal.pcbi.1001138.




