Abstract
Circular RNA (circRNA) has recently gained attention for its emerging biological activities, relevance to disease, potential as a biomarker, and promise as an alternative modality for RNA vaccines. Nevertheless, sequencing circRNAs has presented challenges. In this context, we introduce a novel circRNA sequencing method called Induro-RT mediated circRNA-sequencing (IMCR-seq), which relies on a group II intron reverse transcriptase with robust rolling circle reverse transcription activity. The IMCR-seq protocol eliminates the need for conventional circRNA enrichment methods such as rRNA depletion and RNaseR digestion, yet achieved the highest circRNA enrichment and detected 6–1000 times more circRNAs for the benchmarked human samples compared to other methods. IMCR-seq is applicable to any organism, capable of detecting circRNAs longer than 7000 nucleotides, and effective on samples as small as 10 ng of total RNA. These enhancements render IMCR-seq suitable for clinical samples, including disease tissues and liquid biopsies. We demonstrated the clinical relevance of IMCR-seq by detecting cancer-specific circRNAs as potential biomarkers from IMCR-seq results on lung tumor tissues together with blood plasma samples from both a healthy individual and a lung cancer patient. In summary, IMCR-seq presents an efficient and versatile circRNA sequencing method with high potential for research and clinical applications.
Graphical Abstract
Introduction
Circular RNAs (circRNAs) are endogenous RNA molecules that are formed by a back splicing event, joining the 3′ end of a downstream exon to a 5′ end of an upstream exon, resulting in a single-stranded covalently closed RNA. CircRNAs are evolutionarily conserved and display complex tissue, development stage, and disease specific expression patterns, indicating diverse and impactful biological functions (1). The biological roles of circRNAs include, but are not limited to, acting as miRNA or protein sponges, serving as translation templates with the help of internal ribosome entry sites (IRESs), promoting the transcription of their parental genes via interacting with U1 small nuclear ribonucleoprotein or by positively regulating RNA polymerase II mediated transcription (2–4).
CircRNAs have been implicated in many diseases such as neurological disorders, cancers, type 2 diabetes, atherosclerosis and autoimmune diseases such as lupus and rheumatoid arthritis (5–10). Several studies of these diseases showed that circRNAs are detectable and stable in liquid biopsy samples such as plasma, saliva, and urine, and that they have strong potential as biomarkers for diagnosis, prognosis, and therapy resistance (8,11–14). Furthermore, circRNAs are becoming increasingly attractive agents for RNA therapeutics because of their higher stability, lower immunogenicity, and increased protein production capacity compared to linear RNAs when transfected into mammalian cells (1,15,16). Due to circRNAs’ crucial endogenous biological functions, implications in many diseases, wide-ranging possibilities for disease biomarker analysis and liquid biopsy, and potential for use in RNA therapeutics, there is an exponentially growing demand for methods and tools to design, synthesize and sequence circRNAs in a high throughput manner.
CircRNAs are generally expressed at lower levels compared to linear RNAs (17). Therefore, ribosomal RNA (rRNA) depletion and/or RNase R treatment of total RNA from cell or tissue preparations are often employed to enrich for circRNAs for circRNA sequencing. This heavily treated RNA is usually sequenced by short-read RNA-seq and then the resulting fragments are searched for back splice junctions (BSJs). However, short-read RNA-seq yields a very low percentage of sequences with BSJs, as the fragments from the regions without the BSJ of a circRNA are indistinguishable from their linear RNA counterparts. Additionally, short-read RNA-seq does not sequence the full-length circular molecule and thus cannot detect heterogeneity such as splicing variants. Due to these limitations, there have been efforts to utilize long read sequencing with Oxford Nanopore Technologies to sequence circRNAs from rRNA depleted and RNase R treated RNA (18–20). The general logic of the long-read approach has been to either rely on rolling circle reverse transcription by conventional retroviral reverse transcriptases (RTs) (19,20) or using rolling circle amplification by Phi29 DNA polymerase after enzymatically selecting for cDNAs coming from circRNAs (18), followed by Nanopore sequencing. A consensus sequence is then generated from rolling circle products that contain many repeats of the circRNA to determine the full length circRNA sequence. While many novel full-length circRNA sequences have been discovered by these long-read circRNA sequencing techniques, there is still room for improvements in accuracy, efficiency, reproducibility and bias. Firstly, retroviral RTs have limited processivity and strand displacement activity and are suboptimal at high temperatures (21,22). 
Because of these features, their rolling circle reverse transcription capabilities are expected to be limited and they are expected to produce non-optimal numbers of repeats for consensus generation, especially for longer circRNAs. Due to their low strand displacement activity, these retroviral RTs might in many cases not even be able to complete a full circle when random primers are used and priming happens at several places in the circRNA. Additionally, their lower optimal reaction temperatures could make it more difficult to reverse transcribe highly structured circRNAs. Furthermore, the traditionally used circRNA enrichment workflow of rRNA depletion, poly(A) tailing (to improve RNase R digestion (23)), the subsequent RNase R treatment and the associated clean up processes can introduce nicks into the circRNAs and/or lead to unwanted degradation, and can therefore introduce biases. This becomes a bigger problem for longer circRNAs as they are more sensitive to random nick formation and hydrolysis. Moreover, longer circRNAs were suggested to be less resistant to RNaseR digestion than shorter circRNAs (17,24). Since circRNAs are usually expressed at low levels (25), a rapid conversion from RNA to cDNA and omission of the traditionally used RNaseR digestion would lead to a more accurate profile of the circRNA in the cell, especially for longer circRNAs.
To address these issues, we developed a circRNA sequencing workflow, Induro-RT mediated circRNA-sequencing (IMCR-seq), which eliminates the need for conventional circRNA enrichment methods. IMCR-seq utilizes a group II intron RT to perform rolling circle reverse transcription of circRNA templates and is compatible with both Oxford Nanopore Technologies (Nanopore) and Pacific Biosciences (PacBio) sequencing technologies. We showed that group II intron RTs generated much longer cDNAs from circRNAs than retroviral RTs due to their exceptional processivity and robust strand displacement activity. This creates two very differently sized cDNA pools from linear RNAs and circular RNAs and allows us to use a simple SPRI-bead based size selection method to effectively enrich circRNA from total RNA. When applied to human brain RNA, IMCR-seq detected 1000-fold more circRNA isoforms than the short-read RNA-seq method and 6–10 times more circRNA isoforms than RNase R coupled long-read circRNA sequencing methods for the same number of raw reads. In addition, IMCR-seq allowed us to detect circRNAs over a wide size range from 38 nt to over 10 kb, and we experimentally validated a few long circRNA isoforms of 4033 nt, 5262 nt and 7166 nt. The IMCR-seq protocol is also versatile and applicable to any organism because it does not require an rRNA depletion step, which is a limiting factor for other methods. Finally, we showed that IMCR-seq can work with minimal RNA input, proving effective with as little as 10 ng total RNA. This feature positions IMCR-seq as the method of choice for handling limited clinical samples, including tumor tissue and plasma.
Materials and methods
Reverse transcription and quantitative PCR to compare different RTs
A circRNA precursor that contains partial sequences of a group I catalytic intron was designed according to the principles in (16). This circRNA precursor and linear RNA Fluc were in vitro transcribed with HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB, E2050S) in 20 μl according to the manufacturer's protocol, then DNA templates in the reaction were digested with 4 units of DNaseI (NEB, M0303) in a total volume of 50 μl. Linear and circRNA were purified with Monarch RNA Cleanup Kit (NEB, T2050S) according to the manufacturer's protocol. 60 μg of the circRNA precursor was first incubated in water at 70°C for 5 minutes for denaturation and was then incubated at 55°C for 10 minutes in 1× NEB Buffer 2 and 2 mM GTP for circularization (16). To remove the remaining linear RNAs, the circularization reaction products were treated with 5 units of XRN-1 (NEB, M0338S) and 25 units of RppH (NEB, M0356S) at 37°C for 1 hour and repurified. Reverse transcription reactions of linear and circular RNAs were carried out either with sequence specific primers or random hexamers with 4 different reverse transcriptases: ProtoScript II RT (PSII; NEB, M0368), SuperScript IV RT (SSIV; ThermoFisher Scientific, 18090010), Thermostable Group II Intron RT (TGIRT; InGex) and Induro RT (NEB, M0681S). SSIV and PSII reverse transcriptions were carried out according to the manufacturers’ protocols. For TGIRT and Induro, primer annealing was performed by incubating linear/circular RNA with 2 μl of 10 μM specific primers or 4 μl of 60 μM random hexamers in a volume of 28 μl at 65°C for 5 minutes and then cooling down to 4°C with 0.1°C/s ramp. For TGIRT, as recommended by the manufacturer, we first added the TGIRT buffer and DTT to 1× concentration and 400 units of TGIRT enzyme to the annealed primer template mix followed by a 30-minute incubation at room temperature. 
Finally, 2 μl 10 mM dNTPs were added to initiate the reaction at 40 μl total volume and reverse transcription was carried out at 60°C for 1 hour. For the Induro RT reaction, 2 μl 10 mM dNTPs and 400 units of Induro RT (NEB, M0681) were added to the annealed primer template mix in the 1× Induro-RT buffer supplemented with 3 mM MgCl2 followed by incubation at 55°C for 1 hour (for RTs using random primers, a pre-extension step of 23°C for 10 minutes, 30°C for 5 minutes, then a ramp of 0.1°C/s was added before the 1-hour incubation). When the reverse transcription reactions reached completion, 1.5 μl of 120 units/ml Thermolabile Proteinase K (NEB, P8111) was added to all the RT reactions and incubated for 15 minutes at 37°C followed by 20 minutes at 55°C. For the agarose gel electrophoresis experiment, cDNAs were purified with SPRI beads and eluted in 26 μl water. 24 μl of this elution was mixed with 3 μl 10× RNaseH buffer, 7.5 units of RNaseH (NEB, M0297), and 75 units of RNase If (NEB M0243) to degrade the template RNA and incubated at 37°C for 30 minutes followed by 70°C for 20 minutes. 10 μl of each reaction was run on a 1% agarose gel together with a 1 kb ladder (NEB, N3232) to evaluate the reverse transcription products. cDNAs were quantified by qPCR of diluted samples after proteinase K treatment and inactivation, using Luna Universal qPCR Master Mix (NEB, M3003) according to the manufacturer's protocol.
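The legend of Figure 1E describes normalizing Ct values to actin and then expressing each RT relative to PSII. The following is a minimal sketch of that two-step normalization, assuming standard ΔΔCt arithmetic. The function names and the Ct values in the usage note are hypothetical illustrations, not data or code from this study, and the fold-change conversion additionally assumes ideal (100%) amplification efficiency.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of a target normalized to a reference transcript (here, actin)."""
    return ct_target - ct_reference


def delta_delta_ct(ct_target: float, ct_actin: float,
                   ct_target_psii: float, ct_actin_psii: float) -> float:
    """Actin-normalized Ct of one RT, expressed relative to the PSII reaction.

    Negative values mean the target was detected earlier (more cDNA copies)
    than in the PSII reaction.
    """
    return delta_ct(ct_target, ct_actin) - delta_ct(ct_target_psii, ct_actin_psii)


def fold_change(ddct: float) -> float:
    """Convert a delta-delta-Ct to a fold change.

    Assumption: the amount of product doubles with every PCR cycle.
    """
    return 2.0 ** (-ddct)
```

For example, with hypothetical Ct values of 24.0 (target) and 18.0 (actin) for one RT and 27.0 and 18.5 for PSII, `delta_delta_ct(24.0, 18.0, 27.0, 18.5)` gives -2.5, which corresponds to roughly a 5.7-fold difference under the ideal-efficiency assumption.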
Cell culture
HEK293 cells were cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco) with 10% fetal bovine serum, 2 mM glutamine, 100 units/ml penicillin and 100 μg/ml streptomycin. Cells were maintained at 37°C in a 5% CO2 incubator and collected at 80–90% confluency for RNA extraction.
RNA extraction
Total RNA was extracted from HEK293 cells with Monarch Total RNA Miniprep Kit (NEB, T2010S) according to the manufacturer's protocol. Around 5 million HEK293 cells yielded 50 μg total RNA. The quality of the extracted RNA was assessed with Agilent 4200 TapeStation system using the High Sensitivity RNA Screen Tape (Agilent, 5067–5579).
Cell-free RNA was extracted from 3 ml of human plasma using Quick-cfRNA Serum & Plasma Kit (Zymo Research, R1059) and treated with DNaseI (NEB, M0303) according to manufacturers’ protocols. Around 10 ng of cell-free RNA was obtained from 3 ml of plasma.
Ribodepletion and RNase R digestion of RNA
50 μg RNA from human brain (BioChain, Catalog #: R1234035-P, Lot #:C404081) and HEK293 cells were depleted of rRNA by NEBNext rRNA Depletion Kit v2 (Human/Mouse/Rat) (NEB, E7400X) according to manufacturer's protocol. Remaining RNAs were treated with 200 units of RNase R (Lucigen, RNR07250) at 37°C for 30 min, then purified with NEBNext RNA Sample Purification Beads (NEB, E6351).
Reverse transcription with Induro RT and size selection for concatemeric circRNA cDNAs
4 μg of total RNA from human brain (BioChain, Catalog #: R1234035-P, Lot #: C404081), HEK293 cells or lung tumor (BioChain, Catalog #: R1235152-50, Lot #: A808151) or ribo-depleted and RNaseR treated RNA (500–700 ng) from human brain and HEK293 cells were incubated with 2 μl from 100 μM random N6 primers (NEB, S1230S) in a total volume of 10 μl at 72°C for 5 minutes then cooled down to 4°C with 0.1°C/s ramp. Then 2 μl 10 mM dNTPs and 400 units of Induro RT (NEB, M0681) were added to a final volume of 40 μl in the 1× Induro-RT buffer supplemented with 3 mM MgCl2. Reactions were incubated at 23°C for 10 minutes, 30°C for 5 minutes, and then the temperature was increased at a rate of 0.1°C/s to 55°C. The incubation time at 55°C was 1 hour. Induro RT was inactivated by 1 minute incubation at 95°C. cDNAs were size selected twice using NEBNext DNA Sample Purification Beads (NEB, E6178) by incubating with 0.97% PEG8000 and 97 mM MgCl2 for 30 minutes (26). After size selection, cDNAs eluted from the beads were amplified using phi29 DNA Polymerase (NEB, M0269) for 8 hours at 30°C following the manufacturer's recommendation. Debranching of the phi29 reaction products was performed by adding T7 Endonuclease I to a final concentration of 1.9 U/μl and incubating for 1 hour at 37°C. Debranched dsDNA was size selected with NEBNext DNA Sample Purification Beads as described above.
For human plasma samples (AMSBIO, 100048013 and 100028013), around 10 ng of extracted, DNaseI treated RNA in 10 μl was mixed with 2 μl from 100 μM random N6 primers (NEB, S1230S) in a total volume of 12 μl and incubated at 72°C for 5 minutes then directly cooled down to 4°C. Then 2 μl 10 mM dNTPs and 400 units of Induro RT (NEB, M0681) were added to a final volume of 40 μl in the 1× Induro-RT buffer supplemented with 3 mM MgCl2. Reactions were incubated at 23°C for 10 minutes, 30°C for 5 minutes, and then the temperature was increased at a rate of 0.1°C/s to 55°C. The incubation time at 55°C was 1 hour. Induro RT was inactivated by 1 minute incubation at 95°C. cDNAs were size selected once using NEBNext DNA Sample Purification Beads (NEB, E6178) by incubating with 0.97% PEG8000 and 97 mM MgCl2 for 30 minutes (26). After size selection, cDNAs eluted from the beads were amplified using phi29 DNA Polymerase (NEB, M0269) for 16 hours at 30°C following the manufacturer's recommendation. Debranching of the phi29 reaction products was performed by adding T7 Endonuclease I to a 1.9 U/μl final concentration and incubated for 1 hour at 37°C. Debranched dsDNA was size selected with NEBNext DNA Sample Purification Beads as described above.
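The reaction set-up above implies some simple arithmetic that is easy to get wrong at the bench. The sketch below works through the final reagent concentrations in the 40 μl reaction and the duration of the temperature ramps. The helper names are hypothetical, and the calculation takes the stock concentrations exactly as labeled in the text (for instance, it does not assume whether "10 mM dNTPs" refers to each nucleotide or to the total).

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        final_volume_ul: float) -> float:
    """Dilution formula C1*V1 = C2*V2, solved for C2 (same unit as stock_conc)."""
    if final_volume_ul <= 0:
        raise ValueError("final volume must be positive")
    return stock_conc * stock_volume_ul / final_volume_ul


def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a fixed ramp rate."""
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(end_c - start_c) / rate_c_per_s


# 2 ul of 10 mM dNTPs in a 40 ul reaction
dntp_mM = final_concentration(10.0, 2.0, 40.0)

# 2 ul of 100 uM random N6 primers, carried into the 40 ul reaction
primer_uM = final_concentration(100.0, 2.0, 40.0)

# Heating from 30 C to 55 C, and cooling from 72 C to 4 C, both at 0.1 C/s
heat_ramp_s = ramp_seconds(30.0, 55.0, 0.1)
cool_ramp_s = ramp_seconds(72.0, 4.0, 0.1)
```

Under these assumptions the dNTPs end up at 0.5 mM and the random primers at 5 μM in the final reaction, the ramp from 30°C to 55°C takes about 250 seconds, and the cooling ramp from 72°C to 4°C takes about 680 seconds.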
Library generation for Nanopore sequencing
3 μg of size selected dsDNA was used to make Nanopore libraries using the SQK-LSK109 kit (Oxford Nanopore Technologies) according to the manufacturer's protocol and run on two flow cells (FLO-MIN106D). About 10 Gbp of data were generated per replicate.
Library generation for PacBio sequencing
5 μg of size selected dsDNA was used to make PacBio libraries using SMRTbell Express Template Prep Kit 2.0 (PN: 100-938-900), Sequencing Primer v2 (PN: 101-847-900), SMRTbell Enzyme Cleanup Kit 2.0 (PN: 101-932-600) and Sequel II Binding Kit 2.2 (PN: 101-908-100). Finally, the libraries were sequenced on the PacBio Sequel II platform on two SMRT cells using CCS mode. About 50 Gbp of CCS data were generated per replicate.
In vitro transcription and circularization of circRNAs in spike in control
Different regions of the Lambda bacteriophage DNA (NEB, N3011) were amplified to obtain a circRNA spike-in mix with circRNAs of different sizes and sequences using the listed primers in the supplementary primers.xlsx file. These amplicons were ligated with pUC19 vector backbone that contains partial sequences of a group I catalytic intron as described in (16) for circularization with NEBuilder® HiFi DNA Assembly (NEB, E5520). The vectors for the circRNA precursors were linearized by digesting with HpaI (NEB, R0105) for the 403 nt, 703 nt, 1,203 nt, and 1,703 nt circRNAs and NotI-HF (NEB, R3189) for the 1,953 nt and 2,703 nt circRNAs. Linearized vectors were in vitro transcribed with HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB, E2050S) in 20 μl according to the manufacturer's protocol. DNA templates in the reaction were digested with 4 units of DNaseI (NEB, M0303) in a total volume of 50 μl and were purified with Monarch RNA Cleanup Kit (NEB, T2050S) according to the manufacturer's protocols. 60 μg of the circRNA precursors was first incubated in water at 70°C for 5 minutes for denaturation and was incubated at 55°C for 10 minutes in 1× NEB Buffer 2 and 2 mM GTP for circularization (16). To remove the remaining linear RNAs, the circularization reaction products were treated with 5 units of XRN-1 (NEB, M0338S) and 25 units of RppH (NEB, M0356S) at 37°C for 1 hour and repurified. The concentrations of the circRNAs were measured with Agilent 4200 TapeStation system using the High Sensitivity RNA Screen Tape (Agilent, 5067–5579) and equivalent molar concentrations of the different sized circRNAs were mixed and aliquoted to be spiked into RNaseR and Total IMCR-seq protocols in triplicates to compare the detection rates of each size per protocol. In addition to the circRNAs in the spike-in mix, we also ordered a smaller circRNA of 110 nt in length from bioSYNTHESIS Inc. to test the rolling circle reverse transcription cDNA product lengths for different sizes of circRNA templates.
RNaseR digestion assay
Two circular RNAs, one with a length of 789 nt and the other with a length of 1,742 nt, were generated as explained above. These circRNAs were combined and either 200 ng of this circRNA mixture or linear Fluc RNA was treated with 20 units of RNaseR in 1× RNase R buffer (Lucigen, RNR07250). The incubation took place at 37°C for 30 and 60 minutes. RNA quantities for circRNAs of different sizes and linear RNA were determined by measuring the intensity of corresponding peaks with Agilent 4200 TapeStation system using the High Sensitivity RNA Screen Tape (Agilent, 5067–5579). Peak intensities were normalized to the 1× buffer + water control of the respective RNA that was incubated at 37°C for 60 minutes to adjust for spontaneous RNA degradation.
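The normalization described above divides each RNaseR-treated peak intensity by the intensity of the matching buffer-only control, so that spontaneous degradation during incubation is not counted as RNaseR digestion. A minimal sketch of this step follows; the function names and the intensity values in the usage note are hypothetical and are not measurements from this study.

```python
from statistics import mean


def fraction_remaining(treated_intensity: float, control_intensity: float) -> float:
    """Peak intensity after RNaseR relative to the buffer-only control.

    The control is the same RNA incubated in 1x buffer + water, so a value
    of 1.0 means no loss attributable to RNaseR.
    """
    if control_intensity <= 0:
        raise ValueError("control intensity must be positive")
    return treated_intensity / control_intensity


def mean_fraction_remaining(treated: list[float], controls: list[float]) -> float:
    """Average the normalized values over replicate experiments."""
    if len(treated) != len(controls) or not treated:
        raise ValueError("need matching, non-empty replicate lists")
    return mean(fraction_remaining(t, c) for t, c in zip(treated, controls))
```

For instance, a treated peak of 30.0 against a control peak of 120.0 (arbitrary units) gives `fraction_remaining(30.0, 120.0) == 0.25`, meaning a quarter of the RNA survived the digestion.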
Data analysis
PacBio consensus reads were generated using the SMRT tools (version 10.2.0.133434). Raw Nanopore data were base-called in high accuracy mode with Dorado 7.0.9 and filtered by average quality (≥7). Sequencing reads in FASTQ format from either PacBio or Nanopore were first mapped to the rDNA reference sequences using minimap2 (27) and the unmapped reads were kept for downstream circRNA analysis. Non-rRNA reads from large PacBio and Nanopore sequencing datasets were split into subsets of approximately 3 GB each. Then circular consensus sequences and candidate circRNAs were identified using the CIRI-long bioinformatics pipeline (19) for each subset. We applied an additional filter to remove consensus sequences that were called from any read containing fewer than two copies of the repeat sequence. Candidate circRNA sequences were identified based on the GRCh38 reference genome sequence and annotation. Then all candidate circRNA sequences from the subsets were aggregated using the CIRI-long collapse program to determine BSJ sites and calculate the relative isoform usage index of each BSJ. Finally, for each BSJ, the read counts of its isoforms were calculated from the isoform usage index (*collapse.isoforms file) and the total read counts of that BSJ as: isoform expression = isoform usage index × BSJ read counts.
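The last two steps above, the repeat-number filter and the isoform expression calculation, can be sketched as follows. This is an illustration only: it operates on plain Python dictionaries and makes no assumption about the actual column layout of the CIRI-long output files. All names, identifiers and values are hypothetical.

```python
MIN_REPEATS = 2  # consensus sequences from reads with fewer repeats are dropped


def keep_consensus(n_repeats: int) -> bool:
    """Repeat-number filter: keep a consensus only if its read holds >= 2 repeats."""
    return n_repeats >= MIN_REPEATS


def isoform_read_counts(
    bsj_read_counts: dict[str, float],
    isoform_usage: dict[tuple[str, str], float],
) -> dict[tuple[str, str], float]:
    """isoform expression = isoform usage index x BSJ read counts.

    bsj_read_counts maps a BSJ identifier to its total read count.
    isoform_usage maps (BSJ identifier, isoform identifier) to the relative
    usage index of that isoform. Isoforms whose BSJ has no read count entry
    are skipped.
    """
    return {
        (bsj, isoform): usage * bsj_read_counts[bsj]
        for (bsj, isoform), usage in isoform_usage.items()
        if bsj in bsj_read_counts
    }
```

As a usage example, a BSJ with 40 reads and two isoforms with usage indices 0.75 and 0.25 yields isoform read counts of 30 and 10, which sum back to the BSJ total whenever the usage indices of a BSJ sum to 1.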
The web-based DAVID tool (2021 update) (28,29) was used for GO term analysis, using the Homo sapiens database. Enrichment scores (the geometric mean, in –log scale, of the members' P-values in a corresponding annotation cluster) were used to plot the results of GO term analysis.
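To make the enrichment score definition concrete: the negative logarithm of a geometric mean equals the arithmetic mean of the negative logarithms. The sketch below computes a score in that way. It assumes a base-10 logarithm, which is not stated in the text, and it is an illustration of the formula rather than a reimplementation of DAVID. The function name is hypothetical.

```python
import math


def enrichment_score(p_values: list[float]) -> float:
    """Geometric mean of P-values, reported on a -log10 scale.

    -log10(geometric mean) is computed as the mean of -log10(p), which
    avoids underflow when many very small P-values are multiplied.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    return sum(-math.log10(p) for p in p_values) / len(p_values)
```

For example, a cluster whose two member terms have P-values of 0.01 and 0.0001 has a geometric mean of 0.001 and therefore a score of about 3 under the base-10 assumption.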
Spearman's rank correlation was used to determine the reproducibility of circRNA BSJ/isoform counts between technical replicates. For the RNaseR digestion of circRNAs experiment, a one-sample two-tailed t-test was used to compare the circRNA amounts with RNaseR to their corresponding no-RNaseR control. To compare the relative amounts of circRNAs of different sizes with each other, a paired two-tailed t-test was used because the digestions of both circRNAs were done together in one test tube for each of the triplicate experiments.
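For readers less familiar with the reproducibility metric, the sketch below computes Spearman's rank correlation between two replicates in plain Python, as the Pearson correlation of tie-averaged ranks. It is a generic textbook implementation with hypothetical function names and toy read counts, not the code used for the analyses in this study.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

With toy isoform read counts of `[5, 12, 40, 7]` in one replicate and `[6, 15, 33, 9]` in another, the rank order is identical and the coefficient is 1, even though the raw counts differ.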
Reference datasets used in this study
The raw data of isoCirc (17) was downloaded from the GEO database (accession number GSE141693). The circFL-seq (19) datasets were downloaded from SRA (PRJNA722575). All the datasets were reanalyzed using the CIRI-long bioinformatics pipeline (19).
Results
Rolling circle reverse transcription of circRNA with group II intron RTs
Group II intron RTs have been shown to exhibit stronger strand displacement activity and higher processivity than retroviral RTs (21,22). This is because, unlike retroviral RTs, which evolved to aid viruses in evading host defenses and exhibit poor processivity and fidelity, group II intron RTs evolved to reverse transcribe lengthy and highly structured introns. We hypothesized that these features would render group II intron RTs better enzymes than retroviral RTs for performing rolling circle reverse transcription on circular RNA templates. To test this hypothesis, we compared the rolling circle reverse transcription capability of two group II intron RTs, Induro RT (NEB) and TGIRT (InGex), with that of two retroviral RTs, ProtoScript II (NEB) and SuperScript IV (ThermoFisher Scientific), using either specific (Figure 1A) or random primers (Figure 1C).
Figure 1.
Group II intron RTs allow rolling circle reverse transcription. (A) Schematic showing reverse transcription with sequence specific primers to linear or circular RNA templates. Linear and circular RNA templates are similar in size (∼1.2 kb) but different in sequence. (B) Agarose gel shows the cDNA products generated by MMLV-based (SS IV: SuperScript IV, PS II: ProtoScript II) and group II intron RTs (TGIRT, Induro RT) with L: linear or C: circular RNA templates. (C) Schematic showing reverse transcription of a circular RNA template with random hexamers. (D) RT-qPCR with varying amounts of starting circRNA template using PSII, SSIV, TGIRT, and Induro RT. (E) RT-qPCR of previously described circRNAs from human total brain RNA sample; primers that amplify actin and GAPDH linear RNAs and ZNF609, RIMS2, TULP4, XPO1, and HIPK3 circRNAs were used. Ct values were normalized with actin then each RT’s Ct was plotted relative to PSII levels (n = 3, error bars represent standard error of the mean.) Panels (A) and (C) are created with BioRender.com.
We first tested each RT with sequence specific primers on linear or circular RNA templates of approximately 1,200 nucleotides in size. The cDNA products were then analyzed using agarose gel electrophoresis (Figure 1B). We demonstrated that while all RTs can successfully reverse transcribe the linear template and the circular template, only group II intron RTs could make multiple copies (>10 copies in this case) and produce long cDNA products above 10 kb from circular templates showing their effective rolling circle reverse transcription activity (Figure 1B). It was previously demonstrated that Induro RT produces long cDNAs close to 10 kb even for short circRNAs as small as 42 nucleotides (30). To confirm this result, we performed a rolling circle reverse transcription experiment on three circular RNAs with different sizes: 110 nt, 403 nt and 1,289 nt. Our result showed that Induro RT consistently produced large cDNA products much longer than 7.2 kb (the largest band on the marker) for all three circular RNAs regardless of their sizes (Supplementary Figure S1A).
Secondly, we performed RT reactions with the same set of RTs, using random hexamers on total human brain RNA spiked in with different amounts of synthetic circular RNA and followed with qPCR (Figure 1C–E). The goals of these experiments were: (i) to compare group II intron RTs and retroviral RTs with varying circRNA template amounts and (ii) to compare the number of copies they can produce from endogenous circRNAs when primed with random hexamers. Induro RT produced more copies (lower Ct values) than the other RTs from all tested synthetic circular RNA spike-in amounts (Figure 1D). Similarly, when the levels of random endogenous circRNAs in the human brain RNA were analyzed by RT-qPCR, Induro RT showed the highest sensitivity (Figure 1E).
Collectively, these results show that Induro RT outperforms both tested retroviral RTs and the other group II intron RT, TGIRT, for rolling circle reverse transcription of circRNA templates, producing higher numbers of transcribed copies and longer cDNA products.
IMCR-seq, a streamlined and versatile circRNA sequencing technique using Induro RT
We developed a new circRNA sequencing method called Induro-RT mediated circRNA-sequencing (IMCR-seq) by taking advantage of the robust rolling circle reverse transcription activity of the group II intron reverse transcriptase Induro RT and long read sequencing technologies.
The IMCR-seq workflow is summarized in Figure 2A. Notably, Induro RT consistently generates long cDNA molecules close to or above 10 kb regardless of the sizes of circRNAs (Figure 1B and Supplementary Figure S1A) with multiple consecutive copies, leading to the formation of two distinct cDNA pools from linear and circular RNAs (Figure 2A). This allowed us to use a simple SPRI bead-based size selection method (26) to retain circRNA-derived large cDNAs longer than 5 kb (Supplementary Figure S1B) and eliminate most cDNAs coming from rRNAs and mRNAs. This approach effectively enriched circRNAs of a wide size range from total RNA without rRNA depletion or RNaseR digestion. The fact that only a small fraction (<1%) of the IMCR-seq raw reads map to rRNA sequences demonstrates the efficacy of our cDNA size selection strategy for removing linear RNAs from the pool (Supplementary Tables S1 and S2). After Induro RT rolling circle reverse transcription and circRNA enrichment by cDNA size selection, second strand synthesis and amplification are performed in one step with phi29 DNA Polymerase using random primers. Then, debranched long dsDNA is sequenced using any compatible long-read sequencing platform such as the Oxford Nanopore Technologies sequencers (Nanopore) or the PacBio instruments (Figure 2A).
Figure 2.
Validation of IMCR-seq. (A) IMCR-seq workflow (created with BioRender.com). (B) Heat maps showing Spearman correlation coefficients of human brain circRNA isoform read counts between the technical replicates of Nanopore and PacBio libraries of circ-enriched IMCR-seq (left) and total IMCR-seq (right). Isoforms with read count ≥5 were included in the analysis. (C) Comparison of IMCR-seq read counts with RT-qPCR quantification results of the same BSJs from human brain RNA. In the qPCR analysis, we utilized a mix of circRNA primers that were previously published and validated (23,25,41), along with custom-designed primers developed specifically for this study. The list of primers and sequences is supplied in the supplementary primers.xlsx Excel file. (D) Stacked barplots displaying the numbers of IMCR-seq detected human brain circRNAs that are common to the previously reported BSJs in the circBase (http://www.circbase.org) and/or the circAtlas (https://ngdc.cncb.ac.cn/circatlas/) databases or novel BSJs found with the circ-enriched protocol (left) or the total IMCR-seq protocol (right). BSJs with read count ≥5 were included in this analysis.
IMCR-seq can be performed directly from total RNA (total IMCR-seq) relying on the effective circRNA enrichment via the cDNA size selection. It can also be applied to circRNA enriched samples by incorporating additional circRNA enrichment steps of rRNA depletion and RNaseR treatment before the IMCR-seq protocol (circ-enriched IMCR-seq) (Figure 2A).
We validated IMCR-seq using circRNA-enriched and total RNA from human brain tissues and cultured HEK293 cells. For each sample, we conducted three technical replicates for both total and circ-enriched RNA and prepared Nanopore and PacBio libraries for each replicate to evaluate the robustness of the method. The size distributions of the raw reads and the consensus sequences deduced computationally from those reads by the CIRI-Long algorithm (19) agree well between technical replicates (Supplementary Figures S2 and S3). Furthermore, the read counts per circRNA isoform for both human brain and HEK293 samples are highly consistent between the technical replicates of both circ-enriched and total RNA IMCR-seq protocols, suggesting IMCR-seq is highly reproducible (Figure 2B, Supplementary Figure S4).
All full-length circRNA sequencing methods developed so far utilize Nanopore sequencing technology. We showed that the IMCR-seq method is adaptable to Nanopore and PacBio sequencing platforms for full-length circRNA analysis. The normalized read counts of the detected circRNAs from the PacBio triplicates correlate well with each other and with those of the Nanopore triplicates (Figure 2B and Supplementary Figure S4). The distributions of sequencing read lengths above 5 kb are very similar between PacBio and Nanopore libraries (Supplementary Figures S2A and S3A). The PacBio libraries have smaller proportions of shorter reads below 5 kb than the Nanopore libraries, possibly due to the BluePippin (Sage Science) enrichment for longer dsDNA at the PacBio library preparation step (Supplementary Figures S2A and S3A). 99% of the PacBio sequencing reads and 81% of the Nanopore reads contain at least 3 consecutive repeats (Figure 3C). The length distributions of the consensus reads derived from the repeats are consistent between the PacBio and Nanopore libraries (Supplementary Figures S2B and S3B).
Figure 3.
Comparison of IMCR-seq with previously published circRNA sequencing methods. (A) Bar plot showing circRNA detection efficiencies of human brain samples with Illumina sequencing, isoCirc (18), and circFL-seq (19). The Illumina circRNA detection ratio was taken directly from the value reported in Xin, Gao et al. isoCirc and circFL-seq raw data were downloaded from the GEO database and SRA. All the datasets including IMCR-seq were analyzed using the same computational pipeline of CIRI-long (18). Circles represent individual replicates. NP stands for Nanopore and PB stands for PacBio; circ: circ-enrichment with rRNA depletion and RNaseR; total: total RNA used directly in the RT reaction. (B) Bar plot comparing numbers of BSJs detected with circ-enriched IMCR-seq with Nanopore sequencing, isoCirc and circFL-seq in HEK293 cells at a fixed number of raw reads (6 million reads per experiment). Pie charts above the bars show the percentage of BSJs that were supported by 2, 3, 4 or 5+ raw reads from each method. (C) Density plot comparing the distributions of the repeat numbers used for consensus generation per raw read. The results of circ-enriched IMCR-seq with Nanopore sequencing (NP) or PacBio sequencing (PB) for human brain RNA were compared with the human brain datasets of isoCirc and circFL-seq. (D) Density plot comparing the detected human brain circRNA isoform length distributions of isoCirc, circFL-seq, circ-enriched IMCR-seq with Nanopore sequencing, and total IMCR-seq with Nanopore sequencing. For (C) and (D), the histograms are normalized so that the area under each histogram equals 1; the y-axis therefore shows the probability density for the corresponding x-axis value.
We validated the IMCR-seq results by performing qPCRs to amplify and quantify 29 circRNAs which were previously reported as human circRNAs or detected by the IMCR-seq method in the human brain sample (supplementary file primers.xlsx). We showed that the read counts produced by both the circ-enriched and total IMCR-seq protocols correlate well with the qPCR results (Figure 2C), suggesting IMCR-seq provides accurate relative quantification of circRNA expression levels.
Additionally, to evaluate false positives of the IMCR-seq method, we included a mixture of linear RNA spike-ins of different sizes from Lexogen (SIRV-Set 4) in our human brain libraries. Although some linear spike-in sequences were detected in the raw reads, only a small percentage of these reads were also detected in the consensus sequences, and none were called circRNAs (Supplementary Figure S5). This result demonstrated the high specificity of the IMCR-seq method.
The circ-enriched and total IMCR-seq methods detected 84,483 and 93,566 different BSJs, respectively, from human brain RNA (counting only BSJs supported by five or more reads). Of these BSJs, around 58,000 were already reported in one or both commonly used circRNA databases, circBase and circAtlas, further supporting the validity of IMCR-seq (Figure 2D). In addition, circ-enriched IMCR-seq and total IMCR-seq detected 26,642 and 37,087 novel circRNA BSJs, respectively, which were not reported in either circRNA database.
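The known-versus-novel classification above amounts to a read-count filter followed by set operations. The following Python sketch shows that logic under stated assumptions: BSJs are keyed by a hypothetical `(chromosome, start, end, strand)` tuple, the database contents are toy values, and `known_and_novel` is an illustrative name, not a function of CIRI-long or of either database.

```python
from collections import Counter


def known_and_novel(bsj_reads, databases, min_reads=5):
    """Split well-supported BSJs into database-known and novel sets.

    bsj_reads: iterable of BSJ keys, one entry per supporting read.
    databases: dict mapping a database name to a set of BSJ keys.
    """
    counts = Counter(bsj_reads)
    # Keep only BSJs supported by at least `min_reads` reads.
    supported = {bsj for bsj, n in counts.items() if n >= min_reads}
    # A BSJ is "known" if it appears in one or more of the databases.
    reported = set().union(*databases.values()) if databases else set()
    return supported & reported, supported - reported


# Toy input (illustration only): three BSJs with 6, 5, and 2 supporting reads.
bsj_reads = (
    [("chr1", 100, 500, "+")] * 6
    + [("chr2", 10, 90, "-")] * 5
    + [("chr3", 7, 70, "+")] * 2
)
databases = {
    "circBase": {("chr1", 100, 500, "+")},
    "circAtlas": {("chr3", 7, 70, "+")},
}
known, novel = known_and_novel(bsj_reads, databases)
```

In this toy input the third BSJ is present in a database but is dropped by the read-count filter, so it is counted as neither known nor novel.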
These results demonstrate that the IMCR-seq method is a sensitive, reproducible, and versatile method for profiling full-length circRNAs.
IMCR-seq versus other full-length circRNA sequencing methods
We next benchmarked IMCR-seq against two other full-length circRNA sequencing methods, isoCirc (18) and circFL-seq (20), using RNA from cultured HEK293 cells and human brain tissues.
When comparing circRNA results of technical replicates for HEK293, we found a greater overlap in detected circRNA isoforms between the technical replicates of IMCR-seq than isoCirc (18) (circFL-seq (20) replicate data is not available) (Supplementary Figure S6). Moreover, IMCR-seq produced higher Spearman correlation coefficients of the circRNA read counts between replicates than isoCirc (18) at both the BSJ and individual isoform levels (Supplementary Figure S7), suggesting IMCR-seq is a more reproducible method than isoCirc (18).
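For readers unfamiliar with the statistic, the Spearman coefficient used for the replicate comparison is the Pearson correlation of the rank-transformed read counts. The sketch below is a standard-library Python implementation with average ranks for ties; the replicate counts are invented for illustration, and it is not the code used to produce Supplementary Figure S7.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical read counts for the same five circRNAs in two replicates.
rep1 = [120, 45, 45, 8, 300]
rep2 = [98, 50, 39, 12, 280]
rho = spearman(rep1, rep2)
```

A rank-based measure is a reasonable choice here because circRNA read counts span several orders of magnitude, so a few highly expressed isoforms would otherwise dominate a Pearson correlation on raw counts.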
IMCR-seq also offers superior circRNA sequencing efficiency. Both IMCR-seq protocols (circ-enriched and total) produced higher percentages of circRNA reads (about 30–50%) than all the other methods for human brain and showed more than 3-fold, 5-fold, and 600-fold enrichment of circRNA reads compared to isoCirc (9.2%) (18), circFL-seq (5.3%) (20), and Illumina sequencing (∼0.05%) (18), respectively (Figure 3A). Similarly, IMCR-seq yielded the highest number of circRNA BSJs for HEK293 RNA compared to circFL-seq and isoCirc from the same number of raw reads (Figure 3B). Furthermore, IMCR-seq provided better coverage for the identified BSJs, with a much higher percentage of BSJs (34.8%) supported by five or more reads than isoCirc (11.7%) and circFL-seq (27.8%) (Figure 3B).
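The fold-enrichment figures follow directly from the percentages quoted above. As a worked check, the short Python sketch below divides the stated IMCR-seq range by each comparator; the 30% and 50% bounds are the approximate range given in the text, not exact per-replicate values.

```python
# Percentages of circRNA reads for human brain, as stated in the text.
imcr_low, imcr_high = 30.0, 50.0  # approximate IMCR-seq range
others = {"isoCirc": 9.2, "circFL-seq": 5.3, "Illumina": 0.05}

# Fold enrichment at the lower and upper end of the IMCR-seq range.
fold = {name: (imcr_low / pct, imcr_high / pct) for name, pct in others.items()}

for name, (low, high) in fold.items():
    print(f"{name}: {low:.1f}-fold to {high:.1f}-fold")
```

Even at the lower bound of 30%, the ratios are roughly 3.3-fold over isoCirc, 5.7-fold over circFL-seq, and 600-fold over Illumina sequencing, consistent with the enrichment values reported above.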
A higher repeat number of the circRNA sequence per cDNA concatemer improves the accuracy of circRNA sequencing (18), and this is especially critical for single-pass sequencing technologies such as Nanopore sequencing. Not surprisingly, IMCR-seq produced higher numbers of repeats per concatemer for individual circRNA sequences than the other two methods (Figure 3C), due to the high processivity of group II intron RTs (Figure 1B). For the human brain circ-enriched IMCR-seq datasets, 81% of the Nanopore reads and 99% of the PacBio reads contain at least three consecutive repeats. In comparison, this ratio was 68.5% for the isoCirc dataset (18) and 21.4% for the circFL-seq dataset (20) (Figure 3C).
Finally, when we compared the lengths of the detected circRNA isoforms, both the circ-enriched and total IMCR-seq protocols yielded more circRNAs of 1,000 nt or longer than isoCirc (18) and circFL-seq (20) (Figure 3D). In circ-enriched IMCR-seq, 9% of human brain circRNAs were longer than 1,000 nt, compared with 49% for total IMCR-seq, 6% for isoCirc (18), and 0.2% for circFL-seq (20). The longest human brain circRNAs detected by total IMCR-seq and circ-enriched IMCR-seq are 12,635 nt and 3,085 nt, respectively, whereas the longest detected by isoCirc (18) is 2,983 nt and by circFL-seq (20) is 1,485 nt.
These results demonstrate that IMCR-seq confers numerous advantages in comparison to extant full-length circRNA sequencing methodologies.
The RNaseR-free total IMCR-seq method enables accurate detection and quantification of long circRNAs
Total IMCR-seq detected significantly higher numbers of long circRNAs of 1,000 nt or bigger than the previously published full-length circRNA sequencing methods and circ-enriched IMCR-seq (Figure 3D). We also observed that the correlation of circRNA read counts of the two IMCR-seq protocols for short circRNAs below 700 nt is notably higher than that for all the circRNAs (Supplementary Figure S9). To investigate this further, we performed more qPCR experiments for 13 long circRNAs that are bigger than 1,000 nt. The results showed that while the correlation to the qPCR quantification remained high for the read counts generated by the total IMCR-seq protocol, the correlation for the circ-enriched workflow decreased when long circRNAs were added to the calculation (Supplementary Figure S8 and Figure 2C). We hypothesized that these observations could be due to the different circRNA enrichment strategies employed by the two protocols. To investigate the effect of the traditional RNaseR based circRNA enrichment method on the sizes of the detected circRNA, we prepared a circRNA spike-in control consisting of circRNAs of different sizes in similar quantities and conducted more IMCR-seq experiments. As expected, the ratio of circRNA read counts of the two different protocols decreases as the circRNA length increases, showing longer circRNAs are depleted in the circ-enriched protocol (rRNA depletion + RNase R) compared to the total RNA protocol (cDNA size selection) (Figure 4A). Previous studies have suggested that long circRNAs are more susceptible to RNaseR digestion (17,24). To validate this hypothesis, we treated two circRNAs of different sizes (1,742 nt and 789 nt) with RNaseR in 1× RNaseR buffer for 30 and 60 minutes or with water in 1× RNaseR buffer for 60 minutes at 37°C and measured the remaining circRNA amounts with Agilent TapeStation. 
We repeatedly observed that both circRNAs diminished over time with RNaseR treatment and that this effect was stronger on the longer circRNA: relative to controls incubated without the enzyme under the same reaction conditions, the 1,742 nt circRNA decreased by 60% at the end of the 60-minute treatment, compared with a 40% decrease for the 789 nt circRNA (Figure 4B).
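The normalization behind these percentages can be sketched as follows: the amount remaining after RNaseR treatment is divided by the amount in the matched no-enzyme control. This is a minimal Python illustration; the TapeStation readings are hypothetical values chosen only to reproduce decreases of roughly the reported size, and `percent_decrease` is an illustrative helper, not the authors' analysis code.

```python
from statistics import mean


def percent_decrease(treated, control):
    """Percent loss of RNA relative to the no-enzyme control incubated in parallel."""
    return 100.0 * (1.0 - mean(treated) / mean(control))


# Hypothetical TapeStation readings in arbitrary units (n = 3 each).
long_circ = percent_decrease(treated=[4.1, 3.9, 4.0], control=[10.2, 9.8, 10.0])
short_circ = percent_decrease(treated=[6.1, 5.9, 6.0], control=[10.1, 9.9, 10.0])
```

Normalizing to a control held at 37°C in the same buffer separates enzyme-dependent loss from any degradation caused by the incubation itself.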
Figure 4.
Omission of the RNaseR based circRNA enrichment steps decreases the bias against longer circRNAs. (A) Plot showing the ratios of the circRNA read counts of the circ-enriched IMCR-seq protocol and the total RNA IMCR-seq protocol of different circRNA sizes. Error bars represent the standard error of the mean. (B) Plot showing the mean amounts of a linear RNA, a medium-size (789 nt) circRNA, and a large-size (1,742 nt) circRNA detected with Agilent TapeStation after in vitro RNaseR digestion at 30 and 60 minutes. For normalization, corresponding RNAs that were incubated at 37°C for 60 minutes in 1× reaction conditions without RNaseR were used (n = 3, *P= 0.023 calculated via one-sample, two-tailed t-test; ** P= 0.001 calculated via paired, two-tailed t-test). (C) Schematic showing experimental confirmation of long circRNAs detected with total IMCR-seq with divergent primers spanning the BSJ (a 7,166 nt-long CHIC1 circRNA isoform is illustrated here as an example, a similar divergent primer PCR strategy was followed for the other long circRNAs in Supplementary Figure S8). (D) Sanger sequencing of 7,166 nt CHIC1 circRNA isoform BSJ region.
The longest circRNA detected by the total IMCR-seq protocol (12,635 nt) is about four times as long as the longest detected by the circ-enriched IMCR-seq protocol (3,085 nt). To validate the long circRNAs that were detected by the RNaseR-free total IMCR-seq protocol, we chose three highly expressed long circRNAs (4,033 nt, 5,262 nt, and 7,166 nt) and designed divergent primers to specifically target the BSJ regions of the circRNA isoforms without amplifying the linear RNA isoforms (Figure 4C). Sanger sequencing results confirmed the amplification of the expected BSJs for all three selected long circRNAs (Figure 4D and Supplementary Figure S10).
These results highlight the advantage of the RNaseR-free total IMCR-seq protocol in minimizing bias from RNaseR treatment and providing a more complete and accurate circRNA profile.
Utilizing IMCR-seq to sequence circRNAs from non-small cell lung cancer tissue
We applied total-IMCR-seq coupled with PacBio sequencing to profile circRNAs in lung cancer tissue from a non-small cell lung cancer (NSCLC) patient. From 17,585,542 raw reads, we detected 114,484 different BSJs and 119,013 circRNA isoforms. These circRNA isoforms were expressed from 12,929 genes. To evaluate IMCR-seq NSCLC results, we compared the BSJs to the miOncoCirc lung cancer database. The miOncoCirc lung cancer database is a collection of BSJs from many patients. To mitigate patient-to-patient circRNA profile variabilities due to different genetic and epigenetic backgrounds, we created a consensus BSJ reference set by selecting the BSJs that were detected in at least 10 NSCLC patients and compared our IMCR-seq results to this reference set. Half of the consensus circRNAs from the NSCLC patients were detected by IMCR-seq in the patient sample we tested (Figure 5A), demonstrating IMCR-seq is an accurate and sensitive method for detecting circRNAs for cancer samples.
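The consensus reference set described above can be expressed as a simple recurrence filter followed by an overlap calculation. The Python sketch below assumes a hypothetical input layout (a mapping from patient ID to the set of BSJs reported for that patient) and uses toy identifiers; it does not reflect the actual miOncoCirc file format.

```python
from collections import Counter


def consensus_bsjs(patient_bsjs, min_patients=10):
    """Return BSJs detected in at least `min_patients` patients.

    patient_bsjs: dict mapping a patient ID to the set of BSJs seen in that patient.
    """
    seen_in = Counter(bsj for bsjs in patient_bsjs.values() for bsj in bsjs)
    return {bsj for bsj, n in seen_in.items() if n >= min_patients}


def recovered_fraction(consensus, detected):
    """Fraction of the consensus set that is also present in `detected`."""
    return len(consensus & detected) / len(consensus) if consensus else 0.0


# Toy cohort of 12 patients: "A" occurs in all 12, "B" in 10, and "C" in 3.
patients = {
    f"p{i}": {"A"} | ({"B"} if i < 10 else set()) | ({"C"} if i < 3 else set())
    for i in range(12)
}
consensus = consensus_bsjs(patients)
fraction = recovered_fraction(consensus, detected={"A", "C", "D"})
```

Requiring recurrence across patients trades sensitivity for robustness: patient-specific BSJs are excluded from the reference, so the recovered fraction reflects commonly expressed circRNAs rather than individual variation.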
Figure 5.
IMCR-seq for NSCLC tissue. (A) Intersection of the BSJs detected by IMCR-seq with the BSJs shared by at least 10 non-small cell lung cancer patients in the miOncoCirc database. (B) Gene enrichment analysis of the genes from which the top 6,500 BSJs with the highest scores are expressed in the tested lung tumor sample. Enrichment scores from DAVID (the geometric mean, in -log scale, of the members' p-values in the corresponding annotation cluster) were used to generate the bar plot. (C) Examples of the circRNA-producing genes from the top 5 clusters from the GO term analysis. (D) Interactive genome browser view showing the IMCR-seq read coverage track for the two most abundant circRNA isoforms of the RAB3IP gene from the analyzed lung tumor tissue.
It is expected that circRNAs with high expression levels are more likely to be detected by the traditional circRNA sequencing methods and recorded in the current circRNA databases. When focusing on the highly expressed circRNAs in the IMCR-seq dataset, we found that 8 out of the 10 most expressed BSJs and 548 out of the 1,000 most expressed BSJs in our study were reported in the miOncoCirc lung cancer circRNA database. Overall, 6,159 IMCR-seq detected BSJs existed in the miOncoCirc lung cancer database. The additional circRNAs detected by IMCR-seq are probably novel circRNAs that were not previously detected due to low expression and limitations of the existing methods.
We performed DAVID gene enrichment analysis on the 6,500 most abundant BSJs detected in the NSCLC sample and showed that the top circRNA-producing genes were significantly enriched for processes that are important for tumor development, such as DNA repair, cell signaling, and cell cycle pathways (Figure 5B and C). The top circRNA-producing genes in the analyzed lung cancer sample included many that are commonly altered in lung cancers, such as KRAS (altered in 30% of NSCLC patients (31)), ATM (altered in 7% of NSCLC patients (31)), and BRAF (altered in 5% of NSCLC patients (31)) (Figure 5C). The circRNAs expressed from these genes might be involved in the dysregulation of the expression of their corresponding linear RNAs and/or their downstream pathways, thus playing a critical role in cancer development.
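The DAVID enrichment score plotted in Figure 5B is described as the geometric mean, in -log scale, of the p-values of the members of an annotation cluster. A minimal Python sketch of that calculation is given below; it assumes a base-10 logarithm, and the p-values are invented for illustration rather than taken from the analysis.

```python
import math


def enrichment_score(p_values):
    """Geometric mean of p-values expressed on the -log10 scale.

    Equivalent to the arithmetic mean of -log10(p) over the cluster members.
    """
    return sum(-math.log10(p) for p in p_values) / len(p_values)


# Hypothetical p-values for the annotation terms in one cluster.
score = enrichment_score([1e-2, 1e-4, 1e-6])
```

Under the base-10 assumption, a score of about 1.3 corresponds to a geometric-mean p-value of 0.05, so higher bars indicate clusters whose member terms are, on average, more strongly enriched.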
The circRNA isoform with the highest expression score in our lung cancer dataset is an exon-intron circRNA (eIciRNA) transcribed from the RAB3IP gene. This circRNA is so highly expressed that it constitutes 4.9% of the counts of all circRNA isoforms combined in our IMCR-seq dataset. Linear RNA expression from this gene in lung tissue is moderate in RNA-seq data, with RAB3IP being the 5,121st most expressed gene (32). The RAB3IP eIciRNA is 1,513 nucleotides in length and consists of exon 7, intron 7–8, and exon 8 (Figure 5D). There is also an alternatively spliced version of this circRNA in the IMCR-seq lung tumor dataset, which has the same BSJ (exons 7 and 8) but lacks the intron in between and is therefore much shorter (242 nt). IMCR-seq's capability to sequence full-length circRNAs allows us to unambiguously determine the isoform types and their relative abundance. The expression of the 242 nt RAB3IP circRNA isoform was also high but not comparable to that of the 1,513 nt eIciRNA version. In a previous report, both circRNAs were detected in healthy lung tissue; however, the 242 nt RAB3IP circRNA was reported to be the dominant isoform in healthy lung (18). The shorter variant of RAB3IP circRNA has been implicated in increasing cell proliferation in primary human endothelial cells (33) and has been shown to promote osteosarcoma progression (34) and to cause treatment resistance in prostate cancer (35). The high abundance of the 1,513 nt RAB3IP eIciRNA detected by our method could be due to its enhanced capability of detecting longer circRNAs compared to previously used methods for circRNA detection/sequencing, and this circRNA could have a significant function in lung cancer biology. Alternatively, the high expression of RAB3IP eIciRNA might be specific to this patient's cancer.
Further investigations on different RAB3IP circRNA isoforms in different cancers with an unbiased circRNA sequencing method like IMCR-seq could help acquire new knowledge and insights into the role and biology of these circRNAs in cancer.
Total RNA IMCR-seq enables circRNA sequencing from liquid biopsy
As IMCR-seq works effectively on total RNA and does not require ribosomal RNA depletion or RNaseR treatment, it requires much less input material than other full-length circRNA sequencing methods. This enables IMCR-seq to profile circRNAs from scarce biological samples such as human plasma.
To demonstrate this application, we applied IMCR-seq to two human plasma samples: one of a non-small cell lung cancer (NSCLC) patient and one of a control healthy individual. We extracted about 10 ng of total RNA from 3 ml of each of the plasma samples and performed total IMCR-seq with PacBio sequencing. We detected 317 different circRNA isoforms from the control individual and 259 circRNA isoforms from the NSCLC patient. These circRNA isoforms were of varying lengths, with many being longer than 1,000 nucleotides, suggesting that even the long circRNAs can resist the harsh conditions of the extracellular environment (Figure 6A).
Figure 6.
IMCR-seq enables circRNA profiling from liquid biopsies. (A) Histogram showing the length distribution of the circRNA isoforms detected with IMCR-seq from human plasma. (B) Interactive genome browser showing IMCR-seq read coverage tracks (light blue) for the circRNA isoforms of the RAB3IP gene from the NSCLC patient plasma (top track) and the control plasma (the 3rd track).
When we compared the circRNA expression at the gene level in NSCLC plasma versus non-NSCLC plasma, we saw a small intersection with only 17 common genes. However, when we compared circulating circRNAs detected in NSCLC plasma with the IMCR-seq results of an NSCLC tissue sample, we saw a bigger intersection of 61 circRNA-expressing genes. The 1,513 nt-long RAB3IP eIciRNA was the most abundant circRNA isoform detected in the NSCLC tumor tissue. We detected the same isoform in the NSCLC plasma but not in the healthy plasma sample (Figure 6B). In the healthy plasma, a different isoform with exons 7 and 8 was detected at a lower level (Figure 6B). Among the 61 common circRNA-producing genes shared between the NSCLC tumor tissue and plasma, many genes were shown to play direct roles in lung cancer, such as PRKN, WNK1, BPTF, CORO1C, ITGB1, XPO1 (35–39), and the proto-oncogene FLI1, whose circRNAs were shown to drive lung cancer progression (40). Additionally, some other genes were reported to play important roles in cancer progression, such as CXCL6, POLQ, JARID2, LTBP1, PDE5A, and the proto-oncogenes AFF1 and RUNX1. These results suggest that IMCR-seq was able to detect tumor-specific circRNAs from limited human plasma material.
Discussion
In this work, we have shown that group II intron RTs consistently produce large cDNAs from circular RNA templates of various sizes due to their high processivity and strand displacement activity. Taking advantage of this property of group II intron RTs, we developed an RNaseR-free, highly efficient, and comprehensive full-length circRNA sequencing method called IMCR-seq. IMCR-seq can be coupled with Nanopore or PacBio sequencing technologies and generates reproducible and accurate circRNA results.
IMCR-seq is the only full-length circRNA sequencing method available that does not require rRNA depletion and RNase R treatment and can be directly performed on untreated total cellular RNA. We confirmed that RNase R can degrade circRNAs, probably due to its star activity, as previously suggested (17,24), and further showed that this effect is exacerbated as the size of the circRNAs increases. Hence, the elimination of RNase R treatment confers a significant advantage and enables IMCR-seq to capture longer circRNAs, produce less biased quantification, and ultimately provide a more complete and accurate picture of the circRNA landscape in the cell. When applied to human brain total RNA, IMCR-seq detects many circRNAs that are longer than 2,000 nt, which account for more than 20% of all circRNA isoforms. Among them, 2,586 circRNA isoforms are 5,000 nt or longer. We were able to experimentally validate the three long circRNAs that we randomly selected for testing. The longest circRNA we confirmed is 7,166 nt long and contains exons 2 and 3 and intron 2–3 of the CHIC1 gene.
In addition, because the IMCR-seq protocol does not rely on rRNA depletion, it can be used to sequence circRNAs from any organism. The simple and non-destructive size selection-based enrichment strategy of IMCR-seq makes it possible to work with very low amounts of RNA input and thus enables new applications not accessible by other methods, such as circRNA analysis of scarce disease tissue samples or patient biological liquid samples such as plasma, saliva, and urine. We demonstrated such an application by successfully applying IMCR-seq to 10 ng of total RNA extracted from human healthy and lung cancer plasma samples, detecting cancer-specific circRNAs as potential biomarkers. The lowest RNA input limit of IMCR-seq remains to be determined. We believe IMCR-seq can potentially be adapted for single-cell applications, which will help advance our knowledge of cell-type specific circRNA expression and its roles in complex biological contexts.
IMCR-seq can also be coupled with conventional circRNA enrichment methods such as rRNA depletion and/or RNaseR digestion. To distinguish it from the original rRNA depletion-free and RNaseR-free IMCR-seq protocol, we refer to this protocol as ‘circ-enriched IMCR-seq’ and the original protocol as ‘total IMCR-seq’ meaning it can be applied to total RNA directly. The total IMCR-seq protocol detected significantly higher numbers of long circRNAs of 1 kb or bigger than circ-enriched IMCR-seq and all the published full-length circRNA sequencing methods (Figure 3D). On the other hand, the circ-enriched IMCR-seq protocol, with the additional RNase R digestion, offers more complete removal of linear RNA from the library. It is particularly beneficial for samples with low circRNA to linear RNA ratios.
Limitations of the method
IMCR-seq is designed to enrich circRNAs thereby offering relative rather than absolute quantification of circRNA expression. Consequently, IMCR-seq read counts are not directly comparable with linear RNA expression. Instead, IMCR-seq provides information on the relative abundance between different circRNAs and allows detection of circRNAs that cannot be detected by existing methods. In addition, while the IMCR-seq read counts generally agree well with the RT-qPCR quantification, the correlation is not perfectly linear. Even though total IMCR-seq protocol eliminates biases from extensive RNA manipulation, notably RNaseR digestion, it may introduce amplification biases due to its reliance on phi29 DNA polymerase. Finally, while proof-of-principle experiments showcase the potential of IMCR-seq for profiling circRNA of human plasma samples, further benchmarking and optimization are required to extend and establish its applications to low input samples in both research and clinical settings.
Acknowledgements
The authors would like to thank their NEB colleagues Jennifer Ong, William Jack, Hoong Chuin Lim, Robert Trachman, and Sean Johnson for helpful discussions and feedback on the manuscript. This article's graphical abstract and certain figures (indicated in the figure legends) were created with BioRender.com.
Contributor Information
Irem Unlu, New England Biolabs Inc., Beverly, MA 01915, USA.
Sean Maguire, New England Biolabs Inc., Beverly, MA 01915, USA.
Shengxi Guan, New England Biolabs Inc., Beverly, MA 01915, USA.
Zhiyi Sun, New England Biolabs Inc., Beverly, MA 01915, USA.
Data availability
All the PacBio and Nanopore sequencing library data (raw and processed) are deposited to NCBI GEO database under accession: GSE248303.
Supplementary data
Supplementary Data are available at NAR Online.
Funding
New England Biolabs, Inc. Funding for open access charge: New England Biolabs, Inc.
Conflict of interest statement. Authors are current or previous employees of New England Biolabs, Inc. New England Biolabs is a manufacturer and vendor of molecular biology reagents, including several enzymes and buffers used in this study. This affiliation does not affect the authors’ impartiality, adherence to journal standards and policies, or availability of data. New England Biolabs has filed a patent application based on the inventions in this paper.
References
- 1. Jeck W.R., Sorrentino J.A., Wang K., Slevin M.K., Burd C.E., Liu J., Marzluff W.F., Sharpless N.E. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013; 19:141–157.
- 2. Memczak S., Jens M., Elefsinioti A., Torti F., Krueger J., Rybak A., Maier L., Mackowiak S.D., Gregersen L.H., Munschauer M. et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495:333–338.
- 3. Zhang Y., Zhang X.O., Chen T., Xiang J.F., Yin Q.F., Xing Y.H., Zhu S., Yang L., Chen L.L. Circular intronic long noncoding RNAs. Mol. Cell. 2013; 51:792–806.
- 4. Kristensen L.S., Andersen M.S., Stagsted L.V.W., Ebbesen K.K., Hansen T.B., Kjems J. The biogenesis, biology and characterization of circular RNAs. Nat. Rev. Genet. 2019; 20:675–691.
- 5. Kristensen L.S., Jakobsen T., Hager H., Kjems J. The emerging roles of circRNAs in cancer and oncology. Nat. Rev. Clin. Oncol. 2022; 19:188–206.
- 6. Fang Y., Wang X., Li W., Han J., Jin J., Su F., Zhang J., Huang W., Xiao F., Pan Q. et al. Screening of circular RNAs and validation of circANKRD36 associated with inflammation in patients with type 2 diabetes mellitus. Int. J. Mol. Med. 2018; 42:1865–1874.
- 7. Holdt L.M., Stahringer A., Sass K., Pichler G., Kulak N.A., Wilfert W., Kohlmaier A., Herbst A., Northoff B.H., Nicolaou A. et al. Circular non-coding RNA ANRIL modulates ribosomal RNA maturation and atherosclerosis in humans. Nat. Commun. 2016; 7:12429.
- 8. Li H., Li K., Lai W., Li X., Wang H., Yang J., Chu S., Wang H., Kang C., Qiu Y. Comprehensive circular RNA profiles in plasma reveals that circular RNAs can be used as novel biomarkers for systemic lupus erythematosus. Clin. Chim. Acta. 2018; 480:17–25.
- 9. Shao Y., Chen Y. Roles of circular RNAs in neurologic disease. Front. Mol. Neurosci. 2016; 9:25.
- 10. Burd C.E., Jeck W.R., Liu Y., Sanoff H.K., Wang Z., Sharpless N.E. Expression of linear and novel circular forms of an INK4/ARF-associated non-coding RNA correlates with atherosclerosis risk. PLoS Genet. 2010; 6:e1001233.
- 11. Li Y., Zheng Q., Bao C., Li S., Guo W., Zhao J., Chen D., Gu J., He X., Huang S. Circular RNA is enriched and stable in exosomes: a promising biomarker for cancer diagnosis. Cell Res. 2015; 25:981–984.
- 12. Bahn J.H., Zhang Q., Li F., Chan T.M., Lin X., Kim Y., Wong D.T., Xiao X. The landscape of microRNA, piwi-interacting RNA, and circular RNA in human saliva. Clin. Chem. 2015; 61:221–230.
- 13. Brown J.R., Chinnaiyan A.M. The potential of circular RNAs as cancer biomarkers. Cancer Epidemiol. Biomarkers Prev. 2020; 29:2541–2555.
- 14. Hansen E.B., Fredsoe J., Okholm T.L.H., Ulhoi B.P., Klingenberg S., Jensen J.B., Kjems J., Bouchelouche K., Borre M., Damgaard C.K. et al. The transcriptional landscape and biomarker potential of circular RNAs in prostate cancer. Genome Med. 2022; 14:8.
- 15. Wesselhoeft R.A., Kowalski P.S., Parker-Hale F.C., Huang Y., Bisaria N., Anderson D.G. RNA circularization diminishes immunogenicity and can extend translation duration in vivo. Mol. Cell. 2019; 74:508–520.
- 16. Wesselhoeft R.A., Kowalski P.S., Anderson D.G. Engineering circular RNA for potent and stable translation in eukaryotic cells. Nat. Commun. 2018; 9:2629.
- 17. Dahl M., Daugaard I., Andersen M.S., Hansen T.B., Gronbaek K., Kjems J., Kristensen L.S. Enzyme-free digital counting of endogenous circular RNA molecules in B-cell malignancies. Lab. Invest. 2018; 98:1657–1669.
- 18. Xin R., Gao Y., Gao Y., Wang R., Kadash-Edmondson K.E., Liu B., Wang Y., Lin L., Xing Y. isoCirc catalogs full-length circular RNA isoforms in human transcriptomes. Nat. Commun. 2021; 12:266.
- 19. Zhang J., Hou L., Zuo Z., Ji P., Zhang X., Xue Y., Zhao F. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. Nat. Biotechnol. 2021; 39:836–845.
- 20. Liu Z., Tao C., Li S., Du M., Bai Y., Hu X., Li Y., Chen J., Yang E. circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing. eLife. 2021; 10:e69457.
- 21. Lambowitz A.M., Zimmerly S. Group II introns: mobile ribozymes that invade DNA. Cold Spring Harb. Perspect. Biol. 2011; 3:a003616.
- 22. Stamos J.L., Lentzsch A.M., Lambowitz A.M. Structure of a thermostable group II intron reverse transcriptase with template-primer and its functional and evolutionary implications. Mol. Cell. 2017; 68:926–939.
- 23. Xiao M.S., Wilusz J.E. An improved method for circular RNA purification using RNase R that efficiently removes linear RNAs containing G-quadruplexes or structured 3' ends. Nucleic Acids Res. 2019; 47:8755–8769.
- 24. Nielsen A.F., Bindereif A., Bozzoni I., Hanan M., Hansen T.B., Irimia M., Kadener S., Kristensen L.S., Legnini I., Morlando M. et al. Best practice standards for circular RNA research. Nat. Methods. 2022; 19:1208–1220.
- 25. Salzman J., Chen R.E., Olsen M.N., Wang P.L., Brown P.O. Cell-type specific features of circular RNA expression. PLoS Genet. 2013; 9:e1003777.
- 26. Stortchevoi A., Kamelamela N., Levine S.S. SPRI beads-based size selection in the range of 2-kb. J. Biomol. Tech. 2020; 31:7–10.
- 27. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018; 34:3094–3100.
- 28. Huang da W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009; 4:44–57.
- 29. Sherman B.T., Hao M., Qiu J., Jiao X., Baseler M.W., Lane H.C., Imamichi T., Chang W. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022; 50:W216–W221.
- 30. Maguire S., Guan S. Rolling circle reverse transcription enables high fidelity nanopore sequencing of small RNA. PLoS One. 2022; 17:e0275471.
- 31. AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 2017; 7:818–831.
- 32. Uhlen M., Fagerberg L., Hallstrom B.M., Lindskog C., Oksvold P., Mardinoglu A., Sivertsson A., Kampf C., Sjostedt E., Asplund A. et al. Proteomics. Tissue-based map of the human proteome. Science. 2015; 347:1260419.
- 33. Josipovic N., Ebbesen K.K., Zirkel A., Danieli-Mackay A., Dieterich C., Kurian L., Hansen T.B., Papantonis A. circRAB3IP modulates cell proliferation by reorganizing gene expression and mRNA processing in a paracrine manner. RNA. 2022; 28:1481–1495.
- 34. Tang G., Liu L., Xiao Z., Wen S., Chen L., Yang P. CircRAB3IP upregulates twist family BHLH transcription factor (TWIST1) to promote osteosarcoma progression by sponging miR-580-3p. Bioengineered. 2021; 12:3385–3397.
- 35. Chen D., Wang Y., Yang F., Keranmu A., Zhao Q., Wu L., Han S., Xing N. The circRAB3IP mediated by eIF4A3 and LEF1 contributes to enzalutamide resistance in prostate cancer by targeting miR-133a-3p/miR-133b/SGK1 pathway. Front. Oncol. 2021; 11:752573.
- 36. Wang L., Zeng C., Chen Z., Qi J., Huang S., Liang H., Huang S., Ou Z. Circ_0025039 acts an oncogenic role in the progression of non-small cell lung cancer through miR-636-dependent regulation of CORO1C. Mol. Cell. Biochem. 2022; 477:743–757.
- 37. Chang R., Xiao X., Fu Y., Zhang C., Zhu X., Gao Y. ITGB1-DT facilitates lung adenocarcinoma progression via forming a positive feedback loop with ITGB1/wnt/beta-Catenin/MYC. Front. Cell Dev. Biol. 2021; 9:631259.
- 38. Xiu M., Li L., Li Y., Gao Y. An update regarding the role of WNK kinases in cancer. Cell Death Dis. 2022; 13:795.
- 39. Li Y., Lu X., Zhang J., Liu Q., Zhou D., Deng X., Qiu Y., Chen Q., Li M., Yang G. et al. Significance of Parkinson Family genes in the prognosis and treatment outcome prediction for lung adenocarcinoma. Front. Mol. Biosci. 2021; 8:735263.
- 40. Li L., Li W., Chen N., Zhao H., Xu G., Zhao Y., Pan X., Zhang X., Zhou L., Yu D. et al. FLI1 Exonic circular RNAs as a novel oncogenic driver to promote tumor metastasis in small cell lung cancer. Clin. Cancer Res. 2019; 25:1302–1317.
- 41. Rybak-Wolf A., Stottmeister C., Glazar P., Jens M., Pino N., Giusti S., Hanan M., Behm M., Bartok O., Ashwal-Fluss R. et al. Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol. Cell. 2015; 58:870–885.