Abstract
Circular RNA transcripts were first identified in the early 1990s but knowledge of these species has remained limited, as their study has been difficult through traditional methods of RNA analysis. Now, novel bioinformatic approaches coupled with biochemical enrichment strategies and deep sequencing have allowed comprehensive studies of circular RNA species. Recent studies have revealed thousands of endogenous circular RNAs (circRNAs) in mammalian cells, some of which are highly abundant and evolutionarily conserved. Evidence is emerging that some circRNAs might regulate microRNA (miRNA) function, and roles in transcriptional control have also been suggested. Therefore, study of this class of non-coding RNAs has potential implications for therapeutic and research applications. We believe the key future challenge to the field will be to understand the regulation and function of these unusual molecules.
Circular RNAs (circRNAs) are a recent addition to the growing list of types of non-coding RNA. Although the existence of circular transcripts has been known for at least 20 years1, such molecules were long considered molecular flukes—artifacts of aberrant RNA splicing2 or specific to a few pathogens, such as the Hepatitis δ virus3 and some plant viroids4. However, recent work has revealed large numbers of circRNAs that are endogenous to mammalian cells, and many of these are abundant and stable. CircRNAs can arise from exons (exonic circRNA) or introns (intronic circRNA); these are distinct species with independent modes of generation. Potential functions in the regulation of gene expression are emerging for both exonic and intronic circRNAs5–7.
Most circRNAs have eluded identification until recently for several reasons. Circular RNAs, unlike miRNAs and other small RNAs, are not easily separated from other RNA species by size or electrophoretic mobility. Commonly used molecular techniques that require amplification and/or fragmentation destroy circularity, and because circRNAs have no free 3’ or 5’ end, they cannot be found by molecular techniques that rely on a polyadenylated free RNA end (such as rapid amplification of cDNA ends (RACE), or poly(A) enrichment of samples for RNA-seq studies). Furthermore, a key feature of circRNAs, an out-of-order arrangement of exons known as a ‘backsplice’ (described below), is not unique to circRNAs, and early RNA-seq mapping algorithms filtered out such sequences. These problems have recently been addressed through the development of exonuclease-based enrichment approaches, novel bioinformatic tools, sequencing with longer reads and higher throughput, and sequencing of ribosomal RNA (rRNA)-depleted RNA libraries (rather than polyA-enriched libraries).
The first hint of endogenously produced circRNAs emerged in the early 1990s from studies of the DCC transcript in human cells1. The authors of that study described transcripts with exons out of the expected order: 5’ exons were ‘shuffled’ downstream of 3’ exons. Despite the non-canonical ordering, the exons were complete and used the usual splice donor and acceptor sites. This arrangement was referred to as ‘exon shuffling’ (distinct from the evolutionary process described by Gilbert8). The observed shuffled transcripts were less abundant than the expected transcripts by several orders of magnitude and were non-polyadenylated, predominantly cytoplasmic and expressed in human and rat tissues. The authors speculated that such a product might emerge from intra-molecular (cis) splicing, which would result in an exonic circRNA. A site at which the 3’ ‘tail’ of an expected downstream exon within the gene is joined to the 5’ ‘head’ of an exon that is normally upstream is referred to as a ‘backsplice’. Early studies also detected circular RNAs by electron microscopy3,9, but this approach cannot easily distinguish circular RNAs from RNA lariats (which are byproducts of RNA splicing)10.
Subsequent reports identified shuffled transcripts from several other genes, including ETS-12,11, SRY12 and cytochrome P450 2C24 (CYPIIC24)13,14 in human, mouse and rat cells. In each case, discovery began with the serendipitous observation of PCR products with backsplice sequences. SRY is usually unspliced, but sites with the canonical splice site GT/AG sequence motifs were involved in the backsplice, suggesting the involvement of the canonical spliceosome. The splice junctions used in the exonic circRNA forms of ETS-1 and CYPIIC24 used splice donor and acceptor sites also involved in forward splicing11,13. A few additional circular RNAs were identified in the ensuing two decades15–18, but they were generally much less abundant than the linear products of their source gene. Therefore, before the era of massively parallel sequencing, circular RNAs were considered oddities of uncertain importance.
In this review we discuss methods for the identification of endogenous circRNAs, including molecular methods and genome-wide approaches, with a focus on the advantages and disadvantages of various techniques. Next we consider the findings from these genomic studies, focusing on exonic circRNAs, and describe the biochemical properties of circRNAs in vivo, including methods for validation of circularity. Finally, we discuss known and predicted circRNA functions and speculate on possible applications of the new-found understanding of this molecular species.
Detection methods
Identifying backsplices
Although progress in our knowledge of circRNAs has occurred mainly because of sequencing-based methods, analysis of the molecular characteristics of circRNAs is fundamental to the various strategies that we discuss in this section. Observation of sequences consistent with backsplice formation is crucial evidence of exonic circRNA production. We define an ‘apparent backsplice’ sequence as any case in which the ordering of the exons in a sequence is reversed relative to the annotated template. Importantly, apparent backsplice sequences may be produced by mechanisms other than formation of exonic circRNA: reverse transcriptase template switching; tandem duplication; and RNA trans-splicing (Fig. 1A).
Reverse transcriptase template switching (Fig. 1A, i) is an artifact of cDNA synthesis, occurring when an extending cDNA molecule dissociates from its template RNA and resumes extension from another RNA template, often in a homology-dependent manner. This effect produces spurious evidence of backsplice-containing products and is known to confound the analysis of rare splicing products19,20. However, template switching is largely random and is not expected to produce abundant cDNA molecules of identical sequence. Therefore, high abundance of a particular apparent backsplice sequence in a cDNA library offers evidence that the sequence is also present in the RNA template. This may be assessed either by identification of multiple unique reads in deep sequencing data or by ‘divergent’ qPCR primers. These divergent primers are oriented to amplify away from each other in a genomic context, but become ‘convergent’ and amplify a discrete amplicon when a backsplicing event brings outside sequences together (see Fig. 1B for illustration). Alternatively, the presence of the backsplice sequence in the RNA pool may be assessed directly by RNase protection2 or northern blot probing for the backsplice sequence (Fig. 1B).
Apparent backsplice sequences can also arise from tandem DNA duplications that can generate duplicated exons within a gene. When these sequences are transcribed, the mRNA contains an apparent backsplice sequence (Fig. 1A, ii) due to the difference between the annotated template sequence and the DNA template present in the cell. In addition, trans-splicing (a process in which two distinct molecules participate in splicing) can generate apparent backsplices when it occurs between two RNA molecules originating from the same gene (Fig. 1A, iii).
There are several approaches to distinguish such species from true exonic circRNA. Linear exonic RNAs will usually have 3’ polyadenylation, whereas circles have no 3’ end. Exonic circRNAs migrate more slowly in a gel than linear RNA of the same length, and this effect is augmented by increased gel cross-linking21 (Fig. 1B). However, exonic circRNAs also contain less total nucleotide sequence than full length, trans-spliced, or tandem duplicated transcripts from the same gene, and therefore will migrate faster in a gel that has low cross-linking (Fig. 1B). Standard (or virtual22,23) northern blot can be used to assess these characteristics. A more conclusive assay uses either weak hydrolysis or targeted RNase H degradation12: a circular RNA will be linearized into a single product of a predictable size after RNase H degradation or after a single nick by hydrolysis (Fig. 1B). Additional methods offering strong evidence of circularity are 2D gel electrophoresis21 and gel trap electrophoresis24,25 (Fig. 1B): in 2D gel electrophoresis, circRNAs are revealed by their poor migration through highly cross-linked gels relative to less cross-linked gels. In gel trap electrophoresis, circRNA mixed with melted agarose becomes trapped by cross-links and does not migrate in an applied electric field. Enzymatic methods can also provide evidence of circularity. RNase R exonuclease26, tobacco acid phosphatase5 and terminator exonuclease treatment5 leave circular RNA intact but efficiently degrade most linear RNAs. In all three cases, quantification of a specific RNA species before and after treatment should reveal enrichment of circular transcripts. Finally, with sufficiently long sequencing reads or paired-end reads, it should be possible to identify sequences that are inconsistent with circRNA27. These could include an apparent backsplice sequence, but also include sequences from exons outside of the backsplice coordinates. For example, an apparent backsplice from exon 3 to exon 2 in a longer sequence that includes exon 4 or exon 5 (see Fig. 3A below) would be best explained by a trans-spliced RNA. The above methods have limitations and are best used in combination to validate circRNAs.
It is also important to distinguish exonic circRNA from RNA lariats. Lariat RNA is formed during canonical RNA splicing; it is mostly intronic and is biochemically distinct from exonic circular species by the presence of a 2’-5’ carbon linkage at the splicing branch-point. Through genomic analysis of lariats by assembly of highly expressed, non-polyadenylated intronic sequences7, it has recently become apparent that many lariat RNAs may be more stable than was previously appreciated. These stable lariat RNAs exhibit degradation of their 3’ tails, leaving a remnant molecule7. Such lariat-products have been called ‘circular intronic RNA’ (we use the nomenclature ‘intronic circRNA’) and are further distinguished from exonic circRNA by the presence of a 2’-5’ junction7. Exonic circRNAs do not feature a 2’-5’ linkage but instead consist of 3’-5’ links throughout the molecule.
Lariat RNAs behave similarly to exonic circRNA in the assays described above. They are largely exonuclease insensitive26, migrate more slowly than linear molecules21 and will form a single band when nicked within the loop component of the lariat. However, exonic circRNA is easily distinguished from intronic circRNA and lariat RNA by the features of the apparent backsplice sequence. Reverse transcription can occur inefficiently across the branch-point of a lariat, producing sequence that is superficially similar to a backsplice in having juxtaposed upstream and downstream sequences. But these branch-point-traversing sequences include intronic sequences, specifically the canonical GT of the splice donor site and sequences 5’ of the branch-point nucleotide. In addition, when reverse transcriptase traverses the 2’-5’ junction one or more untemplated bases are generated, which are readily identifiable in sequencing23,28. This signature can be used to rule out exonic circRNA origins for the intronic products. In principle, lariat RNAs could be preferentially depleted from other circular species by treatment with debranching enzyme before exonuclease digestion, as this enzyme selectively hydrolyzes 2’-5’ linkages. Therefore, although lariats are common circular RNA molecules, they can be readily distinguished from exonic circRNA.
Genomic methods
Recent genome-wide studies of circRNA have been enabled by developments in sequencing technology (deeper sequencing with longer read lengths), better algorithms for mapping RNA to its genomic source, and ribosomal RNA depletion strategies that enable sequencing of non-polyadenylated RNA. In general, two approaches have been taken: first, using a list of candidate junctions generated from existing transcript models27,29; or alternatively, identifying junctions by matching reads to the genomic sequence as is done in spliced alignment algorithms (for more on alignment of deep sequencing data, see recent review30). Both of these methods have identified circRNAs that have then been validated by sequencing, RNase R exonuclease testing and other methods.
The first identification of exonic circRNAs in a genomic study occurred serendipitously through independent mapping of paired-end reads sequenced from opposite ends of a single cDNA fragment27. This approach identified an unexpected abundance of fragments in which the two paired reads mapped to the same gene but were in the opposite order from that expected from the annotation (Fig. 2A). Realizing that these likely arose from circular RNAs, the authors imputed the location of a backsplice using the existing gene annotations. This approach was extended by scanning for out-of-order paired-end reads that were concentrated in fixed windows tiled across a gene (Fig. 2B). This approach has the advantage of being a fast way to analyze rRNA-depleted libraries but it is largely candidate-based, and so does not detect circRNAs from unannotated transcripts and does not provide direct evidence of circularity. However, qPCR validation assays on a number of individual species discovered by this method revealed that the transcripts were predominantly RNase R exonuclease resistant and lacked the expected properties of backsplice-containing linear RNAs (trans-spliced or duplication products)27.
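The out-of-order criterion can be illustrated with a minimal sketch. It is not the published pipeline: it assumes each mate has already been mapped independently and converted to half-open, transcript-oriented coordinates, and the names `MatePair`, `is_out_of_order` and `out_of_order_windows`, as well as the window size and the minimum pair count, are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class MatePair(NamedTuple):
    """One sequenced fragment; coordinates are half-open and transcript-oriented."""
    sense_start: int      # mate aligned to the sense strand (5' end of the fragment)
    sense_end: int
    antisense_start: int  # mate aligned to the antisense strand (3' end of the fragment)
    antisense_end: int


def is_out_of_order(pair: MatePair) -> bool:
    """True if the sense mate lies entirely downstream of its partner.

    A fragment from a linear transcript has its sense mate upstream of the
    antisense mate; a fragment that spans a backsplice shows the reverse.
    """
    return pair.sense_start >= pair.antisense_end


def out_of_order_windows(pairs, window=100, min_pairs=2):
    """Count out-of-order pairs per (upstream window, downstream window).

    `window` and `min_pairs` are illustrative values, not published settings.
    """
    counts = Counter()
    for p in pairs:
        if is_out_of_order(p):
            counts[(p.antisense_start // window, p.sense_start // window)] += 1
    return {key: n for key, n in counts.items() if n >= min_pairs}


pairs = [
    MatePair(100, 175, 300, 375),  # colinear: consistent with a linear transcript
    MatePair(900, 975, 120, 195),  # out of order: candidate circle-derived fragment
    MatePair(910, 985, 130, 205),  # second fragment supporting the same windows
]
candidates = out_of_order_windows(pairs)
```

Windows supported by several independent fragments would then be compared with the annotation to impute the backsplice, since this criterion alone does not locate the junction or prove circularity.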
Other methods using rRNA-depleted libraries include identifying reads with apparent backsplice sequence from rRNA-depleted RNA-seq data from mammalian and nematode cells, without using a candidate-based approach6. The authors mapped reads to genomic locations de novo and identified apparent backsplice sequences in individual reads. They chose reads that could not be mapped directly to the genome and then mapped the two ends of a single read separately. By using single reads, the authors were able to identify the location of the putative backsplice to single nucleotide resolution (Fig. 2C). The authors also selected only apparent backsplice sequences that were flanked by GT/AG splice sites in the genomic context. Therefore, this method can identify unannotated splice sites but it might be less sensitive than a candidate-based approach.
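A simplified sketch of this split-read idea follows. It is an illustration under stated assumptions rather than the authors' implementation: the toy genome is a plain forward-strand string, the two read ends ("anchors") are located by exact, first-hit matching instead of a real aligner, and the function name `find_backsplice` and the anchor length are hypothetical.

```python
def find_backsplice(read, genome, anchor=8):
    """Return (acceptor_start, donor_end) for a GT/AG-flanked apparent backsplice.

    Coordinates are 0-based and half-open, so genome[acceptor_start:donor_end]
    is the putative circularized interval. Returns None if the anchors map in
    colinear order or if no breakpoint with canonical flanks explains the read.
    """
    if len(read) < 2 * anchor:
        return None
    head_pos = genome.find(read[:anchor])    # 5' anchor of the read
    tail_pos = genome.find(read[-anchor:])   # 3' anchor of the read
    if head_pos < 0 or tail_pos < 0 or tail_pos >= head_pos:
        return None                          # unmapped anchor or colinear order
    tail_end = tail_pos + anchor
    # Try every breakpoint that leaves both anchors intact.
    for i in range(anchor, len(read) - anchor + 1):
        donor_end = head_pos + i
        acceptor_start = tail_end - (len(read) - i)
        if acceptor_start < 2:
            continue
        if (genome[head_pos:donor_end] == read[:i]
                and genome[acceptor_start:tail_end] == read[i:]
                and genome[donor_end:donor_end + 2] == "GT"              # donor flank
                and genome[acceptor_start - 2:acceptor_start] == "AG"):  # acceptor flank
            return acceptor_start, donor_end
    return None


# Toy locus: intron ending in AG, one exon, intron starting with GT.
upstream_intron = "CCTTCCTTCCAG"
exon = "ATGGCTAGCTTACGGATCCAAGCTTGAC"
downstream_intron = "GTAAGTCCTTCC"
genome = upstream_intron + exon + downstream_intron

# A read crossing the junction of a single-exon circle: exon tail, then exon head.
junction_read = exon[-12:] + exon[:12]
junction = find_backsplice(junction_read, genome)
```

Scanning all candidate breakpoints and keeping only those with canonical flanks resolves the junction to a single nucleotide even when bases adjacent to the breakpoint match on both sides, at the cost of missing non-canonical junctions.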
A more nuanced approach to candidate-based analysis was recently developed using rRNA-depleted RNA-seq without exonuclease enrichment29. This method uses deep sequencing and 75 bp paired-end reads to identify apparent backsplice sequences that map to a set of candidate junctions generated using existing gene annotations. The authors divided pairs of reads in which one read contained an apparent backsplice into two groups: a group in which the non-backsplice-containing read mapped to an exon between the backspliced exons, and therefore could potentially arise from a circRNA (Fig. 3A); and a group in which the second read in the pair mapped to an exon outside of the backspliced exons, and therefore could not be explained by a circRNA. This second group was assumed to represent artifacts of sequencing. The distributions of mapping quality statistics within the two groups were then used to generate a confidence score for each junction observed (Fig. 3B). This approach allowed the use of a false discovery rate cutoff rather than an arbitrary read depth–based threshold.
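The grouping step, and the way an artifact group can calibrate a cutoff, can be sketched as below. This is a simplified decoy-style estimate written for illustration and is not the statistic used in the cited study; the function names, the use of exon indices in annotated order, and the scores are all assumptions.

```python
def classify_pair(backsplice_exons, mate_exon):
    """Group a read pair by where the non-backsplice mate maps.

    backsplice_exons: (upstream_exon_index, downstream_exon_index) of the junction,
    with exons numbered in annotated order. Mates inside this range (inclusive)
    are consistent with a circle; mates outside cannot come from that circle.
    """
    first, last = backsplice_exons
    return "consistent" if first <= mate_exon <= last else "inconsistent"


def empirical_fdr(consistent_scores, inconsistent_scores, threshold):
    """Estimate the false discovery rate among pairs scoring >= threshold.

    Assumes inconsistent pairs behave like the artifacts hidden in the
    consistent group; this simplification is ours, for illustration only.
    """
    passing = sum(score >= threshold for score in consistent_scores)
    decoys = sum(score >= threshold for score in inconsistent_scores)
    if passing == 0:
        return 1.0
    return min(1.0, decoys / passing)


# A backsplice from exon 3 back to exon 2, with mates in exon 3 and exon 4.
inside = classify_pair((2, 3), 3)
outside = classify_pair((2, 3), 4)

# Hypothetical mapping-quality scores for the two groups.
fdr = empirical_fdr([30, 35, 40, 20], [10, 32, 12], threshold=30)
```

Raising the threshold until the estimated rate falls below a chosen level gives a cutoff grounded in the data rather than in a fixed read count.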
In addition to sequence-only based approaches, a biochemical approach for the genome-wide identification of circRNA has been described by others in archaea31 and by our group in mammals23. This technique, “CircleSeq” (Fig. 4), uses RNase R digestion prior to high-throughput sequencing to identify species that are RNase R resistant (Fig. 4A). The method uses a mapping algorithm capable of identifying apparent backsplice sequences (MapSplice32), rather than an algorithm requiring a choice of exon order. In mammals, but not archaea, rRNA depletion is required. Using this technique, identification of exonic circRNAs is possible on the basis of two features: first, backsplice-containing reads are identified using a segmented mapping approach (Fig. 4B); and second, reads derived from circular species should be significantly enriched in the RNase R–treated sample compared to mock-treated control (8–16 fold on average, though variable levels of enrichment were observed). The exons of linear RNAs should be depleted by exonuclease digestion, as should splice junctions not present in circRNAs. An example of data for a circular RNA (cANRIL)23 exhibiting these features is shown in Figure 4C.
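The enrichment criterion amounts to comparing depth-normalized junction counts between the treated and mock libraries. A minimal sketch, assuming simple reads-per-million normalization with a pseudocount (both are our illustrative choices, as are the function name and the counts):

```python
def rnase_r_enrichment(treated_count, treated_total, mock_count, mock_total,
                       pseudocount=1.0):
    """Fold enrichment of a junction in the RNase R-treated library over mock.

    Counts are scaled to reads per million so that libraries of different depth
    are comparable; the pseudocount avoids division by zero for junctions that
    are absent from the mock library.
    """
    treated_rpm = (treated_count + pseudocount) / treated_total * 1e6
    mock_rpm = (mock_count + pseudocount) / mock_total * 1e6
    return treated_rpm / mock_rpm


# Hypothetical junction seen 79 times after RNase R and 9 times in mock,
# in two libraries of 10 million reads each.
fold = rnase_r_enrichment(79, 10_000_000, 9, 10_000_000)
```

With these made-up counts the fold change is about 8, at the lower end of the average range quoted above; a linear splice junction would be expected to give a value below 1.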
CircleSeq also identifies lariat RNAs23, though these are easily distinguished from circRNAs in the sequencing data. As mentioned, lariats show RNase R enrichment of the loop sequence, RNase R depletion of the lariat tail, and the presence of ‘branch-point sequences’. Branch-point sequences resemble apparent backsplice sequences in that parts of the sequence are misordered compared to their genomic annotation. In branch-point sequences, though, the sequences at the backsplice are intronic and contain untemplated bases produced during cDNA synthesis. All abundant species identified by CircleSeq that contain a true backsplice and that have been subjected to specific characterization have been demonstrated to be exonic circRNA, suggesting the reliability of this approach to distinguish exonic circRNAs from linear backsplice-containing transcripts and lariats23,31.
Although CircleSeq generates deep coverage of circular and lariat products, it has some limitations. It requires more input total RNA than sequencing without enrichment, and is sensitive to endonuclease contamination. It might also bias against the detection of longer circRNA products, as a single nicking event would confer exonuclease sensitivity. Finally, exonuclease protection may extend to some linear products with protective 3’ end structures, such as the 3’ triple helix seen in the long non-coding RNAs MALAT1 and Menβ33,34, thus complicating the interpretation of the results. These caveats must be weighed against the advantages of CircleSeq: it allows the identification of circles at greater depth and provides greater confidence that backsplice-containing species are circular than unenriched sequencing alone.
Properties of circRNAs
General features
Studies of circRNAs have identified a number of shared features. Exonic circRNA is very stable in cells2, with most species exhibiting a half-life over 48 hours35, compared to an average half-life of 10 hours for mRNAs36. However, exonic circRNA is not stable in serum, with a half-life of <15 seconds37, presumably due to circulating RNA endonucleases38. Intracellular stability is likely due to circRNA resistance to RNA exonucleases2. Possibly due to this stability, some exonic circRNAs have been shown by sequencing read counting methods and qPCR-based methods to be at higher levels than the linear RNA gene product27,29,35. Exonic circRNA species also do not contain the characteristic 2’-5’ linkage of an RNA lariat and therefore are resistant to RNA debranching enzymes. Studies of exonic circRNA localization using different methods have reported cytoplasmic localization1,2,6,23,25,39,40, though the process of nuclear export remains obscure and it is possible that circular RNA escapes from the nucleus during mitosis. Exonic circRNAs are also susceptible to siRNA mediated decay6,23,25, a property that is useful in studying their possible functional roles.
Several shared sequence-based features have been described in exonic circular RNAs. First, exonic circRNAs described to date always involve a GT-AG pair of canonical splice sites, although this is biased by discovery methods which favor or require such sites in mapping6,23. Furthermore, exonic circRNAs almost always use at least one previously annotated splice site. Introns flanking sites involved in a backsplice tend to be longer than introns generally, but some flanking introns can be smaller than average29. As was first shown for the circRNA derived from SRY, complementary sequences in the introns flanking backsplice sites seem to promote circularization23. In particular, paired ALU repeats in inverted orientations upstream and downstream of backsplice sites are enriched 5-fold in sites of human exonic circRNA formation35. Likewise, circRNA overexpression constructs including complementary upstream and downstream sequences show enhanced circRNA formation relative to constructs without complementary sequences5,6. Lastly, the length of a given exon appears to influence circularization. This is most obvious for exonic circRNA comprised of a single exon, as the exons comprising single-exon circRNA are 3-fold longer compared to all expressed exons23,27. In aggregate, genomic features that appear to promote circularization are longer than average exons, flanked by longer than average introns containing inverted tandem repeats that likely promote intron pairing (described below).
Prevalence of circRNAs
Findings from recent genome-scale studies of circRNA are largely consistent with each other. These analyses have found strong evidence for thousands of circRNAs in diverse human cell types6,7,27,35. Statistical analysis of rRNA depleted sequencing data27, many validation approaches6 and CircleSeq data35 show that the majority of apparent backsplice sequences in RNA-seq data arise from circRNAs. Similarly, analyses of mouse RNA with the same techniques have revealed thousands of apparently circular transcripts6,7,29,35. Most human and mouse circRNAs arise from coding genes23,27. Work in C. elegans has found abundant circular RNAs and identified life cycle dependent regulation of some of these6. Recent work in human cell lines used in the ENCODE project has shown that exonic circRNA production is regulated independently from the expression of the underlying linear RNA gene and that transcription levels vary among cell types29. This evidence for regulated production, along with substantial conservation of exonic circRNA expression in mammals6,23, suggests a functional role for many of these transcripts.
Given difficulties in separating circular and linear RNAs solely on physical properties, the absolute abundance of circular species in the pool of total RNA is difficult to judge. In control replicates used for CircleSeq (see Genomic methods, above), some backsplice species were represented by as many as 1:300,000 reads, while others by as few as 1:300,000,000. Thus circRNA analysis requires at least millions of reads, and preferably hundreds of millions, even in libraries with preferential enrichment by RNase R digestion. A recent estimate using paired-end reads coupled to qPCR-based quantification of some exonic circRNA transcripts estimated the relative abundance of exonic circRNA as 1% the amount of poly(A) RNA29. An estimate using an alternative methodology can be obtained using data from our recent study of human fibroblasts by identifying the number of reads with apparent backsplice sequences that were validated by RNase R enrichment. Reads mapping to these junctions comprise 0.1% of all sequencing from rRNA-depleted total RNA. As circular RNAs contain sequences in addition to the backsplice, it is necessary to improve this estimate by imputing exonic circRNA length. Each backsplice-traversing read can then be weighted based on predicted circular RNA length (note though that not all intervening exons appear in all exonic circRNAs, e.g. cANRIL35,41, Fig. 4C), and this gave an estimate that exonic circRNA comprises 0.8% of all non-ribosomal RNA. Therefore, these differing approaches suggest a substantial fraction of non-ribosomal RNA is contained in exonic circRNA.
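One way to picture the length weighting is to ask what share of reads drawn uniformly from a circle would cross its backsplice. The sketch below follows that reasoning; it is our illustration of the idea, not the calculation performed in the study, and the function names, the minimum overhang and every number in the example are hypothetical.

```python
def length_weight(circ_length, read_length, min_overhang=1):
    """Expected reads from a circle per read that crosses its backsplice.

    A read of length L starting at a uniformly random position on a circle of
    length C crosses the junction, with at least `min_overhang` bases on each
    side, for L - 2 * min_overhang + 1 of the C possible start positions.
    """
    spanning_starts = read_length - 2 * min_overhang + 1
    if spanning_starts <= 0 or circ_length < read_length:
        raise ValueError("read too short for the overhang, or circle shorter than read")
    return circ_length / spanning_starts


def circ_read_fraction(junctions, total_reads, read_length, min_overhang=1):
    """Estimated fraction of all reads that derive from circular RNA.

    junctions: iterable of (junction_read_count, imputed_circ_length) pairs.
    """
    weighted = sum(count * length_weight(length, read_length, min_overhang)
                   for count, length in junctions)
    return weighted / total_reads


# Illustrative values only: two circles of imputed length 600 nt and 300 nt,
# 76 nt reads, and one million reads in total.
fraction = circ_read_fraction([(100, 600), (50, 300)], 1_000_000, 76)
```

Under this model a 600 nt circle sequenced with 76 nt reads contributes about eight reads for every junction-crossing read, which shows how a junction-read fraction can translate into a several-fold larger share of total sequence. The imputed lengths are the weak point, because not every intervening exon is retained in every circle.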
With regard to the identification of specific circular species, recent genome-wide studies of human exonic circRNAs also have substantially overlapping findings. In total, 358 circular RNAs were identified in all three studies and 1,233 were identified by at least two of the three studies (Fig. 5). It should be noted that the Rajewsky et al.6 and Salzman et al.27 analyses include some common source data, although the methodologic features of these analyses differ. Substantially more circRNAs were found using CircleSeq35 (shown on Fig. 5 as ‘Jeck et al.35, low confidence’), which likely reflects a lower limit of detection using a biochemical enrichment approach. Assuming complete rRNA depletion, 700 megabases of non-ribosomal RNA per cell, and equal RNA representation across sequencing libraries, then the least abundant exonic circRNAs in this low confidence dataset number one copy per 80 cells. However, many circRNAs at low levels were also identified by approaches used by Rajewsky et al.6 and Salzman et al.27. Analysis of CircleSeq data with more stringent filtering (requiring backsplices to be observed in multiple replicates and in samples without RNase R treatment) gives a set of 10-fold fewer circular RNAs (shown on Fig. 5 as ‘Jeck et al.35, high confidence’), and this set has a larger overlap with circular RNAs found using bioinformatic approaches (20% vs. 4%). In a few instances, an abundant exonic circRNA was found by one approach but not the others, possibly due to the different cell types analyzed.
Exonic circRNA production mechanism
Two mechanisms have been proposed for mammalian exonic circRNA formation (Fig. 6). Both involve the backsplice being formed by the canonical spliceosome. In the first mechanism (historically termed ‘mis-splicing’ but referred to here as ‘direct backsplicing’ to emphasize that these are not always “errors” of splicing), a downstream splice donor pairs with an unspliced upstream splice acceptor2 and the intervening RNA is circularized (Fig. 6A). The second mechanism, known as the ‘lariat intermediate’ or ‘exon skipping’ mechanism, involves splicing occurring within lariats produced from exon skipping14 (Fig. 6B).
Although it is likely that both mechanisms function in vivo, some evidence suggests that direct backsplicing may occur more frequently than exon skipping. In some cases a linear mRNA has been found that lacks the exons that are included in a circular RNA, providing evidence that exon skipping is at least a plausible mechanism of formation14,23,27, though such events could also plausibly occur after direct backsplicing. However, genome-wide exonic circRNA discovery studies have failed to find such linear RNAs for the majority of exonic circRNAs23,27. It is possible that such an analysis underestimates exon skipping, given that linear products are less stable than most circular species. There are examples of exonic circRNAs with no additional exons upstream or downstream of the circularized product in the annotated linear transcript, such as in SRY and CDR1as12,25. For these transcripts to result from exon skipping, there would need to be unannotated upstream and downstream exons that skipped over the sequence that is eventually circularized. Given the abundance of the exonic circRNAs that result from these loci, the presence of such unannotated exons appears unlikely, further supporting direct backsplicing for the formation of these products. As additional evidence for the direct backsplicing model, constructs for the overexpression of circRNAs usually include the exon that will become circularized and partial sequences of the flanking introns5,6,42 (Fig. 6C), but no additional upstream or downstream exonic sequence. These constructs successfully produce exonic circRNA without the need for additional flanking exons, and therefore would be inconsistent with an exon skipping event.
We believe the common features of exonic circRNA producing loci suggest that some endogenous circRNAs are likewise produced by a direct backsplicing mechanism. First, long exons are preferentially circularized, perhaps because long exons are sterically more favorable for 3’ to 5’ splicing at canonical splice sites. Also, exonic circRNA producing genes are enriched for repeat elements in flanking introns within 500 bp of the backsplice sites. These repeat elements are most likely to be in orientations that promote RNA base pairing35, suggesting that RNA base pairing might bring upstream splice acceptors into physical proximity with downstream splice donors. The human HIPK3 locus appears typical in this regard (Fig. 7). This locus is particularly informative in that murine Hipk2/3 and human HIPK3 all harbor similar genomic structures with regard to intron and exon length, and also show a high degree of circularization. Only the human HIPK3 locus, however, contains tandem ALU repeats, and exhibits a much higher degree of circularization. It is worth noting, however, that some single exon circRNAs are as small as 204 nucleotides29 and that many exonic circRNAs are flanked by relatively short introns. Therefore, it seems likely that multiple mechanisms may generate an exonic circRNA.
Putative functions of exonic circRNAs
These genome-wide analyses have identified a large number of abundant exonic circRNAs. Additionally, cross-species comparisons have shown that sites of circularization are conserved in orthologous exons at a rate well above the amount expected by chance27,29,35. These species’ abundance and evolutionary conservation in turn suggest exonic circRNA may play specific roles in cellular physiology, and several possible functions have been proposed including miRNA binding, protein binding, regulation of translation and translation into proteins.
miRNA sponges
Recently, two exonic circRNAs in mammals have been shown to function as miRNA sponges, or competing endogenous RNAs43–45. Competing endogenous RNAs act as decoys that bind miRNAs in place of their coding RNA targets, so that expression of those coding RNAs rises as expression of the competing endogenous RNA rises. The exonic circRNAs of CDR1as and SRY have been shown to bind miRNAs without being degraded, making them excellent candidates for competing endogenous RNA activity. Binding of miRNAs by exonic circRNAs was first shown for CDR1as25, and several lines of evidence support CDR1as acting as a miRNA sponge. CDR1as has 74 miR-7 seed matches and is densely bound by Argonaute proteins (the proteins that bind to miRNAs)6. miR-7 and CDR1as are co-expressed in the mouse brain, and in vitro microscopy and co-immunoprecipitation experiments have shown colocalization of CDR1as and miR-75. Knockdown of CDR1as, or overexpression of a miRNA previously shown to cleave CDR1as (miR-671)25, decreased expression of known miR-7 target genes. By contrast, CDR1as overexpression prevented knockdown of miR-7 targets. Furthermore, transgenic expression of mouse CDR1as in zebrafish embryos, which lack endogenous CDR1as, substantially reduced midbrain size, mimicking the phenotype of morpholino knockdown of miR-7 in zebrafish6. Further loss-of-function experiments are needed to test whether endogenous CDR1as also regulates brain size. Likewise, the circular SRY transcript has 16 binding sites for miR-138 and co-precipitates with Argonaute 2 (AGO2) when miR-138 is overexpressed, and miR-138-mediated knockdown is attenuated when SRY is overexpressed in mouse cells5. While these two examples are striking, an analysis of the largest set of exonic circRNAs identified to date, those identified by CircleSeq, suggests that very few circRNAs in mammalian cells beyond CDR1as and SRY have this structure of more than ten binding sites for a single miRNA (W.R.J. and N.E.S. unpublished observations). Many exonic circRNAs, however, contain a smaller number of putative miRNA binding sites. Therefore, effective miRNA sponging by exonic circRNA may be relatively unusual, or may not require the large number of miRNA binding sites that characterize SRY and CDR1as. Efforts are also underway to explore the use of circRNA sponges as potential therapeutics, for example to target oncogenic miRNAs46,47.
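Counting seed matches of the kind reported for CDR1as and SRY can be sketched in a few lines. The example below is a minimal illustration, not the method used in the cited studies: it assumes the seed is miRNA positions 2 to 7 (a 6-mer), counts overlapping matches, and treats the transcript as circular so that sites spanning the backsplice junction are included. The function names and the toy sequences are hypothetical.

```python
# Illustrative sketch only: the sequences and the 6-mer seed definition are assumptions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Target-site motif complementary to miRNA positions 2-7 (assumed seed)."""
    return reverse_complement(mirna[1:7])


def count_seed_matches(transcript: str, mirna: str, circular: bool = True) -> int:
    """Count (possibly overlapping) seed-match sites in a transcript."""
    site = seed_site(mirna.upper().replace("T", "U"))
    seq = transcript.upper().replace("T", "U")
    if circular:
        # Append the first few bases so sites spanning the junction are found.
        seq = seq + seq[: len(site) - 1]
    count, pos = 0, seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count


# Toy inputs, not real miRNA or circRNA sequences.
TOY_MIRNA = "UGGAAGACUAGUGAUU"
print(count_seed_matches("UCUUCCAAAUCUUCCGG", TOY_MIRNA))
print(count_seed_matches("UUCCAAAGGUC", TOY_MIRNA, circular=False))
print(count_seed_matches("UUCCAAAGGUC", TOY_MIRNA, circular=True))
```

The last two calls differ only in the `circular` flag: the toy site spans the junction, so it is found only when the sequence is treated as a circle.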
Regulation of transcription
Mechanisms have been suggested whereby circularization could regulate transcription. For example, in mice the formin (Fmn) gene is essential for limb development. Exonic circRNAs are produced from the Fmn transcript through backsplicing involving a splice acceptor upstream of the Fmn coding sequence48. Knockout mice lacking this splice acceptor have no detectable expression of the exonic circRNAs and normal limb development, but have an incompletely penetrant renal agenesis phenotype48. The inability to produce an exonic circRNA from the targeted Fmn locus therefore appears to lead to aberrant expression of the formin protein, although the 5’ UTR exons deleted in the animal model could have functions unrelated to circularization. The authors propose that the formation of an exonic circRNA acts as an ‘mRNA trap’ by sequestering the translation start site, leaving a non-coding linear transcript and thereby reducing the expression level of the formin protein.
The mRNA trap mechanism could be very widespread. For example, in human and murine cells, exonic circRNAs are derived from the second exons of the HIPK2 and HIPK3 loci, and this exon contains the canonical ATG (i.e. a translation start codon). In the case of HIPK3, the exonic circRNA is considerably more abundant than the linear, protein-coding transcript23 (Fig. 7). The circularization of the ATG-containing exon would be inconsistent with the production of the canonical protein from the locus, and therefore circularization can be considered a form of ‘alternative splicing’ that regulates protein translation. We have observed that in human fibroblasts, 34% of single-exon circles contain a translation start, compared with only 14% of all exons (p < 10^-10 by chi-square test, W.R.J. and N.E.S. unpublished observations). This finding is consistent with a widespread role for circular RNAs as mRNA traps that regulate protein expression.
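A comparison of this kind (start-codon-containing exons among circularized exons versus among exons in general) is a test on a 2×2 contingency table. The sketch below shows the arithmetic of a Pearson chi-square test with one degree of freedom and no continuity correction, using only the standard library. The sample sizes behind the reported percentages are not given here, so the counts in the usage lines are invented placeholders and the resulting p-value says nothing about the actual data; the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (1 d.f., no continuity correction) for the table

                   with start   without start
        group 1        a             b
        group 2        c             d

    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Placeholder counts (hypothetical): 34 of 100 circularized exons and
# 14 of 100 other exons contain a translation start.
chi2, p = chi_square_2x2(34, 66, 14, 86)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

With real data the two groups would need to be disjoint (circularized exons versus all remaining exons) for the test to be valid.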
The mRNA trap role of exonic circRNA suggests potential functions for circRNAs through their mechanism of formation, rather than as end products. For example, the dystrophin (DMD) gene is known to produce several circular products49 and a study of exonic circRNA in DMD in patients with a dystrophinopathy showed that exonic circRNA formation may lead to inactive DMD transcripts in individuals with certain deletion mutations50, further reducing the pool of mRNAs that can be translated. This suggests that the mRNA-trap effect of exonic circRNAs might enhance the disease phenotype in these patients. Splicing regulation is already a target of emerging therapies for these dystrophinopathies. For example, antisense oligonucleotides against certain exons that induce exon skipping and restore open reading frames are currently in clinical trials51. Circularization might be a future target for therapies: either to decrease the circularization of functional transcripts or to sequester, through an mRNA-trap, exons contributing to dysfunctional transcripts.
Interactions with RNA binding proteins
It has previously been shown that some linear non-coding RNA transcripts sequester RNA binding proteins (RBPs)52, and exonic circRNAs might have a similar role. For example, circRNA can stably associate with AGO proteins and Pol II, and association with other RBPs has not been excluded. Exonic circRNAs also have properties that suggest they might act as ‘scaffolding’ for RBPs, binding multiple RBPs and facilitating stable interactions through the greater stability of the circular RNA transcript. They may also act as sequence-targeting elements, binding simultaneously to RBPs and to regions of RNA or DNA that are complementary to the circRNA sequence, as has been proposed6. Circular RNAs could adopt tertiary structures distinct from those of linear molecules of the same sequence owing to the constraints of circularization, or new protein binding sites could be generated by the sequences that are brought together in the circular RNA; such features might allow circRNAs to bind different sets of proteins than related linear RNAs.
Translation of circRNAs
It is possible that some circRNAs are translated, as inclusion of an internal ribosome entry site (IRES) allows translation of engineered circRNAs53. Presumably an endogenous circRNA with an IRES and ATG could undergo translation. At least one naturally occurring circular RNA is known to encode a protein in mammalian cells: the Hepatitis δ agent3, a circular RNA satellite virus of the Hepatitis B virus. Hepatitis δ leads to production of a single viral protein associated with pathogenicity, but the mechanism of translation is non-canonical and is probably specific to certain viral agents.
It is interesting to consider the nature of possible proteins encoded by circRNAs. If the number of nucleotides is divisible by three, a protein-encoding circRNA would be read recurrently in-frame to produce a repeated polypeptide sequence, as has been experimentally demonstrated53. If the circRNA contained a number of nucleotides not divisible by three, it would be read in a different reading frame each time the ribosome passed around the circle, and a stop codon would usually be encountered when read out-of-frame. In theory, it would be possible to have an exonic circRNA read in all three reading frames without ever encountering a stop (i.e. a ‘Möbius protein’), but we have not identified a naturally occurring exonic circRNA with this property. We have considered the protein-coding potential of many ATG-containing exonic circRNAs produced in human fibroblasts, but to date we35 and others29 have not been able to identify a naturally occurring exonic circRNA that undergoes translation. For example, we have not been able to find exonic circRNAs undergoing translation (i.e. bound to polysomes), and we have not identified peptides in mass spectrometry data that could only occur as a result of translation of exonic circRNAs.
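The frame logic described above can be made concrete with a short sketch. It walks codon by codon around a circular sequence from a chosen start position and reports whether a stop codon is reached, or whether the read returns to its starting point without one. The latter happens after one lap if the length is divisible by three (a repeating in-frame product) and after three laps otherwise (all three frames open, the ‘Möbius’ case). This is an illustration under simplifying assumptions: the standard stop codons are used, the start position is supplied by the caller and not checked for AUG or an IRES, and the function name, labels and toy sequences are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_circular_orf(seq: str, start: int = 0) -> tuple[str, int]:
    """Read codons around a circular RNA starting at index `start`.

    Returns (label, codons_read) where label is one of:
      "terminates"          - a stop codon was reached after codons_read codons
      "endless_in_frame"    - length divisible by 3, no stop in the single frame
      "endless_all_frames"  - length not divisible by 3, no stop in any frame
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must contain at least one codon")
    # Number of codons read before the ribosome is back at `start` in frame.
    codons_per_cycle = n // 3 if n % 3 == 0 else n
    for i in range(codons_per_cycle):
        pos = (start + 3 * i) % n
        codon = "".join(seq[(pos + k) % n] for k in range(3))
        if codon in STOP_CODONS:
            return "terminates", i
    label = "endless_in_frame" if n % 3 == 0 else "endless_all_frames"
    return label, codons_per_cycle


# Toy sequences, not real circRNAs.
print(classify_circular_orf("AUGGCC"))   # length divisible by three
print(classify_circular_orf("AUGGCCA"))  # length 7, no stop in any frame
print(classify_circular_orf("AUGGCUA"))  # length 7, stop met on a later lap
```

In the third toy sequence the first lap is stop-free, and the stop codon appears only once the read has shifted frame across the junction, which is the situation the text expects to be typical.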
Conclusions
Until recently, only a handful of circRNAs had been identified in mammalian cells and these were largely thought to have arisen from errors in RNA splicing. This view is not compatible with the recent discoveries of thousands of distinct exonic circRNAs in various human cell types and with many of these circRNAs being abundant, stable and evolutionarily conserved. Evidence of functionality comes from the finding that some exonic circRNAs act as miRNA sponges and from the apparent regulation of linear protein-encoding RNA products by ‘mRNA trap’ mechanisms, and it seems likely that additional functions will be described.
As studies of circRNAs proliferate, it will become increasingly necessary to develop a standard nomenclature, especially if they are to be incorporated into RNA databases, including RefSeq and UCSC genome browser annotations. Although rare exonic circRNAs—such as CDR1as—can be defined by their source gene, an alternative naming system is needed for cases where circRNAs arise from the gene bodies of protein-coding linear transcripts as several genes produce multiple circRNAs. For example, the name ‘circular ANRIL’ would be ambiguous because several circRNAs can arise from ANRIL. The recent identification of stable intronic circular RNAs further confuses the terminology. We favor a naming convention that identifies the source gene and adds a numeric identifier (e.g. ecircHIPK3-1 to designate the first exonic circular RNA isoform of HIPK3). Name standardization would assist both bioinformatic and experimental research into circRNA origins and function.
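As an illustration of the proposed convention, the sketch below assigns names of the form ecircHIPK3-1 to a set of backsplice isoforms from one gene. The text does not say how isoforms should be ordered, so numbering by ascending genomic coordinates is purely an assumption made here, as are the function name and the example coordinates.

```python
def name_circrna_isoforms(gene: str, backsplices, prefix: str = "ecirc") -> dict:
    """Map each (start, end) backsplice to a name such as 'ecircHIPK3-1'.

    Isoforms are numbered in ascending coordinate order (an assumption; the
    convention itself only specifies source gene plus numeric identifier).
    Duplicate coordinates are collapsed to a single isoform.
    """
    ordered = sorted(set(backsplices))
    return {coords: f"{prefix}{gene}-{i}" for i, coords in enumerate(ordered, start=1)}


# Hypothetical coordinates, for illustration only.
names = name_circrna_isoforms("HIPK3", [(200, 300), (100, 150), (200, 300)])
for coords, name in names.items():
    print(coords, name)
```

Any stable ordering would do; what matters for database use is that the same isoform always receives the same identifier.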
Finally, exonic circRNAs have potential therapeutic roles. If properly packaged and delivered, their high cytoplasmic stability could make them long-acting regulators of cellular behavior. CircRNA overexpression constructs could be used to generate high levels of stable RNA circles in cells for a variety of purposes, such as to act as miRNA sponges to reduce the activities of oncogenic miRNAs (e.g. miR-21 and miR-22147) in the context of cancer. CircRNAs containing IRES sequences could be used to produce unusual peptides; for example, long repeating polypeptides that might be useful in the production of new biologic materials. As more functions of exonic circRNAs are discovered, further uses of this newly appreciated class of RNAs are likely to emerge.
References
- 1. Nigro JM, et al. Scrambled exons. Cell. 1991;64:607–613. doi: 10.1016/0092-8674(91)90244-s.
- 2. Cocquerelle C, Mascrez B, Hétuin D, Bailleul B. Mis-splicing yields circular RNA molecules. The FASEB Journal: Official Publication of the Federation of American Societies for Experimental Biology. 1993;7:155–160. doi: 10.1096/fasebj.7.1.7678559.
- 3. Kos A, Dijkema R, Arnberg AC, van der Meide PH, Schellekens H. The hepatitis delta (delta) virus possesses a circular RNA. Nature. 1986;323:558–560. doi: 10.1038/323558a0.
- 4. Sanger HL, Klotz G, Riesner D, Gross HJ, Kleinschmidt AK. Viroids are single-stranded covalently closed circular RNA molecules existing as highly base-paired rod-like structures. Proc Natl Acad Sci USA. 1976;73:3852. doi: 10.1073/pnas.73.11.3852.
- 5. Hansen TB, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013 doi: 10.1038/nature11993.
- 6. Memczak S, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013 doi: 10.1038/nature11928.
- 7. Zhang Y, et al. Circular Intronic Long Noncoding RNAs. Molecular Cell. 2013;51:792–806. doi: 10.1016/j.molcel.2013.08.017.
- 8. Gilbert W. Why genes in pieces? Nature. 1978;271:501–501. doi: 10.1038/271501a0.
- 9. Arnberg AC, van Ommen GJ, Grivell LA, Van Bruggen EF, Borst P. Some yeast mitochondrial RNAs are circular. Cell. 1980;19:313–319. doi: 10.1016/0092-8674(80)90505-x.
- 10. van der Veen R, et al. Excised group II introns in yeast mitochondria are lariats and can be formed by self-splicing in vitro. Cell. 1986;44:225–234. doi: 10.1016/0092-8674(86)90756-7.
- 11. Cocquerelle C, Daubersies P, Majérus MA, Kerckaert JP, Bailleul B. Splicing with inverted order of exons occurs proximal to large introns. The EMBO Journal. 1992;11:1095–1098. doi: 10.1002/j.1460-2075.1992.tb05148.x.
- 12. Capel B, et al. Circular transcripts of the testis-determining gene Sry in adult mouse testis. Cell. 1993;73:1019–1030. doi: 10.1016/0092-8674(93)90279-y.
- 13. Zaphiropoulos PG. Differential Expression of Cytochrome P450 2C24 Transcripts in Rat Kidney and Prostate: Evidence Indicative of Alternative and Possibly Trans Splicing Events. Biochemical and Biophysical Research Communications. 1993;192:778–786. doi: 10.1006/bbrc.1993.1482.
- 14. Zaphiropoulos PG. Circular RNAs from transcripts of the rat cytochrome P450 2C24 gene: correlation with exon skipping. Proc Natl Acad Sci USA. 1996;93:6536–6541. doi: 10.1073/pnas.93.13.6536.
- 15. Caudevilla C, et al. Natural trans-splicing in carnitine octanoyltransferase pre-mRNAs in rat liver. Proceedings of the National Academy of Sciences. 1998;95:12185–12190. doi: 10.1073/pnas.95.21.12185.
- 16. Frantz SA, et al. Exon repetition in mRNA. Proc Natl Acad Sci USA. 1999;96:5400–5405. doi: 10.1073/pnas.96.10.5400.
- 17. Surono A. Circular dystrophin RNAs consisting of exons that were skipped by alternative splicing. Human Molecular Genetics. 1999;8:493–500. doi: 10.1093/hmg/8.3.493.
- 18. Rigatti R. Exon repetition: a major pathway for processing mRNA of some genes is allele-specific. Nucleic Acids Research. 2004;32:441–446. doi: 10.1093/nar/gkh197.
- 19. Cocquet J, Chong A, Zhang G, Veitia RA. Reverse transcriptase template switching and false alternative transcripts. Genomics. 2006 doi: 10.1016/j.ygeno.2005.12.013.
- 20. McManus CJ, Duff MO, Eipper-Mains J, Graveley BR. Global analysis of trans-splicing in Drosophila. Proceedings of the National Academy of Sciences. 2010;107:12975–12979. doi: 10.1073/pnas.1007586107.
- 21. Tabak HF, et al. Discrimination between RNA circles, interlocked RNA circles and lariats using two-dimensional polyacrylamide gel electrophoresis. Nucleic Acids Research. 1988;16:6597–6605. doi: 10.1093/nar/16.14.6597.
- 22. Hurowitz EH, Brown PO. Genome-wide analysis of mRNA lengths in Saccharomyces cerevisiae. Genome Biol. 2003;5:R2. doi: 10.1186/gb-2003-5-1-r2.
- 23. Jeck WR, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2012 doi: 10.1261/rna.035667.112.
- 24. Schindler CW, Krolewski JJ, Rush MG. Selective trapping of circular double-stranded DNA molecules in solidifying agarose. Plasmid. 1982;7:263–270. doi: 10.1016/0147-619x(82)90007-5.
- 25. Hansen TB, et al. miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA. The EMBO Journal. 2011;30:4414–4422. doi: 10.1038/emboj.2011.359.
- 26. Suzuki H, et al. Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Research. 2006;34:e63. doi: 10.1093/nar/gkl151.
- 27. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE. 2012;7:e30733. doi: 10.1371/journal.pone.0030733.
- 28. Gao K, Masuda A, Matsuura T, Ohno K. Human branch point consensus sequence is yUnAy. Nucleic Acids Research. 2008;36:2257–2267. doi: 10.1093/nar/gkn073.
- 29. Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO. Cell-type specific features of circular RNA expression. PLoS Genet. 2013;9:e1003777. doi: 10.1371/journal.pgen.1003777.
- 30. Fonseca NA, Rung J, Brazma A, Marioni JC. Tools for mapping high-throughput sequencing data. Bioinformatics. 2012;28:3169–3177. doi: 10.1093/bioinformatics/bts605.
- 31. Danan M, Schwartz S, Edelheit S, Sorek R. Transcriptome-wide discovery of circular RNAs in Archaea. Nucleic Acids Research. 2012;40:3131–3142. doi: 10.1093/nar/gkr1009.
- 32. Wang K, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research. 2010;38:e178. doi: 10.1093/nar/gkq622.
- 33. Brown JA, Valenstein ML, Yario TA, Tycowski KT, Steitz JA. Formation of triple-helical structures by the 3'-end sequences of MALAT1 and MENβ noncoding RNAs. Proceedings of the National Academy of Sciences. 2012;109:19202–19207. doi: 10.1073/pnas.1217338109.
- 34. Wilusz JE, et al. A triple helix stabilizes the 3' ends of long noncoding RNAs that lack poly(A) tails. Genes & Development. 2012;26:2392–2407. doi: 10.1101/gad.204438.112.
- 35. Jeck WR, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013;19:141–157. doi: 10.1261/rna.035667.112.
- 36. Schwanhäusser B, et al. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.
- 37. Umekage S, Uehara T, Fujita Y, Suzuki H, Kikuchi Y. In Vivo Circular RNA Expression by the Permuted Intron-Exon Method. Innovations in Biotechnology. 2012 at <http://cdn.intechopen.com/pdfs/28708/InTech-In_vivo_circular_rna_expression_by_the_permuted_intron_exon_method.pdf>.
- 38. Haupenthal J, Baehr C, Kiermayer S, Zeuzem S, Piiper A. Inhibition of RNAse A family enzymes prevents degradation and loss of silencing activity of siRNAs in serum. Biochemical Pharmacology. 2006;71:702–710. doi: 10.1016/j.bcp.2005.11.015.
- 39. Li XF, Lytton J. A circularized sodium-calcium exchanger exon 2 transcript. The Journal of Biological Chemistry. 1999;274:8153–8160. doi: 10.1074/jbc.274.12.8153.
- 40. Dubin RA, Kazmi MA, Ostrer H. Inverted repeats are necessary for circularization of the mouse testis Sry transcript. Gene. 1995;167:245–248. doi: 10.1016/0378-1119(95)00639-7.
- 41. Burd CE, et al. Expression of linear and novel circular forms of an INK4/ARF-associated non-coding RNA correlates with atherosclerosis risk. PLoS Genet. 2010;6:e1001233. doi: 10.1371/journal.pgen.1001233.
- 42. Pasman Z, Been MD, Garcia-Blanco MA. Exon circularization in mammalian nuclear extracts. RNA. 1996;2:603–610.
- 43. Ebert MS, Neilson JR, Sharp PA. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat Meth. 2007;4:721–726. doi: 10.1038/nmeth1079.
- 44. Franco-Zorrilla JM, et al. Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet. 2007;39:1033–1037. doi: 10.1038/ng2079.
- 45. Poliseno L, et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature. 2010;465:1033–1038. doi: 10.1038/nature09144.
- 46. Hansen TB, Kjems J, Damgaard CK. Circular RNA and miR-7 in Cancer. Cancer Res. 2013;73:5609–5612. doi: 10.1158/0008-5472.CAN-13-1568.
- 47. Liu Y, et al. Construction of circular miRNA sponges targeting miR-21 or miR-221 and demonstration of their excellent anticancer effects on malignant melanoma cells. Int J Biochem Cell Biol. 2013 doi: 10.1016/j.biocel.2013.09.003.
- 48. Chao CW, Chan DC, Kuo A, Leder P. The mouse formin (Fmn) gene: abundant circular RNA transcripts and gene-targeted deletion analysis. Molecular Medicine (Cambridge, Mass) 1998;4:614–628.
- 49. Surono A, et al. Circular dystrophin RNAs consisting of exons that were skipped by alternative splicing. Human Molecular Genetics. 1999;8:493–500. doi: 10.1093/hmg/8.3.493.
- 50. Gualandi F. Multiple exon skipping and RNA circularisation contribute to the severe phenotypic expression of exon 5 dystrophin deletion. J Med Genet. 2003;40:e100. doi: 10.1136/jmg.40.8.e100.
- 51. Spitali P, Aartsma-Rus A. Splice modulating therapies for human disease. Cell. 2012 doi: 10.1016/j.cell.2012.02.014.
- 52. Romeo T. Global regulation by the small RNA-binding protein CsrA and the non-coding RNA molecule CsrB. Mol Microbiol. 1998;29:1321–1330. doi: 10.1046/j.1365-2958.1998.01021.x.
- 53. Chen CY, Sarnow P. Initiation of protein synthesis by the eukaryotic translational apparatus on circular RNAs. Science. 1995;268:415–417. doi: 10.1126/science.7536344.
- 54. Zaphiropoulos PG. Exon skipping and circular RNA formation in transcripts of the human cytochrome P-450 2C18 gene in epidermis and of the rat androgen binding protein gene in testis. Mol Cell Biol. 1997;17:2985–2993. doi: 10.1128/mcb.17.6.2985.