Abstract
Previous studies have demonstrated the highly specific expression of circular RNAs (circRNAs) in different tissues and organisms, but the cellular architecture of circRNA has never been fully characterized. Here, we present a collection of 171 full-length single-cell RNA-seq datasets to explore the cellular landscape of circRNAs in human and mouse tissues. Through large-scale integrative analysis, we identify a total of 139,643 human and 214,747 mouse circRNAs in these scRNA-seq libraries. We validate the detected circRNAs with the integration of 11 bulk RNA-seq based resources, where 216,602 high-confidence circRNAs are uniquely detected in the single-cell cohort. We reveal the cell-type-specific expression pattern of circRNAs in brain samples, developing embryos, and breast tumors. We identify the uniquely expressed circRNAs in different cell types and validate their performance in tumor-infiltrating immune cell composition deconvolution. This study expands our knowledge of circRNA expression to the single-cell level and provides a useful resource for exploring circRNAs at this unprecedented resolution.
Subject terms: Data mining, RNA splicing, RNA sequencing, Computational platforms and environments
Studies of circular RNAs have often been limited to the tissue or organism level. Here, authors investigate the comprehensive expression landscape of circRNAs in human and mouse at single-cell resolution, revealing highly specific and dynamic changes of circRNAs during multiple biological processes.
Introduction
Circular RNAs (circRNAs) are a large class of RNAs that widely exist in eukaryotic cells. Recent studies have demonstrated the emerging roles of circRNAs in regulating biological processes through promoting protein functions1–3 or encoding peptides4. So far, millions of circRNAs have been identified across various species, and several comprehensive databases have been developed to reveal the circRNA expression landscape in different tissues and organisms5–7. Generally, most circRNAs are expressed at low levels and exhibit high tissue- and species-specificity compared to their cognate linear mRNAs8,9. Thus, most studies using the traditional bulk RNA-seq method cannot fully characterize the intrinsic heterogeneity between individual cells, and the complexity of circRNAs at the single-cell level needs further exploration.
The advent of single-cell RNA sequencing methods has enabled the study of the transcriptome at single-cell resolution. However, only limited attempts have been made to characterize circRNA expression patterns at single-cell resolution10,11, which focused on studying the maternal effect of circRNAs in 69 mouse embryo samples or the heterogeneity of circRNAs among 45 single HEK293T cells. Considering the high species- and tissue-specificity of circRNAs, the cellular architecture of circRNAs in different tissues and carcinoma samples remains unexamined. Moreover, a recent study suggested that bulk RNA-seq based data are strongly affected by the cell composition of different samples, which may lead to a misleading interpretation of observed results12. Although most single-cell RNA-seq methods implement poly(A) selection, under which circRNAs should theoretically be depleted, recent studies have demonstrated that circRNAs can still be widely detected, although with lower efficiency, in these poly(A)-selected libraries, which supports the feasibility of characterizing circRNAs using full-length scRNA-seq datasets13–15. Thus, the investigation of circRNAs at the single-cell level has become an emerging problem in circRNA studies.
Here, we employ a compendium of full-length single-cell RNA-sequencing datasets composed of 172,137 high-confidence cells from 171 public studies to generate a comprehensive map of circRNAs in human and mouse single cells. Through large-scale integration of these scRNA-seq datasets, we demonstrate the high cell-type specificity of circRNAs in these two species at single-cell resolution. In particular, we elucidate the neuron-specific expression of circRNAs in brain samples and reveal the dynamic transition between maternal and zygotic circRNA expression during embryo development. We disclose the inter- and intra-tumor heterogeneity of circRNAs in 20 breast cancer patients, where circRNAs exhibit highly similar expression in the primary and metastatic tumors from the same patient. Furthermore, we unveil cell-type-specific circRNA expression in both species and validate the applicability of circRNAs as promising biomarkers in decomposing tumor-infiltrating immune cells using bulk RNA-seq data. We also construct the circSC online platform for exploring circRNA expression at the single-cell level, which provides a unique and useful resource for the circRNA community.
Results
Large-scale single-cell investigation reveals circRNAs with high cellular specificity
To elucidate the cellular architecture of circRNAs, we collected public full-length scRNA-seq datasets from 171 studies involving 58 different human and mouse tissues or cell types (Fig. 1a and Supplementary Data 1). Considering that most 3’ RNA sequencing methods are unable to detect circRNAs, which lack poly(A) tails, only full-length sequencing technologies including MATQ-seq16, Quartz-seq17, RamDA-seq18, SMARTer19, Smart-seq20, Smart-seq221, SUPeR-seq10 and Tang’s method22 were collected in our study. Then, the single-cell level expression values of both genes and circRNAs were calculated using a comprehensive pipeline embedding multiple state-of-the-art tools (Fig. 1b). In brief, the HISAT223 and StringTie24 pipeline was used to generate the gene expression matrix, and a quality control step was applied to filter high-confidence cells using the Scater25 package. To eliminate the batch effect across different studies, the anchor-based canonical correlation analysis (CCA) method in the Seurat26 package was performed, and cells were clustered using principal component analysis and k-nearest-neighbor clustering. Then, cell clusters were annotated using published results and manually curated using cell markers. At the same time, the single-cell expression matrix of circRNAs was obtained using the CIRI227,28 and CIRIquant29 pipeline, and circRNAs with at least 2 supporting reads were kept for the downstream analysis. The expression level of circRNAs was subsequently normalized using gene expression profiles (see Methods). In summary, 40,604 human and 131,533 mouse single cells passed quality control (approximately 1000 cells per experiment), and circRNAs in these cells were detected for downstream analysis.
To evaluate the reliability of circRNA detection, all circRNAs in single-cell data were comprehensively compared against our previous database circAtlas v2.07 and against the integration of 10 other bulk RNA-seq based circRNA databases (Supplementary Table 1). Considering that only the circAtlas database provides the assembled full-length sequence and conservation score of reported circRNAs30,31, the circRNA set obtained from the circAtlas database was analyzed separately. As shown in Fig. 1c, a total of 354,390 circRNAs were detected in the scRNA-seq cohort, of which 76,824 (21.67%) were simultaneously detected in all three circRNA sets (Supplementary Fig. 1a, b). In summary, 32.43% of circRNAs were also present in these bulk RNA-seq databases, while the remaining 67.57% of the circRNAs were uniquely detected in single-cell data. Notably, circRNAs that were uniquely detected in circAtlas have significantly lower expression levels (measured by counts per million, CPM) and shorter lengths than those shared between circAtlas and the single-cell datasets (Fig. 1d, e), indicating that scRNA-seq can effectively capture most high-abundance circRNAs. Besides, these shared circRNAs exhibited high conservation as measured by the MCS score according to our previously described method7; 48.9% of these overlapping circRNAs were conserved across more than two species (MCS score ≥2), demonstrating the high reliability of our identified circRNAs (Fig. 1f).
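The comparison against bulk-derived resources reduces to set operations on circRNA identifiers. The following is a minimal sketch of that bookkeeping, not the pipeline used in this study; it assumes that circRNAs from every resource have already been mapped to a common identifier (for example, genome coordinates of the back-splice junction), and the function name `overlap_summary` and the toy identifiers are hypothetical.

```python
def overlap_summary(single_cell, circatlas, other_bulk):
    """Classify single-cell circRNA IDs by their presence in two bulk-derived sets.

    All three arguments are iterables of circRNA identifiers that are assumed
    to be directly comparable (same genome build, same coordinate convention).
    """
    single_cell = set(single_cell)
    in_circatlas = single_cell & set(circatlas)
    in_other = single_cell & set(other_bulk)
    in_any_bulk = in_circatlas | in_other
    total = len(single_cell)

    def pct(n):
        # Percentage of all single-cell circRNAs; 0.0 for an empty input.
        return 100.0 * n / total if total else 0.0

    return {
        "total": total,
        "all_three": len(in_circatlas & in_other),
        "any_bulk": len(in_any_bulk),
        "single_cell_only": total - len(in_any_bulk),
        "pct_any_bulk": pct(len(in_any_bulk)),
        "pct_single_cell_only": pct(total - len(in_any_bulk)),
    }


# Toy identifiers, for illustration only.
toy_single_cell = {"c1", "c2", "c3", "c4", "c5"}
toy_circatlas = {"c1", "c2", "x9"}
toy_other_bulk = {"c2", "c3"}
toy_summary = overlap_summary(toy_single_cell, toy_circatlas, toy_other_bulk)
```

In the toy input, one identifier is found in all three sets, three are supported by at least one bulk resource, and two are specific to the single-cell set.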
For all circRNAs detected in the scRNA-seq datasets, a positive correlation (R = 0.53) between the number of expressing cells and the mean expression level was detected (Fig. 1g and Supplementary Fig. 1c), and several highly expressed circRNAs such as mmu-Cdr1_0001, mmu-Tulp4_0006, and hsa-RIMS1_0021 were also reported in previous studies32–34, which further supported the circRNA identification results. Meanwhile, circRNAs that were uniquely detected in scRNA-seq data were generally expressed in fewer cells (Fig. 1h, p < 0.001, Wilcoxon rank-sum test) but had similar expression levels compared to circRNAs validated by other databases (Fig. 1i, p = 0.09, Wilcoxon rank-sum test), suggesting highly cell-specific expression of these circRNAs. Specifically, ~90% of scRNA-seq specific circRNAs were expressed in fewer than 10 cells in both human and mouse samples, which makes them almost impossible to detect using bulk RNA-seq techniques (Fig. 1j and Supplementary Fig. 1d). Taken together, these results indicate the high sensitivity and reliability of full-length scRNA-seq in revealing circRNAs with high cell specificity, most of which could be missed in traditional bulk RNA-seq samples because of the relatively low proportion of expressing cells. Additionally, these scRNA-seq specific circRNAs were also widely expressed in cells that have more than 10 back-spliced junction (BSJ) reads (Fig. 1k). Despite the small number of expressing cells, these circRNAs also originated from exons with higher conservation scores (Supplementary Fig. 1e, f). Besides, 16.0% of human and 5.0% of mouse scRNA-seq specific circRNAs also exhibited conserved expression in more than two species (Supplementary Fig. 1g), which suggests that a large fraction of conserved circRNAs with potential biological functions remained undiscovered in previous bulk RNA-seq datasets.
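The per-circRNA quantities discussed above (number of expressing cells, mean expression level, and the share of circRNAs found in fewer than 10 cells) can be sketched as follows. This is an illustrative sketch only: the nested-dictionary input layout, the function names, and the choice to apply the 2-read support threshold per cell are assumptions, and raw BSJ read counts stand in for the normalized expression values used in the study.

```python
from statistics import mean


def expressing_cell_stats(bsj_counts, min_reads=2):
    """Return {circ_id: (n_expressing_cells, mean_supported_count)}.

    bsj_counts: {circ_id: {cell_id: BSJ read count}} (assumed layout).
    A cell counts as expressing when it has at least `min_reads` BSJ reads;
    circRNAs with no supporting cell are dropped.
    """
    stats = {}
    for circ_id, per_cell in bsj_counts.items():
        supported = [n for n in per_cell.values() if n >= min_reads]
        if supported:
            stats[circ_id] = (len(supported), mean(supported))
    return stats


def fraction_rare(stats, max_cells=10):
    """Fraction of retained circRNAs expressed in fewer than `max_cells` cells."""
    if not stats:
        return 0.0
    rare = sum(1 for n_cells, _ in stats.values() if n_cells < max_cells)
    return rare / len(stats)


# Toy counts, for illustration only.
toy_counts = {
    "circA": {"cell1": 5, "cell2": 1, "cell3": 3},   # 2 supporting cells
    "circB": {"cell1": 1},                            # below threshold, dropped
    "circC": {f"cell{i}": 2 for i in range(12)},      # 12 supporting cells
}
toy_stats = expressing_cell_stats(toy_counts)
toy_rare_fraction = fraction_rare(toy_stats)
```

With the toy input, one of the two retained circRNAs is detected in fewer than 10 cells, so the rare fraction is 0.5.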
Brain circRNAs display cell-specific expression patterns in inhibitory and excitatory neurons
Previous studies have shown that circRNAs are widely expressed across eukaryotic tissues, and especially enriched in mammalian brains8,9,35,36. However, the cellular expression pattern of circRNAs in this tissue has never been examined. To rigorously investigate the cellular landscape of circRNAs, we first collected and analyzed 18 studies of mouse brain samples, which also constitute the largest cohort among our collected datasets. All brain cells were analyzed and integrated as described above. A total of 41,911 cells were divided into 14 clusters, and 64,311 circRNAs were detected (Fig. 2a). As shown in Fig. 2b, most cells were clustered into GABAergic neurons (GABA), glutamatergic neurons (GLUT), and microglia cells (MG). Despite the similar number of cells in these clusters, circRNAs tend to be specifically enriched in GABAergic and glutamatergic neurons. Although the top 10 most abundant circRNAs showed conserved expression across different cell types, cell-type-specific circRNAs exhibited disparate patterns between neurons, immune cells, glial cells, and vascular cells, demonstrating the high cell specificity of these circRNAs. For experimental validation of these circRNAs, RT-PCR of 12 cell-type-specific circRNAs that were expressed in fewer than 10 cells was performed using outward-facing primers targeting the BSJ region, and the back-spliced junction sequences of these circRNAs were successfully validated using Sanger sequencing (Supplementary Table 2). Then, the widely used Tau method37 was implemented to measure the cellular specificity of circRNAs, and genes were divided into circRNA hosting genes and other genes for further comparison. As shown in Fig. 2c, circRNAs exhibited a significantly higher specificity than both groups of genes. Meanwhile, the circRNA hosting genes also showed significantly lower specificity than other non-hosting genes, because circRNAs tend to originate from genes with higher expression levels (Supplementary Fig. 2a), which results in a relatively lower cell specificity. For instance, 10 of 12 circRNAs from the mouse Taf1 gene were specifically detected in neuron cells, and a distinct expression pattern was also observed in GABAergic and glutamatergic neurons (Fig. 2d).
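For reference, the Tau index is commonly defined as τ = Σᵢ (1 − xᵢ / x_max) / (n − 1), where xᵢ is the expression of a feature in cell type i and n is the number of cell types; τ ranges from 0 (uniform expression) to 1 (expression restricted to one cell type). The sketch below implements this standard definition. Whether the input should be mean expression per cell type, and whether it is log-transformed beforehand, are assumptions left to the caller; the function name is hypothetical.

```python
def tau_index(expression):
    """Tau specificity index for one feature across n cell types.

    expression: non-negative expression values, one per cell type.
    Returns 0.0 for uniformly expressed (or entirely unexpressed) features
    and 1.0 for features detected in exactly one cell type.
    """
    values = list(expression)
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs at least two cell types")
    if any(x < 0 for x in values):
        raise ValueError("expression values must be non-negative")
    peak = max(values)
    if peak == 0:
        return 0.0
    return sum(1.0 - x / peak for x in values) / (n - 1)


# Toy profiles, for illustration only.
tau_specific = tau_index([10, 0, 0, 0])   # restricted to one cell type
tau_uniform = tau_index([5, 5, 5, 5])     # equal in all cell types
tau_graded = tau_index([4, 2, 0])         # intermediate
```

Because τ depends only on ratios to the maximum, a feature with very few counts in a single cell type also scores 1, which is one reason low-abundance circRNAs tend to appear highly specific.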
To further validate the circRNA expression landscape in the human brain, four scRNA-seq datasets (GSE67835, GSE71315, GSE75140, and GSE125288) of healthy human brains were also analyzed, and the enriched expression of circRNAs in GABAergic and glutamatergic neurons was observed accordingly (Supplementary Fig. 2b, c). Afterward, the orthologs between human and mouse circRNAs were extracted from the circAtlas database. As shown in Fig. 2e, circRNAs with higher expression levels were more likely to be conserved in both species, whereas species-specific circRNAs tend to have lower expression levels. Consistent with previous results, the majority of these conserved circRNAs were highly enriched in GABAergic and glutamatergic neurons, and a proportion of circRNAs were also found to be generally expressed in all types of cells (Fig. 2f). As mentioned in previous studies, the expression level of circRNAs is largely correlated with the activity of RNA-binding proteins (RBPs)9,38,39. Thus, to explain these patterns, the Spearman correlation coefficient between all circRNAs and circRNA hosting genes or RBPs in all cells was calculated for comparison. The correlation coefficient between circRNAs and RBPs was significantly higher (p < 0.001) than that of hosting genes (Fig. 2g). In particular, the polypyrimidine tract binding proteins PTBP1 (R = 0.76) and PTBP2 (R = 0.66) exhibited a high correlation with circRNAs, where a relatively low level of PTBP1 and a high level of PTBP2 were observed in both GABAergic and glutamatergic neurons. Our previous study has shown that a decrease of PTBP1 activity could result in a dramatic burst of circRNAs29, which can partially explain the enormous number of neuron-specific circRNAs detected in the single-cell cohort. As expected, the circRNA expression level (e.g., circCdr1) and circular-to-linear ratio were highly correlated with the downregulation of PTBP1 and the upregulation of its compensator PTBP2 in most cell types (Fig. 2h). Furthermore, only a small proportion of overlap between circRNA-generating loci in GABAergic and glutamatergic neurons was observed (Supplementary Fig. 2d), which indicated the cell-specific expression in these two types of neurons. The gene ontology analysis also demonstrated the enrichment of glutamate decarboxylase complex and excitatory synapse in GABAergic- and glutamatergic-specific circRNAs, respectively, which is consistent with the biological characteristics of GABAergic and glutamatergic neurons (Supplementary Fig. 2h). Taken together, these results demonstrate the highly cell-specific expression landscape of circRNAs, and further reveal the complex association between circRNA biogenesis and RBP activity, especially in these inhibitory and excitatory neurons.
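The Spearman coefficient used for the circRNA-RBP comparison is the Pearson correlation of the rank-transformed values, with tied observations assigned their average rank. The sketch below implements that textbook definition in plain Python for illustration; it is not the code used in this study, the helper names are hypothetical, and the expression values are invented toy numbers rather than measurements.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values: a circRNA that rises as an RBP falls across five cells.
toy_circ = [0, 1, 3, 8, 9]
toy_rbp = [9, 7, 5, 2, 1]
toy_rho = spearman(toy_circ, toy_rbp)
```

In single-cell data most circRNA counts are zero, so ties are pervasive and the average-rank handling matters in practice.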
The dynamic expression of maternal and zygotic circRNAs during early embryo development
Single-cell RNA sequencing has enabled the study of gene heterogeneity in embryonic development stages40, but the change of circRNA expression patterns during this process still needs further exploration. Here, we analyzed 11 studies of human and mouse embryos containing samples from 16 different stages covering oocytes to early buds (Fig. 3a). A total of 41,041 and 24,818 circRNAs were detected in human and mouse embryonic cells, respectively. To reveal the dynamic changes of circRNAs during embryo development, the Pearson correlation between circRNA expression levels in different stages was calculated. As shown in Fig. 3b, a high correlation between cells in the first 3-4 days after fertilization was observed, which is consistent with the maternal effect of circRNAs during early embryonic development10,41. Moreover, cells from blastocyst to implanted embryos exhibited a different expression pattern of circRNAs, suggesting the expression of zygotic circRNAs after the blastocyst stage. Besides, an increase in both the circRNA diversity and the junction ratio of detected circRNAs across developmental stages was observed in both human and mouse samples, which also verified the accumulation of these zygotic circRNAs during embryo development (Fig. 3c and Supplementary Fig. 3a). Considering that only a relatively small number of cells were collected in the human datasets, only mouse embryos were included in the downstream analysis. To reduce the effect of random detection, the expression pattern of circRNAs that could be detected in more than two stages was plotted in Fig. 3d. As expected, the gradual degradation of maternal circRNAs was observed, and most other circRNAs exhibited a stage-specific expression profile.
To further investigate the dynamic expression changes of circRNAs during the maternal-to-zygotic transition, samples were divided into four time points including totipotent blastomeres (TB), first lineage (TE/ICM), second lineage (EPI/PE), and implanted embryo, reflecting the changes of totipotency and lineage segregation in the development process. Subsequently, genes and circRNAs were clustered into 5 groups using a noise-robust clustering method42. As shown in Fig. 3e, circRNAs and genes in clusters 1 and 2 were highly expressed in the early TB stage, then continuously decreased with embryo development. In contrast, clusters 3 to 5 of circRNAs represent zygotic circRNAs that were specifically expressed after fertilization. To determine whether the activated zygotic circRNAs were byproducts of host gene expression, the correspondence between circRNAs and their host genes was examined. Notably, a large fraction of zygotic circRNAs (67.50% in cluster 3, 69.2% in cluster 4, and 83.9% in cluster 5) were generated from maternally expressed genes, which suggested a unique biogenesis mechanism of these zygotic circRNAs during embryo development (Fig. 3f).
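The host-gene comparison above amounts to asking, for each zygotic circRNA cluster, what fraction of its circRNAs map to genes that are already expressed maternally. A minimal sketch of that calculation is given below. The criterion for calling a gene maternally expressed is not specified here, so the threshold on expression in the TB stage, the input layouts, and all names are assumptions made only for illustration.

```python
def maternal_gene_set(tb_expression, min_expression=1.0):
    """Genes whose expression at the TB stage reaches `min_expression`.

    tb_expression: {gene: expression at the TB stage} (assumed layout).
    The threshold is a placeholder, not a value taken from the study.
    """
    return {gene for gene, value in tb_expression.items() if value >= min_expression}


def fraction_from_maternal_hosts(circ_to_host, maternal_genes):
    """Fraction of circRNAs whose host gene is in `maternal_genes`.

    circ_to_host: {circ_id: host gene} for one zygotic circRNA cluster.
    """
    if not circ_to_host:
        return 0.0
    hits = sum(1 for host in circ_to_host.values() if host in maternal_genes)
    return hits / len(circ_to_host)


# Toy data with placeholder names, for illustration only.
toy_tb_expression = {"GeneA": 12.0, "GeneB": 0.2, "GeneC": 3.5}
toy_cluster = {
    "circ_1": "GeneA",
    "circ_2": "GeneA",
    "circ_3": "GeneB",
    "circ_4": "GeneC",
}
toy_maternal = maternal_gene_set(toy_tb_expression)
toy_fraction = fraction_from_maternal_hosts(toy_cluster, toy_maternal)
```

A high fraction indicates that the circRNA is activated after fertilization even though its host gene was already transcribed earlier, which is the pattern described for clusters 3 to 5.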
To further investigate the difference between the zygotic gene and circRNA activation processes, the composition of reads from genes and circRNAs in each cluster was calculated. Similarly, only circRNAs that were expressed in more than one of the four stages were included. In contrast to the gentle increase of zygotic gene reads during the developing stages (Supplementary Fig. 3b), a dramatic burst of zygotic circRNAs after the 8-cell stage was observed in Fig. 3g, providing convincing evidence of maternal circRNA degradation and zygotic circRNA activation. For instance, the different expression patterns of two zygotic and three maternal circRNAs were plotted. As shown in Fig. 3h, mmu-Erdr1_0001 and mmu-Erdr1_0002, derived from erythroid differentiation regulator-1 (Erdr1), a secreted factor that regulates cell survival and apoptosis43,44, were highly expressed in the implanted embryo. Meanwhile, the Erdr1 gene was lowly expressed in cells from all stages, suggesting a possible biological function of mmu-Erdr1_0001 and mmu-Erdr1_0002 in the development of the implanted embryo (Supplementary Fig. 3c). Moreover, mmu-Pola1_0001, mmu-C130026I21Rik_0001, and mmu-Ndrg3_0005 also exhibited a stronger maternal effect compared to their host genes. Thus, the highly specific expression of these circRNAs demonstrated that circRNAs undergo a more pronounced maternal-to-zygotic transition compared to their linear counterparts. Finally, gene ontology enrichment analysis was performed on the parental genes of maternal and zygotic circRNAs. As shown in Fig. 3i, microtubule-based movement and cilium assembly were enriched in the maternal circRNAs, while splicing-related processes were enriched in the zygotic circRNAs, which is consistent with the polarity establishment and embryonic genome activation in developing embryos.
Collectively, these results demonstrated the highly cellular specific expression profile of circRNAs and the substantial activation of zygotic circRNAs in embryo development, which also suggested the important role of these maternal and zygotic circRNAs during this process.
Inter- and intra-tumor circRNA heterogeneity in human breast cancer metastasis
Recent studies have demonstrated the emerging role of circRNAs in regulating cancer progression and proliferation45–48. However, the comprehensive landscape of circRNA expression at the single-cell level has not been thoroughly examined. To extensively profile circRNAs across breast cancer tumorigenesis, a total of 26 primary and metastasis tumor scRNA-seq samples from 20 breast invasive carcinoma (BRCA) patients with different luminal stages including 19 TNBC, 3 HER2 negative, 2 luminal A, and 2 luminal B samples were investigated49–51. Firstly, all cells from the 20 patients were integrated and analyzed as described above, and CopyKAT52 was applied to identify normal cells and tumor cells with copy number variations (Fig. 4a). As shown in Fig. 4b, more than 49.88% and 67.28% of normal and carcinoma populations were identified as epithelial cells. Then, the difference in circRNA expression levels between normal and carcinoma populations was further investigated (Fig. 4b and Supplementary Fig. 4a). Consistent with previous studies, tumor cells with aneuploid rearrangement exhibited significantly lower expression of circRNAs in both metastasis and primary tumors (Fig. 4c), and the same pattern was also observed in most identified cell types (Fig. 4d). In particular, the expression of several well-known circRNAs was plotted in Supplementary Fig. 4b, where cancer-related circRNAs like hsa-CDYL_0005, hsa-BARD1_0006, hsa-HIPK3_0001, and hsa-FAM120A_0006 were successfully detected in both normal and carcinoma cells53–56. Besides, the cell-specific expression pattern of circRNA isoforms derived from the BARD1 and KRD36C genes was also plotted, indicating the sparse expression of circRNAs in scRNA-seq data (Supplementary Fig. 4c).
Interestingly, both normal and carcinoma cells from low-grade (luminal A, luminal B, and HER2-negative) tumors with better prognosis tended to express more circRNAs than high-grade triple-negative breast carcinoma (TNBC) cells, indicating less accumulation of circRNAs in TNBC cells, which progress faster.
Given the dominant number of epithelial cells in this cohort and the important role of the epithelial-to-mesenchymal transition (EMT) in tumor invasion and metastasis, the circRNA dynamics during EMT were further investigated. Firstly, all epithelial cells were clustered, and trajectory inference analysis was performed to reveal the dynamic cell differentiation process (Fig. 4f). To better explore the transition state of individual cells, the EMT score was subsequently calculated using a reported method57. As shown in Fig. 4g, the cell trajectory results generally followed the increase of the EMT score. Then, gene ontology (GO) enrichment analysis was performed on each cell cluster. As expected, epithelial cell proliferation processes were enriched in clusters with lower EMT scores, while cell migration and mesenchymal-related processes were enriched in the clusters with higher EMT scores. Furthermore, the proportion of carcinoma cells in each cluster was calculated, and a positive correlation between tumor cell percentage and EMT score was observed accordingly (Fig. 4h). This result can be explained by the fact that the EMT score was calculated using a cancer-specific EMT signature matrix. Interestingly, after the intermediate EMT state (branch point 1 in Fig. 4f), epithelial cells differentiated into two branches. The upper branch, which mainly consisted of clusters 10-12, had a significantly higher proportion of carcinoma cells and a higher EMT score compared to the other branch, which was made up of more normal cells. Finally, the circRNA expression level in each cluster was calculated (Supplementary Fig. 4d). With the transition from the epithelial state (clusters 1-2) to the intermediate EMT state (clusters 3-5), the average expression level of circRNAs increased accordingly (Fig. 4i), which is consistent with the global activation of circRNAs during EMT38. However, in the later stage of EMT, an unexpected decrease in circRNA expression was observed.
In particular, the circRNA expression level in carcinoma cells was decreased in the mesenchymal stage (clusters 9-12) compared to that of normal cells. The difference between normal and carcinoma cells in the mesenchymal stage suggested a weakened accumulation of circRNAs as tumor cells proliferate in the later stage of EMT. Finally, the heterogeneity of circRNAs between patients was investigated. As shown in Supplementary Fig. 4e, the metastasis and primary tumors from one patient exhibited similar expression patterns, whereas a large variation between cells from different patients was observed. Taken together, we characterized circRNA expression during EMT in detail, revealing the complex inter- and intra-tumor heterogeneity of circRNAs between primary and metastasis samples from breast cancer patients.
Cell-specific circRNAs providing insights into optimal cell type discrimination
In previous studies, many computational methods have been developed to explore the heterogeneity of tumor-infiltrating immune cells in bulk RNA-seq datasets using cell-type-specific marker genes58–61. Based on the high cellular specificity of circRNAs, we further explored the possibility of using circRNAs as biomarkers to improve the performance of cell type decomposition. To construct a high-quality circRNA signature matrix, the scRNA-seq cohort from 17 different human and mouse tissues along with cognate cancer samples was investigated (Fig. 5a). We also collected 446 and 777 bulk normal and tumor RNA-seq datasets from the circAtlas7 and MiOncoCirc62 databases to validate the performance of circRNAs in cell-type deconvolution. In brief, all scRNA-seq samples were analyzed using the Seurat26 pipeline (see Methods). Next, cell composition in human and mouse datasets was predicted using marker genes from published databases63,64 and literature, and then curated using SingleR65 prediction results. Firstly, all circRNAs were divided into five groups according to their expression pattern in different cell types and tissues (Fig. 5b). In summary, a total of 12,625 circRNAs across all samples were only detected in one cell type, of which 6,623 (52.5%) were also reported in the bulk RNA-seq based resource (Supplementary Fig. 5a). As shown in Supplementary Fig. 5b, these circRNAs were commonly detected in a variety of tissues and samples, indicating the potential of these circRNAs as biomarkers for cell-type classification. Besides, 3.24% of circRNAs were detected in multiple cell types within a single tissue, which also validated the tissue-specific expression pattern of these circRNAs. About 50% of circRNAs exhibited constitutive expression in more than 50% of cells or were expressed in multiple cell types, suggesting the “housekeeping” role of these circRNAs in specific tissues or cell types.
Meanwhile, the majority of circRNAs were “lowly expressed” in only one cell, which is consistent with the randomized biogenesis of most circRNAs reported in a recent study66. Afterward, the cell-type-specific architecture of circRNAs in human and mouse samples was summarized, and the relationship of shared circRNAs was plotted in Fig. 5c. Similar to the gene expression landscape reported in the previous study67, circRNAs also exhibited distinct expression clusters between cell types with different functions. Specifically, several orthologous cell-type-specific circRNAs between human and mouse cells were also detected, implying the conserved biological function of these circRNA subsets.
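Grouping circRNAs by where they are detected can be sketched as a simple rule-based classifier. The five groups used in Fig. 5b are not fully specified in the text, so the example below uses a simplified, hypothetical four-way scheme (single cell, one cell type, several cell types within one tissue, or broad) purely to illustrate the idea; the labels, the input layout, and the decision order are all assumptions.

```python
def classify_circrna(detections):
    """Assign one circRNA to a coarse, illustrative expression category.

    detections: iterable of (tissue, cell_type, cell_id) tuples, one per
    cell in which the circRNA was detected (assumed layout).
    """
    detections = set(detections)
    if not detections:
        return "not_detected"
    cells = {(tissue, cell_id) for tissue, _, cell_id in detections}
    cell_types = {cell_type for _, cell_type, _ in detections}
    tissues = {tissue for tissue, _, _ in detections}
    if len(cells) == 1:
        return "single_cell"           # seen in exactly one cell
    if len(cell_types) == 1:
        return "cell_type_specific"    # several cells, one cell type
    if len(tissues) == 1:
        return "tissue_specific"       # several cell types, one tissue
    return "broad"                     # several cell types and tissues


# Toy detections with placeholder identifiers, for illustration only.
toy_detections = {
    "circ_A": [("brain", "GABA", "c1"), ("brain", "GABA", "c2")],
    "circ_B": [("brain", "GABA", "c1"), ("brain", "MG", "c3")],
    "circ_C": [("brain", "GABA", "c1"), ("breast", "T cell", "c9")],
    "circ_D": [("brain", "GLUT", "c4")],
}
toy_labels = {name: classify_circrna(hits) for name, hits in toy_detections.items()}
```

Checking the single-cell case first keeps circRNAs with one supporting cell out of the cell-type-specific group, where they would otherwise inflate the number of apparent markers.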
To validate the potential of circRNAs to serve as cell type biomarkers, the overlap between expressed circRNAs in different cell types and bulk RNA-seq datasets was further calculated. As shown in Fig. 5d, circRNAs detected in bulk RNA-seq data exhibited a highly specific overlap with cellular expressed circRNAs. For instance, 39.36% of circRNAs detected in GABAergic neurons can be simultaneously detected in normal brain samples, and the overlap of circRNAs in human and brain samples was also highly enriched in cell types identified in the previous results. To compare the performance of circRNAs and genes as cellular biomarkers in profiling tumor-infiltrating cells, only cell types that were annotated in human tumor samples were included in the downstream analysis. Afterward, the cell-type specificity of all expressed circRNAs, marker genes from public databases, and 1,000 randomly selected genes was calculated. Notably, the cell-type specificity of circRNAs was significantly higher than that of marker genes and random control genes, which further indicated the ability of circRNAs to serve as cell-type biomarkers (Fig. 5e). Then, the composition of tumor-infiltrating immune cells in cancer-related bulk RNA-seq datasets was calculated using CIBERSORT68 with marker genes from the LM22 gene set and cell-type-specific circRNAs from immune cells, respectively (Fig. 5f). The performance of cell-type decomposition was assessed by the log-scale root-mean-square error (RMSE) provided in the CIBERSORT results, which represents the deviation between original and imputed marker gene expression values. As shown in Fig. 5g, the deconvolution results using circRNAs have significantly lower RMSE values (p = 0.015, Wilcoxon test), which indicates better accuracy in estimating cell compositions.
These results demonstrated the applicability of circRNAs as better cell-type biomarkers in exploring the heterogeneity of tumor-infiltrating immune cells, which also suggested the important biological roles of these circRNAs in certain cell types.
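The RMSE criterion can be illustrated with a small sketch: given a signature matrix and a vector of estimated cell-type fractions, the bulk profile is reconstructed as their product and compared with the observed profile. This is not CIBERSORT's implementation; the function names and toy numbers are hypothetical, and the example uses plain (not log-scale) values for simplicity.

```python
from math import sqrt


def reconstruct(signature, fractions):
    """Predicted bulk expression: signature (features x cell types) times fractions."""
    if any(len(row) != len(fractions) for row in signature):
        raise ValueError("each signature row needs one value per cell type")
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length profiles."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


# Toy signature: three marker features (rows) for two cell types (columns).
toy_signature = [
    [10.0, 0.0],   # marker of cell type 1
    [0.0, 10.0],   # marker of cell type 2
    [5.0, 5.0],    # shared feature
]
toy_fractions = [0.7, 0.3]
toy_predicted = reconstruct(toy_signature, toy_fractions)

toy_error_exact = rmse([7.0, 3.0, 5.0], toy_predicted)   # mixture matches the model
toy_error_off = rmse([8.0, 3.0, 5.0], toy_predicted)     # first marker is off by 1
```

A lower RMSE means the estimated fractions reproduce the marker profile of the bulk sample more closely. Note that the value is computed only over the features in the signature, so RMSE values obtained with different signature matrices are not measured on identical features.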
To this end, we further integrated the cellular architecture of circRNAs and the circRNA signature matrix in immune cells into a web server called the circRNA single-cell portal (circSC). The circSC portal provides comprehensive information including cellular expression profile, differentially expressed results, and the catalogue of circRNAs identified in an enormous number of human and mouse cells (Fig. 6). The circSC portal has been integrated into circAtlas as an individual module (http://circatlas.biols.ac.cn/), providing convenient browsing and searching functions of both the single-cell and bulk RNA-seq expression pattern of circRNAs of interest. Thus, we believe that our database can serve as an important resource for exploring the dynamic changes of circRNAs in embryo development, tissue differentiation, and cancer biogenesis process, and it provides a unique and useful platform for the circRNA community.
Discussion
In this study, we reported the single-cell landscape of circRNAs using a large-scale full-length scRNA-seq cohort. We identified a total of 139,643 and 214,747 circRNAs in human and mouse single cells, respectively. We also validated the detected circRNAs using public resources based on bulk RNA-seq data and discovered 216,602 high-confidence circRNAs (≥5 supporting reads) that were uniquely detected in the single-cell cohort. Based on these datasets, we rigorously investigated the single-cell expression pattern of circRNAs in different tissues, developing stages, and cell states. Furthermore, we revealed the relatively higher cell specificity of circRNAs compared to linear mRNAs and demonstrated the promising role of circRNAs in improving the performance of cell composition estimation from bulk RNA-seq datasets.
Given that circRNAs do not have poly(A) tails like their cognate linear RNAs, the most widely used oligo(dT) priming methods cannot detect circRNAs effectively. However, recent studies have demonstrated that circRNAs can also be detected at low levels in poly(A)-enriched libraries14, which further supports the feasibility of studying the single-cell circRNA landscape using the tremendous number of public scRNA-seq datasets. As most circRNAs are derived from exonic regions, the identification of circRNAs largely relies on the detection of back-splicing junction sequences. Thus, most 3’ end sequencing methods like Drop-seq and the 10X Genomics Chromium Single Cell 3’ method are not likely to generate fragments spanning the junction site and are not suitable for circRNA detection. Therefore, only datasets generated from 8 full-length scRNA-seq methods were collected in our study, which provided the basis for exploring circRNA expression at an unprecedented resolution.
The large-scale integration of scRNA-seq datasets provides an opportunity to reveal the dynamic changes of circRNAs across cell types and developmental stages. In this study, we found that circRNAs were highly enriched in neurons compared to other cells in brain samples. Inhibitory and excitatory neurons also exhibited cell-specific circRNA expression patterns that were correlated with RBP expression levels, suggesting that the specific expression of circRNAs in diverse cell types is regulated by RBPs. We also explored the dynamic changes of circRNAs during human and mouse embryo development. Aside from the maternal effect of circRNAs reported in previous studies10,41, we further demonstrated a dramatic increase of circRNA expression during late MZT stages, which indicates strong activation of zygotic circRNAs in pre-implantation embryos.
CircRNA expression in tumor samples has been extensively studied using bulk RNA-seq datasets. However, the results are often affected by the cancer-to-normal cell ratio of the tumor specimens, where differences in tumor purity between samples can lead to biased or false-positive results. For example, in contrast to the well-known role of ciRS-7 as an oncogene69, a recent study experimentally validated that ciRS-7 is absent in cancer cells but highly expressed in stromal cells within tumors12. Thus, the investigation of circRNAs at the single-cell level has become an emerging aspect of studying circRNA function in tumorigenesis and metastasis. In this study, we comprehensively investigated the expression landscape of circRNAs in 20 breast carcinoma patients and demonstrated the heterogeneity of circRNAs between lesions and cell types. We used the EMT-score to measure the differentiation state of cells during the EMT process and revealed distinct patterns of change in normal and carcinoma cells. Finally, we investigated the intertumoral heterogeneity of circRNAs between patients with different lineage stages. CircRNAs exhibited similar expression patterns in primary and metastatic tumors from the same patient but disparate expression patterns between patients. This high heterogeneity underscores the importance of investigating circRNAs at the single-cell level, which provides an important basis for understanding the role of circRNAs during tumorigenesis.
Our previous studies have revealed the highly specific expression of circRNAs in different tissues and species7,9,70,71. Here, we explored the cell-specific expression of circRNAs at single-cell resolution and identified 12,625 circRNAs that were detected in only one cell type. Moreover, we generated a circRNA reference for 8 immune cell types and showed that cell composition deconvolution using circRNAs as cell-type signatures is more accurate than deconvolution using gene markers only, which suggests that these circRNAs are promising biomarkers for profiling tumor-infiltrating immune cells. This study explores the cellular landscape and reveals the high cell specificity of circRNAs in human and mouse samples, which expands our understanding of circRNA biogenesis during complex biological processes. We therefore developed the circSC database for investigating circRNAs at single-cell resolution, which will provide a useful platform for the circRNA community. Nevertheless, the construction of a full panorama of circRNAs is still limited by the low circRNA capture efficiency of state-of-the-art scRNA-seq methods, and the performance of cell-type deconvolution can also be affected by the relatively low expression of these cell-type-specific circRNAs in bulk RNA-seq samples. Meanwhile, recent nanopore sequencing-based strategies such as isoCirc72 and CIRI-long73,74 have been shown to capture lowly expressed circRNAs in bulk RNA-seq libraries with high efficiency. However, further comparison demonstrates that these methods still have inadequate capacity to detect cell-type-specific circRNAs (Supplementary Fig. 6). Taken together, our study has demonstrated the highly specific expression of circRNAs at unprecedented resolution, which highlights the importance of developing single-cell or spatial sequencing technologies specifically for detecting circRNAs.
Methods
Single-cell RNA-seq dataset collection
Full-length single-cell RNA sequencing datasets were collected from publicly available resources across multiple tissues and cell types of human and mouse samples. Raw sequencing data were downloaded from the Single Cell Expression Atlas (https://ebi.ac.uk/gxa/sc/home) and the Gene Expression Omnibus (https://ncbi.nlm.nih.gov/geo) using the SRA-Toolkit (v2.9.4). Metadata for these datasets were retrieved from the corresponding literature. To ensure the effective capture of circRNAs, only full-length and high-resolution single-cell transcriptome sequencing methods, including MATQ-seq16, Quartz-seq17, RamDA-seq18, SMARTer19, Smart-seq20, Smart-seq221, SUPeR-seq10, and Tang’s method22, were included. The study accession numbers and cell numbers of the collected cohort are provided in Supplementary Data 1.
Single-cell RNA-seq analysis and integration
For the analysis of scRNA-seq data, the human reference genome (GRCh38) and mouse reference genome (GRCm38) were downloaded from the GENCODE project. Raw sequencing reads were aligned using HISAT2 (v2.0.5)23, and StringTie (v1.2.4)24 was used for gene quantification. Next, a quality control step was implemented with Scater (v1.18.6)25 to retain high-confidence cells, where appropriate thresholds for library size, gene expression values, mitochondrial reads, and the total amount of mRNA in each study were estimated using the perCellQCMetrics function. Afterward, outlier cells were identified based on the median absolute deviation (MAD) using the isOutlier function.
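The MAD-based outlier rule used in this quality control step can be illustrated with a short Python sketch. It is not the Scater implementation: the function name `mad_outliers`, the default of 3 MADs, and the example library sizes are assumptions made for illustration, since the thresholds actually applied to each study are not stated here.

```python
import math
from statistics import median

def mad_outliers(values, nmads=3.0, direction="both", log=False):
    """Flag values lying more than `nmads` scaled MADs from the median.

    direction: "lower", "higher", or "both" (which tail is tested).
    log: test log-transformed values (useful for library sizes).
    """
    xs = [math.log(v) for v in values] if log else list(values)
    med = median(xs)
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    mad = 1.4826 * median(abs(x - med) for x in xs)
    lower = med - nmads * mad
    upper = med + nmads * mad
    flags = []
    for x in xs:
        too_low = direction in ("both", "lower") and x < lower
        too_high = direction in ("both", "higher") and x > upper
        flags.append(too_low or too_high)
    return flags

# Hypothetical library sizes for six cells; the last cell is nearly empty.
library_sizes = [52000, 48000, 50500, 49500, 51000, 1200]
low_quality = mad_outliers(library_sizes, nmads=3.0, direction="lower")
```

In practice each QC metric (library size, detected genes, mitochondrial fraction) would be tested separately, and a cell would be discarded if it is flagged for any of them.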
Then, we used Seurat (v4.0.2)26 for downstream analysis, including normalization, batch effect removal, dimensional reduction, clustering, and data visualization. The anchor-based canonical correlation analysis (CCA) method in the Seurat package was used for dataset integration and batch effect correction. The integrated data were then used for highly variable gene selection, principal component analysis (PCA), neighborhood graph construction, and cell clustering with default parameters. To account for the inconsistency between datasets, the normalized expression of mRNAs and circRNAs was calculated using the size factors computed from the integrated data by Scater25.
Cell type annotation
Cell clusters were annotated based on canonical cell markers from published literature (Supplementary Table 3) and databases including CellMarker63 and PanglaoDB64. The annotation results were then curated using the SingleR (v1.4.1)65 algorithm with various reference datasets (Blueprint/ENCODE, human primary cell atlas, Novershtern hematopoietic data, Monaco immune data, and Database of Immune Cell Expression). The final annotations were determined by combining the results from our pipeline with those of the original studies. The CopyKAT (v1.0.4)52 workflow was used to distinguish normal cells from carcinoma cells with aneuploid rearrangements. Abbreviations of cell type names are listed in Supplementary Table 4.
CircRNA detection and quantification
For circRNA analysis, we used bwa (v0.7.12)75 for split-mapping of raw reads, and then used the CIRI2 (v2.0.6)27 and CIRIquant (v1.1)29 pipeline for circRNA identification and quantification. CircRNAs were further filtered with a stringent threshold of 2 supporting BSJ reads across the whole single-cell dataset. CircRNA expression levels were measured as counts per million mapped reads (CPM). To eliminate batch effects between datasets, the number of supporting reads of each circRNA was normalized using the size factors from the gene normalization results, and the single-cell expression matrix was generated as output. The circRNA expression profile of each tissue was then aggregated by summing the expression values of each circRNA across cells. Finally, the FindAllMarkers function in Seurat was used for differential expression analysis.
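The filtering, CPM conversion, and aggregation steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the CIRIquant or Seurat code: the function names, the nested-dictionary layout, and the toy counts are hypothetical, and the filter assumes that the threshold means at least 2 BSJ reads summed over all cells in a dataset.

```python
from collections import defaultdict

def filter_by_total_bsj(bsj_counts, min_total_reads=2):
    """Keep circRNAs whose BSJ reads, summed over all cells, reach the threshold."""
    return {circ: cells for circ, cells in bsj_counts.items()
            if sum(cells.values()) >= min_total_reads}

def to_cpm(bsj_counts, mapped_reads):
    """Convert per-cell BSJ read counts to counts per million mapped reads."""
    return {circ: {cell: n * 1_000_000 / mapped_reads[cell]
                   for cell, n in cells.items()}
            for circ, cells in bsj_counts.items()}

def aggregate(expr, cell_group):
    """Sum per-cell expression values within each tissue or cell type."""
    totals = defaultdict(lambda: defaultdict(float))
    for circ, cells in expr.items():
        for cell, value in cells.items():
            totals[circ][cell_group[cell]] += value
    return {circ: dict(groups) for circ, groups in totals.items()}

# Hypothetical input: BSJ read counts per circRNA and cell,
# and the number of mapped reads in each cell.
bsj = {"circA": {"c1": 3, "c2": 1}, "circB": {"c1": 1}}
mapped = {"c1": 2_000_000, "c2": 500_000}
groups = {"c1": "neuron", "c2": "neuron"}

kept = filter_by_total_bsj(bsj)       # circB has a single read and is dropped
cpm = to_cpm(kept, mapped)
profile = aggregate(cpm, groups)
```

The size-factor normalization mentioned in the text would replace the per-cell division in `to_cpm` with division by a size factor estimated from the gene expression data.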
Reverse transcription PCR (RT-PCR) validation
To validate the reliability of circRNA detection in the scRNA-seq data, outward-facing primers were designed to validate the back-spliced junction sequences of 12 randomly selected cell-type-specific circRNAs that were detected in fewer than 10 cells. For RT-PCR, total RNA from the brain of one healthy adult mouse (C57BL/6, female, 17 weeks) was isolated using TRIzol (Invitrogen, 15596026 and 15596018), and its quality was assessed with a Qsep 100 Bio-Fragment Analyzer (BiOptic). Linear RNAs were digested with 20 U of RNase R (Lucigen, RNR07250) in a 50 μL reaction for 30 min according to a previous study76, and ribosomal RNA was removed using the KAPA RiboErase Kit (Human/Mouse/Rat, KK8481) according to the manufacturer’s instructions. A 2.2x RNA Clean XP (Beckman, A63987) cleanup was performed after each step. The cleaned RNA was then reverse transcribed using random primers and the Hifair® II 1st Strand cDNA Synthesis Kit (Yeasen, 11121ES60) following the manufacturer’s instructions. RT-PCR of the 12 circRNAs was performed using Rapid Taq Master Mix (Vazyme P222) under the following conditions: 95 °C for 3 min; 35 cycles of 95 °C for 15 s, 55 °C for 15 s, and 72 °C for 60 s; 72 °C for 10 min. Finally, the sequences of the PCR products were determined by Sanger sequencing. All primer and PCR product sequences are provided in Supplementary Table 2.
Trajectory analysis
For branching trajectory and pseudo-time analysis, Monocle 2 (v.2.8.2)77 was applied to the scRNA-seq data to reveal the cell differentiation state. Cluster information was extracted from the Seurat results, and highly variable genes were selected to determine the transition state or developmental process.
Public circRNA databases and bulk RNA-seq data
To validate the circRNAs detected in scRNA-seq data, a total of 10 public circRNA resources, including circAtlas (v2.0)7, circbank78, circBase5, CIRCpedia (v2)6, CircRic15, circRNADb79, MiOncoCirc (v2.0)62, deepbase (v2.0)80, TCSD8 and CSCD81, were collected. CircRNA coordinates were converted to the hg38/mm10 genome assemblies using liftover, and all circRNAs were integrated for downstream analysis. The lengths of the full-length assembled circRNAs in circAtlas were extracted for comparison. The bulk RNA-seq data of normal and tumor samples were downloaded from the circAtlas and MiOncoCirc databases and analyzed using the same method described above.
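Once all coordinates are on the same assembly, comparing single-cell circRNAs with the public resources reduces to a set lookup. The Python sketch below illustrates the idea under several assumptions: each circRNA is identified by a (chromosome, start, end, strand) tuple, all resources use the same coordinate convention, and the database names and coordinates shown are invented for the example.

```python
def split_known_novel(detected, databases):
    """Separate detected circRNAs into those present in at least one
    database (mapped to the supporting database names) and those in none."""
    known = {}
    novel = set()
    for circ in detected:
        sources = sorted(name for name, entries in databases.items()
                         if circ in entries)
        if sources:
            known[circ] = sources
        else:
            novel.add(circ)
    return known, novel

# Hypothetical circRNAs keyed by (chrom, start, end, strand) on one assembly.
detected = {
    ("chr1", 100, 500, "+"),
    ("chr2", 2000, 2600, "-"),
    ("chrX", 50, 900, "+"),
}
databases = {
    "db_a": {("chr1", 100, 500, "+")},
    "db_b": {("chr1", 100, 500, "+"), ("chr2", 2000, 2600, "-")},
}

known, novel = split_known_novel(detected, databases)
```

An exact-match key is strict: resources that differ by one base in their start coordinate (0-based versus 1-based) would need to be harmonized before such a comparison.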
Gene ontology enrichment analysis
Gene set enrichment analysis against Gene Ontology pathways was performed using clusterProfiler (v4.0)82 and Enrichr83. Significant GO terms were selected using a threshold of p < 0.05 from the modified Fisher’s exact test.
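For readers unfamiliar with over-representation testing, the following Python sketch computes the one-sided Fisher's exact (hypergeometric) p-value for a single GO term. It shows the standard test only and does not reproduce the modified variant or the multiple-testing handling of the tools named above; the function name and the example numbers are hypothetical.

```python
from math import comb

def enrichment_pvalue(overlap, query_size, term_size, background_size):
    """One-sided p-value P(X >= overlap), where X is the number of query
    genes annotated to a GO term when `query_size` genes are drawn without
    replacement from a background containing `term_size` annotated genes."""
    upper = min(query_size, term_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 40 of 200 query genes fall in a term annotating
# 500 of 20,000 background genes.
p = enrichment_pvalue(overlap=40, query_size=200, term_size=500,
                      background_size=20000)
significant = p < 0.05
```

Because many GO terms are tested at once, a correction for multiple testing is normally applied to such p-values before interpretation.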
Maternal and zygotic circRNA clustering
Genes and circRNAs were clustered according to their expression during embryo development using the fuzzy c-means algorithm implemented in Mfuzz (v2.50.0)42.
Cell specificity calculation
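The cell specificity of genes and circRNAs was calculated using the following equation:
$$\tau = \frac{\sum_{i=1}^{N} \left(1 - \hat{x}_i\right)}{N - 1}, \qquad \hat{x}_i = \frac{x_i}{\max_{1 \le j \le N} x_j} \qquad (1)$$
where $x_i$ is the average expression value of a gene or circRNA in cell type (or tissue) $i$, and $N$ is the number of tissues or cell types.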
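As an illustration of how a specificity score of this kind can be computed, the sketch below implements the τ index of Yanai et al.37 in Python. It assumes that the per-cell-type averages are normalized by their maximum, an assumption taken from that reference; the function name `tau_specificity` and the example values are hypothetical.

```python
def tau_specificity(mean_expr):
    """Tau specificity index for one gene or circRNA.

    mean_expr: average expression in each of N cell types (N >= 2).
    Returns 0 for uniform expression and 1 for expression restricted
    to a single cell type.
    """
    n = len(mean_expr)
    if n < 2:
        raise ValueError("at least two cell types are required")
    peak = max(mean_expr)
    if peak <= 0:
        raise ValueError("the feature is not expressed in any cell type")
    # Each term is 1 minus the expression relative to the maximum.
    return sum(1 - x / peak for x in mean_expr) / (n - 1)

# Hypothetical average expression across four cell types.
restricted = tau_specificity([10.0, 0.0, 0.0, 0.0])   # one cell type only
uniform = tau_specificity([5.0, 5.0, 5.0, 5.0])       # same level everywhere
intermediate = tau_specificity([8.0, 4.0, 0.0])
```

Scores close to 1 indicate expression confined to one cell type, which is the property used in the text to compare circRNAs with linear mRNAs.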
Cell composition inference
All carcinoma cells were integrated and clustered as described above, and the epithelial cells, fibroblasts, and endothelial cells were then removed from the clustering results. The remaining cells were clustered again, and the cell clusters were annotated as different immune cell types (macrophages, monocytes, T cells, mast cells, dendritic cells, B cells, NK cells, neutrophils, eosinophils, and plasma cells). The circRNA signature was then filtered using the following criteria: (1) circRNAs expressed in at least 2 cell types; (2) circRNAs exhibiting significantly higher expression in one cell type than in the others. The LM22 gene signature matrix was downloaded from the CIBERSORT webserver68, and the cell composition deconvolution results were aggregated to the cell types described above. The RMSE and correlation reported by CIBERSORT were used for comparison.
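The two comparison metrics can be written out explicitly. The Python sketch below gives the textbook definitions of the root-mean-square error and the Pearson correlation; the vectors being compared and the example values are hypothetical, and CIBERSORT computes its own versions of these statistics internally.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: fitted versus observed signal for one bulk sample.
fitted = [0.6, 0.4, 0.9, 0.1]
observed = [0.5, 0.5, 1.0, 0.0]
error = rmse(fitted, observed)
agreement = pearson(fitted, observed)
```

A lower RMSE together with a higher correlation indicates that a signature matrix explains the bulk expression profile better, which is the sense in which the circRNA and gene signatures are compared.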
Statistics & reproducibility
No statistical method was used to predetermine sample size. No data were excluded from the analyses, and the analyses were not randomized.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This work was supported by grants to F.Z. from the National Natural Science Foundation of China [32130020, 32025009, 91940306] and National Key R&D Project [2021YFA1300500] and to J.Z. from the National Key R&D Project [2021YFA1302000].
Author contributions
F.Z. conceived the project. W.W. and J.Z. analyzed the data. X.C. and Z.C. performed the experiments. W.W. designed the database. J.Z., W.W., and F.Z. wrote the manuscript. The authors read and approved the final manuscript.
Peer review
Peer review information
Nature Communications thanks Leng Han, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The cellular expression results of circRNAs reported in this study have been deposited in the Genome Sequence Archive84, China National Center for Bioinformation, under accession number PRJCA009653. The RNA-seq datasets used for circRNA identification are listed in Supplementary Data 1. Source data have been deposited in Zenodo (10.5281/zenodo.6528434). All other relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding author upon reasonable request. Source data are provided with this paper.
Code availability
The analysis pipeline is available in the circSC module of circAtlas (http://circatlas.biols.ac.cn) and in the GitHub repository (https://github.com/bioinfo-biols/Code_for_circSC)85.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Wanying Wu, Jinyang Zhang.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-30963-8.
References
- 1. Liu CX, et al. Structure and degradation of circular RNAs regulate PKR activation in innate immunity. Cell. 2019;177:865–880 e821. doi: 10.1016/j.cell.2019.03.046.
- 2. Liu B, et al. An inducible circular RNA circKcnt2 inhibits ILC3 activation to facilitate colitis resolution. Nat. Commun. 2020;11:4076. doi: 10.1038/s41467-020-17944-5.
- 3. Zhao Q, et al. Targeting mitochondria-located circRNA SCAR alleviates NASH via reducing mROS output. Cell. 2020;183:76–93 e22. doi: 10.1016/j.cell.2020.08.009.
- 4. Gao X, et al. Circular RNA-encoded oncogenic E-cadherin variant promotes glioblastoma tumorigenicity through activation of EGFR-STAT3 signalling. Nat. Cell Biol. 2021;23:278–291. doi: 10.1038/s41556-021-00639-4.
- 5. Glazar P, Papavasileiou P, Rajewsky N. circBase: A database for circular RNAs. RNA. 2014;20:1666–1670. doi: 10.1261/rna.043687.113.
- 6. Dong R, Ma XK, Li GW, Yang L. CIRCpedia v2: An updated database for comprehensive circular RNA annotation and expression comparison. Genomics Proteom. Bioinforma. 2018;16:226–233. doi: 10.1016/j.gpb.2018.08.001.
- 7. Wu W, Ji P, Zhao F. CircAtlas: An integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol. 2020;21:101. doi: 10.1186/s13059-020-02018-y.
- 8. Xia S, et al. Comprehensive characterization of tissue-specific circular RNAs in the human and mouse genomes. Brief. Bioinform. 2017;18:984–992. doi: 10.1093/bib/bbw081.
- 9. Ji P, et al. Expanded expression landscape and prioritization of circular RNAs in mammals. Cell Rep. 2019;26:3444–3460 e3445. doi: 10.1016/j.celrep.2019.02.078.
- 10. Fan X, et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 2015;16:148. doi: 10.1186/s13059-015-0706-1.
- 11. Zhong C, Yu S, Han M, Chen J, Ning K. Heterogeneous circRNA expression profiles and regulatory functions among HEK293T single cells. Sci. Rep. 2017;7:14393. doi: 10.1038/s41598-017-14807-w.
- 12. Kristensen LS, et al. Spatial expression analyses of the putative oncogene ciRS-7 in cancer reshape the microRNA sponge theory. Nat. Commun. 2020;11:4551. doi: 10.1038/s41467-020-18355-2.
- 13. Wang PL, et al. Circular RNA is expressed across the eukaryotic tree of life. PLoS One. 2014;9:e90859. doi: 10.1371/journal.pone.0090859.
- 14. Szabo L, Salzman J. Detecting circular RNAs: Bioinformatic and experimental challenges. Nat. Rev. Genet. 2016;17:679–692. doi: 10.1038/nrg.2016.114.
- 15. Ruan H, et al. Comprehensive characterization of circular RNAs in ~ 1000 human cancer cell lines. Genome Med. 2019;11:55. doi: 10.1186/s13073-019-0663-5.
- 16. Sheng K, Cao W, Niu Y, Deng Q, Zong C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods. 2017;14:267–270. doi: 10.1038/nmeth.4145.
- 17. Sasagawa Y, et al. Quartz-Seq: A highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 2013;14:R31. doi: 10.1186/gb-2013-14-4-r31.
- 18. Hayashi T, et al. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat. Commun. 2018;9:619. doi: 10.1038/s41467-018-02866-0.
- 19. Verboom K, et al. SMARTer single cell total RNA sequencing. Nucleic Acids Res. 2019;47:e93. doi: 10.1093/nar/gkz535.
- 20. Ramskold D, et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 2012;30:777–782. doi: 10.1038/nbt.2282.
- 21. Picelli S, et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods. 2013;10:1096–1098. doi: 10.1038/nmeth.2639.
- 22. Tang F, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods. 2009;6:377–382. doi: 10.1038/nmeth.1315.
- 23. Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
- 24. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
- 25. McCarthy DJ, Campbell KR, Lun AT, Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics. 2017;33:1179–1186. doi: 10.1093/bioinformatics/btw777.
- 26. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096.
- 27. Gao Y, Zhang J, Zhao F. Circular RNA identification based on multiple seed matching. Brief. Bioinform. 2018;19:803–810. doi: 10.1093/bib/bbx014.
- 28. Gao Y, Zhao F. Computational strategies for exploring circular RNAs. Trends Genet. 2018;34:389–400. doi: 10.1016/j.tig.2017.12.016.
- 29. Zhang J, Chen S, Yang J, Zhao F. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. Nat. Commun. 2020;11:90. doi: 10.1038/s41467-019-13840-9.
- 30. Zheng Y, Zhao F. Visualization of circular RNAs and their internal splicing events from transcriptomic data. Bioinformatics. 2020;36:2934–2935. doi: 10.1093/bioinformatics/btaa033.
- 31. Zhang J, Zhao F. Reconstruction of circular RNAs using Illumina and Nanopore RNA-seq datasets. Methods. 2021;196:17–22. doi: 10.1016/j.ymeth.2021.03.017.
- 32. Xu H, Guo S, Li W, Yu P. The circular RNA Cdr1as, via miR-7 and its targets, regulates insulin transcription and secretion in islet cells. Sci. Rep. 2015;5:12453. doi: 10.1038/srep12453.
- 33. Chen XJ, et al. The Circular RNome of Developmental Retina in Mice. Mol. Ther. Nucleic Acids. 2020;19:339–349. doi: 10.1016/j.omtn.2019.11.016.
- 34. Li L, et al. Comprehensive analysis of circRNA expression profiles in humans by RAISE. Int J. Oncol. 2017;51:1625–1638. doi: 10.3892/ijo.2017.4162.
- 35. Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO. Cell-type specific features of circular RNA expression. PLoS Genet. 2013;9:e1003777. doi: 10.1371/journal.pgen.1003777.
- 36. Rybak-Wolf A, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. Mol. Cell. 2015;58:870–885. doi: 10.1016/j.molcel.2015.03.027.
- 37. Yanai I, et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21:650–659. doi: 10.1093/bioinformatics/bti042.
- 38. Conn SJ, et al. The RNA binding protein quaking regulates formation of circRNAs. Cell. 2015;160:1125–1134. doi: 10.1016/j.cell.2015.02.014.
- 39. Gao Y, et al. Comprehensive identification of internal structure and alternative splicing events in circular RNAs. Nat. Commun. 2016;7:12060. doi: 10.1038/ncomms12060.
- 40. He P, et al. The changing mouse embryo transcriptome at whole tissue and single-cell resolution. Nature. 2020;583:760–767. doi: 10.1038/s41586-020-2536-x.
- 41. Dang Y, et al. Tracing the expression of circular RNAs in human pre-implantation embryos. Genome Biol. 2016;17:130. doi: 10.1186/s13059-016-0991-3.
- 42. Kumar L, E Futschik M. Mfuzz: A software package for soft clustering of microarray data. Bioinformation. 2007;2:5–7. doi: 10.6026/97320630002005.
- 43. Lee J, Jung MK, Park HJ, Kim KE, Cho D. Erdr1 suppresses murine melanoma growth via regulation of apoptosis. Int J. Mol. Sci. 2016;17:107. doi: 10.3390/ijms17010107.
- 44. Soto R, et al. Microbiota promotes systemic T-cell survival through suppression of an apoptotic factor. Proc. Natl Acad. Sci. USA. 2017;114:5497–5502. doi: 10.1073/pnas.1619336114.
- 45. Smid M, et al. The circular RNome of primary breast cancer. Genome Res. 2019;29:356–366. doi: 10.1101/gr.238121.118.
- 46. Cao L, et al. Circular RNA circRNF20 promotes breast cancer tumorigenesis and Warburg effect through miR-487a/HIF-1alpha/HK2. Cell Death Dis. 2020;11:145. doi: 10.1038/s41419-020-2336-0.
- 47. Chen S, et al. circVAMP3 drives CAPRIN1 phase separation and inhibits hepatocellular carcinoma by suppressing c-Myc translation. Adv. Sci. (Weinh.) 2022;9:e2103817. doi: 10.1002/advs.202103817.
- 48. Chen S, Zhang J, Zhao F. Screening linear and circular RNA transcripts from stress granules. Genomics Proteom. Bioinforma. 2022. doi: 10.1016/j.gpb.2022.01.003.
- 49. Chung W, et al. Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 2017;8:15081. doi: 10.1038/ncomms15081.
- 50. Karaayvaz M, et al. Unravelling subclonal heterogeneity and aggressive disease states in TNBC through single-cell RNA-seq. Nat. Commun. 2018;9:3588. doi: 10.1038/s41467-018-06052-0.
- 51. Davis RT, et al. Transcriptional diversity and bioenergetic shift in human breast cancer metastasis revealed by single-cell RNA sequencing. Nat. Cell Biol. 2020;22:310–320. doi: 10.1038/s41556-020-0477-0.
- 52. Gao R, et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat. Biotechnol. 2021;39:599–608. doi: 10.1038/s41587-020-00795-2.
- 53. Zheng Q, et al. Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs. Nat. Commun. 2016;7:11215. doi: 10.1038/ncomms11215.
- 54. Okholm TLH, et al. Circular RNA expression is abundant and correlated to aggressiveness in early-stage bladder cancer. NPJ Genom. Med. 2017;2:36. doi: 10.1038/s41525-017-0038-z.
- 55. Zhao J, et al. Circlular RNA BARD1 (Hsa_circ_0001098) overexpression in breast cancer cells with TCDD treatment could promote cell apoptosis via miR-3942/BARD1 axis. Cell Cycle. 2018;17:2731–2744. doi: 10.1080/15384101.2018.1556058.
- 56. Cheng X, et al. Comprehensive circular RNA profiling identifies CircFAM120A as a new biomarker of hypoxic lung adenocarcinoma. Ann. Transl. Med. 2019;7:442. doi: 10.21037/atm.2019.08.79.
- 57. Chae YK, et al. Epithelial-mesenchymal transition (EMT) signature is inversely associated with T-cell infiltration in non-small cell lung cancer (NSCLC). Sci. Rep. 2018;8:2918. doi: 10.1038/s41598-018-21061-1.
- 58. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18:220. doi: 10.1186/s13059-017-1349-1.
- 59. Newman AM, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.
- 60. Finotello F, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 2019;11:34. doi: 10.1186/s13073-019-0638-6.
- 61.Li T, et al. TIMER2.0 for analysis of tumor-infiltrating immune cells. Nucleic Acids Res. 2020;48:W509–W514. doi: 10.1093/nar/gkaa407. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Vo JN, et al. The landscape of circular RNA in cancer. Cell. 2019;176:869–881 e813. doi: 10.1016/j.cell.2018.12.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Zhang X, et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019;47:D721–D728. doi: 10.1093/nar/gky900. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Franzen, O., Gan, L. M. & Bjorkegren, J. L. M. PanglaoDB: A web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford)2019, baz046 (2019). [DOI] [PMC free article] [PubMed]
- 65.Aran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 2019;20:163–172. doi: 10.1038/s41590-018-0276-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Xu C, Zhang J. Mammalian circular RNAs result largely from splicing errors. Cell Rep. 2021;36:109439. doi: 10.1016/j.celrep.2021.109439. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Karlsson, M. et al. A single-cell type transcriptomics map of human tissues. Sci. Adv.7, eabh2169 (2021). [DOI] [PMC free article] [PubMed]
- 68.Newman AM, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Weng W, et al. Circular RNA ciRS-7-A promising prognostic biomarker and a potential therapeutic target in colorectal cancer. Clin. Cancer Res. 2017;23:3918–3928. doi: 10.1158/1078-0432.CCR-16-2541. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Zheng Y, Ji P, Chen S, Hou L, Zhao F. Reconstruction of full-length circular RNAs enables isoform-level quantification. Genome Med. 2019;11:2. doi: 10.1186/s13073-019-0614-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Gao Y, Wang J, Zhao F. CIRI: An efficient and unbiased algorithm for de novo circular RNA identification. Genome Biol. 2015;16:4. doi: 10.1186/s13059-014-0571-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Xin R, et al. isoCirc catalogs full-length circular RNA isoforms in human transcriptomes. Nat. Commun. 2021;12:266. doi: 10.1038/s41467-020-20459-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Zhang J, et al. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. Nat. Biotechnol. 2021;39:836–845. doi: 10.1038/s41587-021-00842-6. [DOI] [PubMed] [Google Scholar]
- 74.Zhang J, Zhao F. Characterizing circular RNAs using nanopore sequencing. Trends Biochem Sci. 2021;46:785–786. doi: 10.1016/j.tibs.2021.06.002. [DOI] [PubMed] [Google Scholar]
- 75.Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013).
- 76.Vromman, M. et al. Validation of Circular RNAs Using RT‐qPCR After Effective Removal of Linear RNAs by Ribonuclease R. Curr. Protocols1, e181 (2021). [DOI] [PubMed]
- 77. Qiu X, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods. 2017;14:979–982. doi: 10.1038/nmeth.4402.
- 78. Liu M, Wang Q, Shen J, Yang BB, Ding X. Circbank: a comprehensive database for circRNA with standard nomenclature. RNA Biol. 2019;16:899–905. doi: 10.1080/15476286.2019.1600395.
- 79. Chen X, et al. circRNADb: A comprehensive database for human circular RNAs with protein-coding annotations. Sci. Rep. 2016;6:34985. doi: 10.1038/srep34985.
- 80. Xie F, et al. deepBase v3.0: expression atlas and interactive analysis of ncRNAs from thousands of deep-sequencing data. Nucleic Acids Res. 2021;49:D877–D883. doi: 10.1093/nar/gkaa1039.
- 81. Feng J, et al. CSCD2: an integrated interactional database of cancer-specific circular RNAs. Nucleic Acids Res. 2022;50:D1179–D1183. doi: 10.1093/nar/gkab830.
- 82. Wu T, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation. 2021;2:100141. doi: 10.1016/j.xinn.2021.100141.
- 83. Kuleshov MV, et al. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–W97. doi: 10.1093/nar/gkw377.
- 84. CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 2022;50:D27–D38.
- 85. Zhang J. Exploring the cellular landscape of circular RNAs using full-length single-cell RNA sequencing. https://github.com/bioinfo-biols/Code_for_circSC. 2022. doi: 10.5281/zenodo.6558694.
- 86. Maag JLV. gganatogram: An R package for modular visualisation of anatograms and tissues based on ggplot2. F1000Research. 2018;7:1576. doi: 10.12688/f1000research.16409.1.
Data Availability Statement
The cellular expression results of circRNAs reported in this study have been deposited in the Genome Sequence Archive84, China National Center for Bioinformation, under accession number PRJCA009653. The RNA-seq datasets used for circRNA identification are listed in Supplementary Data 1. Source data have been deposited in Zenodo (doi: 10.5281/zenodo.6528434) and are provided with this paper. All other relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding author upon reasonable request.
The analysis pipeline is available as the "circSC" module in circAtlas (http://circatlas.biols.ac.cn) and in the GitHub repository (https://github.com/bioinfo-biols/Code_for_circSC)85.