Abstract
Summary
MicroRNAs (miRNAs) are master regulators of gene expression in cancers. Their sequence variants or isoforms (isomiRs) are highly abundant and possess unique functions. Given their short sequence length and high heterogeneity, mapping isomiRs can be challenging; without adequate depth and data aggregation, low frequency events are often disregarded. To address these challenges, we present the Tumor IsomiR Encyclopedia (TIE): a dynamic database of isomiRs from over 10 000 adult and pediatric tumor samples in The Cancer Genome Atlas (TCGA) and The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects. A key novelty of TIE is its ability to annotate heterogeneous isomiR sequences and aggregate the variants obtained across all datasets. Results can be browsed online or downloaded as spreadsheets. Here, we show analysis of isomiRs of miR-21 and miR-30a to demonstrate the utility of TIE.
Availability and implementation
TIE search engine and data are freely available to use at https://isomir.ccr.cancer.gov/.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
MicroRNAs (miRNAs) are small non-coding RNAs (∼22 nt) that modulate the expression of the vast majority of human transcripts (Gebert and MacRae, 2019). Each mature miRNA can present as a variety of isoforms (isomiRs) that differ in length and sequence composition (Morin et al., 2008). It is well documented that miRNAs play a critical role in cancer development and cancer disparities (Esquela-Kerscher and Slack, 2006; Telonis and Rigoutsos, 2018). Alterations in isomiR profiles correlate with cancer progression, suggesting a unique role for isomiRs in tumorigenesis (McCall et al., 2017; Telonis et al., 2015, 2017). Multiple reports suggest that isomiRs can be used as biomarkers for tumor detection, classification and prognosis (Parafioriti et al., 2020; Wang et al., 2019). These observations highlight the importance of having tools to analyze isomiR expression profiles.
However, it is difficult to investigate isomiRs with existing databases. The main challenges are the disparity in their sequences and in their abundance. Although isomiRs taken together can be the dominant species for certain miRNAs, the abundance of a large portion of the individual isomiRs is close to the level of sequencing noise. As a result, many low-frequency isomiRs are discarded by current mapping algorithms. Here, we aim to provide the field with a database that contains a comprehensive analysis of isomiRs from the largest existing cohort of human tumor samples. To this end, the TIE database analyzes ∼97 billion reads and ∼16 billion small RNA sequences, and reports isomiR profiles (Fig. 1A,B). The depth of the TIE database and its unique mapping algorithm allow the detection of highly specific modifications.
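The benefit of aggregation for low-frequency isomiRs can be illustrated with a toy sketch. This is not the TIE pipeline: the read-count threshold, the function names (`pool_counts`, `passing`) and the counts themselves are hypothetical and chosen only to show how a variant that looks like noise in every individual sample can become detectable once counts are pooled.

```python
from collections import Counter

def pool_counts(samples):
    """Sum per-sample read counts (sequence -> reads) into one pooled Counter."""
    pooled = Counter()
    for sample in samples:
        pooled.update(sample)  # Counter.update adds counts rather than replacing them
    return pooled

def passing(counts, min_reads):
    """Return the set of sequences supported by at least min_reads reads."""
    return {seq for seq, n in counts.items() if n >= min_reads}

# Hypothetical toy data: five samples, each with a rare 3' adenylated variant.
canonical = "UAGCUUAUCAGACUGAUGUUGA"   # mature miR-21-5p
variant = canonical + "A"
samples = [{canonical: 100, variant: 2} for _ in range(5)]

MIN_READS = 5  # assumed noise threshold, for illustration only
per_sample_hits = [variant in passing(s, MIN_READS) for s in samples]
pooled_hit = variant in passing(pool_counts(samples), MIN_READS)
print(per_sample_hits, pooled_hit)
```

In this example the variant never reaches the threshold within a single sample, yet it does so after pooling across the five samples.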
2 Results
2.1 Database usage
The TIE database dynamically aligns (Bofill-De Ros et al., 2019) the small-RNA-seq data collected from 11 667 patient samples across 33 types of adult and 3 types of pediatric tumors (data accessible via dbGaP Study Accession: phs000178.v11.p8 and phs000218.v22.p8). A summary is provided in Supplementary Table S1. Users can query the isomiR profile of any miRNA documented in miRBase. The output includes isomiR expression and sequence composition, among other parameters. TIE also reports the effective number of seeds, which measures the diversity of seed sequences generated by 5’ isomiRs using an inverse Simpson index as previously described (Bofill-De Ros et al., 2019). For each miRNA, isomiR profiles of cancer samples are compared to those from corresponding normal tissues and shown as pie charts. Detailed isomiR analyses of individual patients and cancer types are reported as column-wise tables that can be downloaded as spreadsheets.
A screenshot of the website is shown in Supplementary Figure S1C. IsomiR profiles of many highly expressed miRNAs are precompiled and ready to browse (Browse isomiR tab). Alternatively, users can run an isomiR analysis for their miRNA of interest across all cancer types (Run query tab). A primary aim of the TIE database is to aggregate all miRNA isoforms detected in tumors. The Align files tab is an entry point for browsing the sequence variability of any given miRNA across this large pan-cancer dataset. Results can be filtered to visualize only isomiRs with templated or non-templated nucleotides. Additional functionalities of TIE are listed in Supplementary Table S2.
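As a rough illustration of the effective number of seeds, the inverse Simpson index can be written as 1 / Σ p_i², where p_i is the fraction of reads carrying seed i. The sketch below is a minimal reading of that definition, not the TIE implementation: it assumes the seed spans nucleotides 2-8 of each isomiR, and the function name and read counts are hypothetical.

```python
from collections import Counter

def effective_number_of_seeds(isomir_counts, seed_start=1, seed_end=8):
    """Inverse Simpson index over seed sequences.

    isomir_counts maps isomiR sequence -> read count. The seed is taken as
    the slice [seed_start:seed_end], i.e. nucleotides 2-8 by default
    (an assumption; other seed definitions exist).
    """
    seed_reads = Counter()
    for seq, count in isomir_counts.items():
        seed_reads[seq[seed_start:seed_end]] += count
    total = sum(seed_reads.values())
    if total == 0:
        raise ValueError("no reads to analyze")
    return 1.0 / sum((n / total) ** 2 for n in seed_reads.values())

# Hypothetical toy counts for miR-21-5p isomiRs.
canonical = "UAGCUUAUCAGACUGAUGUUGA"
toy_counts = {
    canonical: 80,         # canonical form
    canonical + "A": 10,   # 3' isomiR: same seed as the canonical form
    canonical[1:]: 10,     # 5' isomiR: seed shifted by one nucleotide
}
print(round(effective_number_of_seeds(toy_counts), 2))  # 1.22
```

A value of 1 means that all reads share a single seed; the value grows as reads are spread more evenly across distinct seeds. In the toy data, the 3' isomiR does not change the result because it shares the canonical seed.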
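The distinction between templated and non-templated nucleotides can be sketched as follows: nucleotides that extend past the canonical 3' end and match the precursor sequence are treated as templated, and the remainder as non-templated. This is an illustrative simplification, not the TIE algorithm. The function name is hypothetical, the sketch handles 3' extensions only, and the precursor used here is a shortened stand-in whose flanking nucleotides are assumed for the example.

```python
def classify_3p_tail(isomir, precursor, canonical):
    """Split a 3' extension into (templated, non_templated) parts.

    The tail is compared nucleotide by nucleotide with the precursor
    sequence downstream of the canonical 3' end; the matching prefix is
    called templated and everything after the first mismatch non-templated.
    """
    start = precursor.find(canonical)
    if start < 0:
        raise ValueError("canonical sequence not found in precursor")
    if not isomir.startswith(canonical):
        raise ValueError("this sketch only handles 3' extensions")
    tail = isomir[len(canonical):]
    flank = precursor[start + len(canonical):]
    n_templated = 0
    for tail_nt, flank_nt in zip(tail, flank):
        if tail_nt != flank_nt:
            break
        n_templated += 1
    return tail[:n_templated], tail[n_templated:]

canonical = "UAGCUUAUCAGACUGAUGUUGA"           # mature miR-21-5p
precursor = "GG" + canonical + "CUGUUG"        # illustrative flanks (assumed)

print(classify_3p_tail(canonical + "C", precursor, canonical))   # ('C', '')
print(classify_3p_tail(canonical + "A", precursor, canonical))   # ('', 'A')
print(classify_3p_tail(canonical + "CA", precursor, canonical))  # ('C', 'A')
```

One caveat of any sequence-based classification of this kind is that a non-templated addition that happens to match the precursor cannot be distinguished from a templated one.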
2.2 Case study
One of the most well-studied oncogenic miRNAs is miR-21-5p (Medina et al., 2010). Analysis of its expression across different tumors and normal samples collected in the TIE database confirms previous reports of its general overexpression in cancer (Supplementary Fig. S1A). miR-21 is upregulated ∼7-fold in kidney papillary cell carcinoma (KIRP), while a modest increase (∼2-fold) was observed in other cancers. Despite the well-established upregulation of miR-21 in tumors, it remains unclear to what extent the isomiR profile changes during tumorigenesis; studying this will lead not only to novel biomarkers, but also to the identification of potential mechanisms regulating miR-21 biogenesis and decay. To this end, we analyzed miR-21-5p isomiRs with TIE (Supplementary Fig. S1B). Although the average percentage of isomiRs ranges from 24% to 38% across cancers, miR-21 isomiRs are collectively more abundant than the canonical miR-21 in many patient samples. Interestingly, the level of isomiRs is significantly higher in some cancer types, such as LUSC, than in their corresponding normal samples (LUSC-N). Further analysis revealed that the mono-adenylated isoform was enriched in tumor samples (Supplementary Fig. S1B), supporting the hypothesis that it may play a functional role during tumorigenesis. Another case study investigating miR-30 isomiRs can be found in the Supplementary Methods.
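The isomiR percentage discussed above can be computed per sample as the share of reads for a miRNA that do not correspond to the canonical sequence. The sketch below shows that arithmetic on hypothetical read counts; the function name and numbers are illustrative and are not taken from TIE output.

```python
from statistics import mean

def isomir_percentage(counts, canonical):
    """Percentage of a miRNA's reads that are not the canonical sequence."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads for this miRNA")
    return 100.0 * (total - counts.get(canonical, 0)) / total

canonical = "UAGCUUAUCAGACUGAUGUUGA"  # mature miR-21-5p

# Hypothetical per-sample read counts (sequence -> reads).
samples = {
    "sample_1": {canonical: 70, canonical + "A": 20, canonical + "C": 10},
    "sample_2": {canonical: 40, canonical + "A": 45, canonical[:-1]: 15},
}

percentages = {name: isomir_percentage(c, canonical) for name, c in samples.items()}
isomir_dominant = [name for name, pct in percentages.items() if pct > 50.0]

print(percentages)                  # {'sample_1': 30.0, 'sample_2': 60.0}
print(mean(percentages.values()))   # 45.0
print(isomir_dominant)              # ['sample_2']
```

The toy data mirror the situation described in the text: the average percentage across samples can stay below 50% while isomiRs still outnumber the canonical form in individual samples.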
3 Discussion
IsomiRs are highly prevalent as a result of sequence modifications. Recent studies have shown that isomiRs have distinct targeting specificities and/or turnover rates (Rüegger and Großhans, 2012; Yang et al., 2019, 2020). In fact, it was shown that the oncomiR miR-21 is regulated via adenylation by TENT4B (also known as PAPD5) and subsequent degradation by PARN (Boele et al., 2014), a mechanism implicated in the regulation of p53 (Shukla et al., 2019). These examples support the idea that isomiRs are involved in novel oncogenic roles or the loss of function of well-established tumor suppressor miRNAs.
Similar to other databases (Fromm et al., 2020; Kozomara et al., 2019; Zhang et al., 2016), TIE accurately reports the expression of miRNA isoforms. However, TIE excels at providing in-depth analyses of isomiRs and easy access to the most comprehensive set of adult and childhood cancers. This enables inquiry without downloading or storing sensitive data locally. The big data analysis of isomiRs in pan-cancer datasets could pave the way for a better understanding of their functions in tumorigenesis.
One limitation of the current version of TIE is that the settings regarding the motif and the number of mismatches are predetermined. As a result, the isomiR analysis is restricted to the well-annotated miRNAs in miRBase. However, recent studies have identified hundreds of novel cell-type-specific miRNAs (Londin et al., 2015). Future versions of TIE will allow queries of unannotated or viral miRNAs. It will also be interesting to explore the impact of isomiRs on cancer by combining TIE with other datasets such as the exome or the transcriptomic profiles available for TCGA and TARGET.
Funding
This work was funded by the intramural research program of the National Cancer Institute, National Institutes of Health [ZIA-BC-011566].
Conflict of Interest: none declared.
Contributor Information
Xavier Bofill-De Ros, RNA Mediated Gene Regulation Section, RNA Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA.
Brian Luke, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
Robert Guthridge, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA; Leidos Biomedical Research Inc., Frederick, MD 21702, USA.
Uma Mudunuri, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA; Leidos Biomedical Research Inc., Frederick, MD 21702, USA.
Michael Loss, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA; Leidos Biomedical Research Inc., Frederick, MD 21702, USA.
Shuo Gu, RNA Mediated Gene Regulation Section, RNA Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA.
References
- Boele J. et al. (2014) PAPD5-mediated 3’ adenylation and subsequent degradation of miR-21 is disrupted in proliferative disease. Proc. Natl. Acad. Sci. USA, 111, 11467–11472.
- Bofill-De Ros X. et al. (2019) QuagmiR: a cloud-based application for isomiR big data analytics. Bioinformatics, 35, 1576–1578.
- Esquela-Kerscher A., Slack F.J. (2006) Oncomirs – microRNAs with a role in cancer. Nat. Rev. Cancer, 6, 259–269.
- Fromm B. et al. (2020) MirGeneDB 2.0: the metazoan microRNA complement. Nucleic Acids Res., 48, D132–D141.
- Gebert L.F.R., MacRae I.J. (2019) Regulation of microRNA function in animals. Nat. Rev. Mol. Cell Biol., 20, 21–37.
- Kozomara A. et al. (2019) miRBase: from microRNA sequences to function. Nucleic Acids Res., 47, D155–D162.
- Londin E. et al. (2015) Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs. Proc. Natl. Acad. Sci. USA, 112, E1106–E1115.
- McCall M.N. et al. (2017) Toward the human cellular microRNAome. Genome Res., 27, 1769–1781.
- Medina P.P. et al. (2010) OncomiR addiction in an in vivo model of microRNA-21-induced pre-B-cell lymphoma. Nature, 467, 86–90.
- Morin R.D. et al. (2008) Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res., 18, 610–621.
- Parafioriti A. et al. (2020) Expression profiling of microRNAs and isomiRs in conventional central chondrosarcoma. Cell Death Discov., 6, 46.
- Rüegger S., Großhans H. (2012) MicroRNA turnover: when, how, and why. Trends Biochem. Sci., 37, 436–446.
- Shukla S. et al. (2019) The RNase PARN controls the levels of specific miRNAs that contribute to p53 regulation. Mol. Cell, 73, 1204–1216.e4.
- Telonis A.G. et al. (2015) Beyond the one-locus-one-miRNA paradigm: microRNA isoforms enable deeper insights into breast cancer heterogeneity. Nucleic Acids Res., 43, 9158–9175.
- Telonis A.G. et al. (2017) Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types. Nucleic Acids Res., 45, 2973–2985.
- Telonis A.G., Rigoutsos I. (2018) Race disparities in the contribution of miRNA isoforms and tRNA-derived fragments to triple-negative breast cancer. Cancer Res., 78, 1140–1154.
- Wang S. et al. (2019) Tumor classification and biomarker discovery based on the 5’isomiR expression level. BMC Cancer, 19, 127.
- Yang A. et al. (2019) 3’ Uridylation confers miRNAs with non-canonical target repertoires. Mol. Cell, 75, 511–522.e4.
- Yang A. et al. (2020) AGO-bound mature miRNAs are oligouridylated by TUTs and subsequently degraded by DIS3L2. Nat. Commun., 11, 2765.
- Zhang Y. et al. (2016) IsomiR Bank: a research resource for tracking IsomiRs. Bioinformatics, 32, 2069–2071.