Abstract
High-throughput RNA-seq has revolutionized the process of small RNA (sRNA) discovery, leading to a rapid expansion of sRNA categories. In addition to previously well-characterized sRNAs such as microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and small nucleolar RNAs (snoRNAs), recent studies have spotlighted tRNA-derived sRNAs (tsRNAs) and rRNA-derived sRNAs (rsRNAs) as new categories of sRNAs with versatile functions. Because existing software and pipelines for sRNA annotation mostly focus on miRNAs or piRNAs, we developed the sRNA annotation pipeline optimized for rRNA- and tRNA-derived sRNAs (SPORTS1.0). SPORTS1.0 is optimized for analyzing tsRNAs and rsRNAs from sRNA-seq data, in addition to its capacity to annotate canonical sRNAs such as miRNAs and piRNAs. Moreover, SPORTS1.0 can predict potential RNA modification sites based on nucleotide mismatches within sRNAs. SPORTS1.0 comes with precompiled annotation references for 68 species across the bacteria, yeast, plant, and animal kingdoms, and additional species can readily be added by end users. For demonstration, by analyzing sRNA datasets using SPORTS1.0, we reveal that distinct signatures are present in tsRNAs and rsRNAs from different mouse cell types. We also find that, compared to other sRNA types, tsRNAs bear the highest mismatch rate, which is consistent with their highly modified nature. SPORTS1.0 is open-source software and is publicly available at https://github.com/junchaoshi/sports1.0.
Keywords: Small RNA, RNA-seq data analysis, tsRNA, rsRNA, Annotation pipeline
Introduction
Expanding classes of small RNAs (sRNAs) have emerged as key regulators of gene expression, genome stability, and epigenetic regulation [1], [2]. In addition to previously well-characterized sRNA classes such as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), and small nucleolar RNAs (snoRNAs), recent analyses of sRNA-seq data have identified a growing number of novel sRNA families. These include tRNA-derived sRNAs (tsRNAs; also known as tRNA-derived fragments, tRFs) and rRNA-derived sRNAs (rsRNAs) [3]. tsRNAs and rsRNAs have been discovered in a wide range of species and are evolutionarily conserved, presumably in part because the sequences of their respective precursors, i.e., tRNAs and rRNAs, are highly conserved [3]. Interestingly, tsRNAs and rsRNAs are abundant in unicellular organisms (e.g., protozoa), where canonical sRNA pathways such as the miRNA, siRNA, and piRNA pathways are entirely lacking [4], [5], [6]. The dynamic regulation of tsRNAs and rsRNAs in these unicellular organisms suggests that they are among the most ancient classes of sRNAs used for intra- and inter-cellular communication [7].
Moreover, recent evidence from mammalian species has highlighted diverse biological functions mediated by tsRNAs, including roles in ribosome biogenesis, translation initiation, retrotransposon control, cancer metastasis, stem cell differentiation, neurological diseases, and epigenetic inheritance [3], [8], [9], [10], [11], [12], [13], [14], [15]. Although tsRNAs are known to regulate these processes at both the post-transcriptional and translational levels [11], [14], [16], the exact molecular mechanisms by which tsRNAs exert their functions are not fully understood. rsRNAs were discovered more recently than tsRNAs and also exhibit tissue-specific distribution. Dynamic expression of rsRNAs is associated with diseases such as metabolic disorders and inflammation [17], [18], [19]. The diverse biological functions of tsRNAs and rsRNAs and their strong disease associations are pushing a new frontier of sRNA research.
Currently, multiple general sRNA annotation tools and pipelines exist [20], [21], [22], [23], [24], and some have been developed specifically to analyze tsRNAs [25], [26], [27]. However, specialized tools that can simultaneously analyze both tsRNAs and rsRNAs, in addition to other canonical sRNAs, are still lacking. Here, we present the sRNA annotation pipeline optimized for rRNA- and tRNA-derived sRNAs (SPORTS1.0), which annotates and profiles canonical sRNAs such as miRNAs and piRNAs and is also optimized to analyze tsRNAs and rsRNAs from sRNA-seq data. In addition, SPORTS1.0 can help predict potential RNA modification sites based on nucleotide mismatches within sRNAs.
Method
The source code of SPORTS1.0 is written in Perl and R. The whole package and installation instructions are available on GitHub (https://github.com/junchaoshi/sports1.0). SPORTS1.0 can be applied to a wide range of species, and precompiled annotation references for 68 species are available for download (Table S1).
The workflow of SPORTS1.0 consists of four main steps: pre-processing, mapping, annotation output, and annotation summary (Figure 1). Input data can be provided in SRA, FASTQ, or FASTA format. By calling Cutadapt [28] and Perl scripts extracted from miRDeep2 [29], SPORTS1.0 generates clean reads by removing adapter sequences and discarding sequences whose lengths fall outside the defined range or that contain bases other than A, T, U, C, and G. The clean reads obtained in the pre-processing step are then sequentially mapped, according to user settings, against the reference genome, miRBase [30], an rRNA database (collected from NCBI), GtRNAdb [31], piRNA databases [32], [33], Ensembl [34], and Rfam [35]. Alignment is performed with Bowtie [36]. Next, a Perl script included in SPORTS1.0 identifies whether each tsRNA is derived from the 5′ terminus, the 3′ terminus, or the 3′CCA end of its tRNA. Finally, an R script included in SPORTS1.0 calculates rsRNA expression levels and maps each rsRNA to positions on its rRNA precursor (5.8S, 18S, 28S, etc.).
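The pre-processing filter and the tsRNA positional classification described above can be sketched in a few lines of Python. This is a hedged illustration, not SPORTS1.0's own Perl/R code: the function names, the 15 to 45 nt default length range, the use of an exact substring search in place of a Bowtie alignment, and the coordinate rules used to call 5′, 3′, and 3′CCA fragments (relative to a hypothetical mature tRNA reference that ends in CCA) are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Assumed filter: a clean read consists only of uppercase A, C, G, T, or U.
VALID_BASES = re.compile(r"[ACGTU]+")


def is_clean(read: str, min_len: int = 15, max_len: int = 45) -> bool:
    """Return True if an adapter-trimmed read passes the length and base filters.

    The 15-45 nt defaults are illustrative assumptions, not SPORTS1.0 settings.
    """
    return min_len <= len(read) <= max_len and VALID_BASES.fullmatch(read) is not None


def classify_tsrna(read: str, mature_trna: str) -> Optional[str]:
    """Classify a read by where it lies on a mature tRNA sequence ending in CCA.

    An exact substring search stands in for a real alignment (e.g. Bowtie),
    and only the first occurrence is considered. Returns None if not found.
    """
    start = mature_trna.find(read)
    if start == -1:
        return None
    end = start + len(read)          # half-open interval [start, end)
    full_len = len(mature_trna)
    body_end = full_len - 3          # 3' end of the tRNA body, before the CCA tail
    if start == 0 and end == full_len:
        return "full-length"
    if start == 0:
        return "5' tsRNA"
    if end == full_len:
        return "3'CCA tsRNA"
    if end == body_end:
        return "3' tsRNA"
    return "inner tsRNA"
```

For a hypothetical toy reference `AAAACCCCGGGGTTTTACGTCCA`, the read `AAAACCCC` is called a 5′ tsRNA, `TTTTACGTCCA` a 3′CCA tsRNA, and `TTTTACGT` a 3′ tsRNA. Reads and references must use the same alphabet (both T or both U) for the substring search to match.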
SPORTS1.0 can also analyze sequence mismatch information if mismatches are allowed during the alignment process. Such information can help predict potential modification sites that cause nucleotide misincorporation during reverse transcription (RT), as previously reported [37]. In the current version, a mismatch site is designated using criteria previously described [37]. A binomial distribution is used to assess whether the observed mismatch enrichment is significantly higher than expected from base-calling error. Here, we define $p_{\mathrm{err}}$ as the base-calling error rate, $n_{\mathrm{ref}}$ as the number of nucleotides that perfectly match the reference site, $n_{\mathrm{mut}}$ as the number of mismatched nucleotides, and $n_{\mathrm{tot}} = n_{\mathrm{ref}} + n_{\mathrm{mut}}$. Treating the number of perfectly matched nucleotides $X$ as binomially distributed with $n_{\mathrm{tot}}$ trials and success probability $1 - p_{\mathrm{err}}$, the probability of observing no more than $k$ perfectly matched nucleotides out of $n_{\mathrm{tot}}$ can be calculated as:

$$P(X \le k) = \sum_{i=0}^{k} \binom{n_{\mathrm{tot}}}{i} \left(1 - p_{\mathrm{err}}\right)^{i} p_{\mathrm{err}}^{\,n_{\mathrm{tot}} - i}$$

with $k = n_{\mathrm{ref}}$ for the observed site.
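The binomial test described above can be evaluated directly from the counts of matched and mismatched nucleotides and the base-calling error rate. The Python sketch below is a hedged illustration, not SPORTS1.0's implementation: it computes $P(X \le n_{\mathrm{ref}})$ for $X \sim \mathrm{Binomial}(n_{\mathrm{tot}}, 1 - p_{\mathrm{err}})$ and sums the terms in log space so that deep coverage (large $n_{\mathrm{tot}}$) does not overflow floating-point arithmetic. The function names and the example error rates are assumptions.

```python
from math import exp, lgamma, log


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Natural log of C(n, i) * p**i * (1 - p)**(n - i)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1.0 - p))


def mismatch_p_value(n_ref: int, n_mut: int, p_err: float) -> float:
    """Return P(X <= n_ref) with X ~ Binomial(n_ref + n_mut, 1 - p_err).

    X counts perfectly matched nucleotides; a small value suggests that more
    mismatches were observed than base-calling error alone would explain.
    """
    if not 0.0 < p_err < 1.0:
        raise ValueError("p_err must lie strictly between 0 and 1")
    n_tot = n_ref + n_mut
    p_match = 1.0 - p_err
    # Sum the lower tail term by term; exp() of very negative logs underflows to 0.0.
    return sum(exp(log_binom_pmf(i, n_tot, p_match)) for i in range(n_ref + 1))
```

For example, 90 matched and 10 mismatched nucleotides at an assumed $p_{\mathrm{err}}$ of 0.001 give a vanishingly small probability, whereas a single mismatch among 1,000 nucleotides at $p_{\mathrm{err}} = 0.01$ does not stand out from sequencing error. Where SciPy is available, `scipy.stats.binom.cdf(n_ref, n_tot, 1 - p_err)` computes the same tail probability.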
SPORTS1.0 provides two methods to calculate $n_{\mathrm{mut}}$. The first simply takes $n_{\mathrm{mut}}$ as the read number of sequences containing a particular mismatch. Because some sequences may align to multiple reference loci, this method may increase the false-positive rate. The second method therefore distributes the read number of each multi-mapping sequence uniformly across all of its matching loci (assuming that each of these loci expresses the RNA equally), which yields an adjusted $n_{\mathrm{mut}}$.
The SPORTS1.0 summary output includes annotation details for each sequence, length distributions, and other statistics (see sample output in Figures 2 and 3 and Tables S2 and S3). A user guide is available online (https://github.com/junchaoshi/sports1.0).
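The difference between the two ways of counting mismatched reads can be illustrated with a small Python sketch. It assumes, hypothetically, that each unique sequence is given as its read count plus the list of reference loci it aligns to, each with the positions at which it mismatches that locus; this input layout, the locus names, and the function name are illustrative and do not reflect SPORTS1.0's internal files.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One unique sequence: (read_count, [(locus_id, [mismatch positions]), ...]).
Alignment = Tuple[int, List[Tuple[str, List[int]]]]


def count_mismatches(alignments: List[Alignment], adjust: bool) -> Dict[Tuple[str, int], float]:
    """Return the mismatch count for every (locus, position) site.

    adjust=False: every locus receives the full read count (first method).
    adjust=True:  the read count is split evenly across all loci the sequence
                  maps to, assuming equal expression (second, adjusted method).
    """
    n_mut: Dict[Tuple[str, int], float] = defaultdict(float)
    for read_count, hits in alignments:
        if not hits:
            continue  # unmapped sequences contribute nothing
        weight = read_count / len(hits) if adjust else float(read_count)
        for locus, positions in hits:
            for pos in positions:
                n_mut[(locus, pos)] += weight
    return dict(n_mut)


data = [
    # 30 reads of a sequence mapping to three hypothetical tRNA loci, each mismatched at 58
    (30, [("tRNA-1", [58]), ("tRNA-2", [58]), ("tRNA-3", [58])]),
    # 5 reads of a sequence mapping uniquely to tRNA-1, also mismatched at 58
    (5, [("tRNA-1", [58])]),
]
```

With this toy input, site (`tRNA-1`, 58) receives 35 reads under the first method but only 15 under the adjusted method, because the 30 multi-mapping reads are shared equally among three loci.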
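As a rough illustration of the kind of summary statistics described above, the Python sketch below tallies read counts by sRNA category and length and computes each category's share of all annotated reads, the kind of proportion that is typically drawn as a pie chart. The input layout (sequence, category, read count) and the function name are assumptions for illustration, not SPORTS1.0's output format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, str, int]]):
    """Tally reads per (category, length) and each category's fraction of all reads."""
    by_length: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for sequence, category, read_count in records:
        by_length[category][len(sequence)] += read_count  # length distribution
        totals[category] += read_count                    # per-category total
    grand_total = sum(totals.values())
    shares = {cat: n / grand_total for cat, n in totals.items()} if grand_total else {}
    return dict(by_length), shares
```

Keeping lengths per category (rather than one pooled histogram) is what allows, for example, a miRNA peak to be distinguished from tsRNA or rsRNA peaks in the same sample.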
Results
As an example, we used SPORTS1.0 to analyze sRNA-seq datasets from mouse sperm (GSM2304822 [38]), bone marrow cell (GSM1604100 [39]), and intestinal epithelial cell (GSM1975854 [40]) samples. The graphic output of SPORTS1.0 reveals distinct sRNA profiles in the sperm (Figure 2A), bone marrow cell (Figure 2B), and intestinal epithelial cell (Figure 2C) samples. tsRNAs and rsRNAs are found at levels equal to or higher than those of the well-known miRNAs and piRNAs (length distribution data for each sRNA type are exemplified in Table S2). In particular, tsRNAs dominate in sperm and rsRNAs are most abundant in bone marrow cells, whereas intestinal epithelial cells contain appreciable amounts of both tsRNAs and rsRNAs in addition to a miRNA peak.
Importantly, SPORTS1.0 annotated an appreciable portion of rsRNAs in the sperm (48.7%), bone marrow cell (11.1%), and intestinal epithelial cell (61.1%) samples that were previously deemed “unmatch genome” (UMG) (Figure 2A–C, upper pie charts). This is because these newly annotated rsRNAs are derived from rRNA genes (rDNA), which are not assembled in the current mouse genome (mm10) [41], and they were therefore discarded before analysis by previous sRNA analysis pipelines. SPORTS1.0 can now annotate and analyze these rsRNAs, including identifying the rRNA precursor subtypes (5.8S, 18S, 28S, etc.) from which they are derived (Figure 3A–C), as well as their loci mapping information (Figure 3D–F). Interestingly, our analyses revealed that the specific loci that generate rsRNAs are completely distinct among the sperm, bone marrow cell, and intestinal epithelial cell samples (Figure 3D–F), suggesting distinct biogenesis and functions of these rsRNAs. Similarly, SPORTS1.0 revealed tissue-specific landscapes of tsRNAs in terms of their relative abundance (Figure 2A–C, lower pie charts) and the tRNA regions from which they are derived (5′ terminus, 3′ terminus, 3′CCA end, etc.) (Figure 4 and Figures S1–S3). Because tsRNAs from different loci bear distinct biological functions [3], the tissue-specific tsRNA composition may reflect features that define the unique functions of the respective tissue or cell types.
In addition, SPORTS1.0 revealed distinct mismatch rates among different types of sRNAs (Figure 5 and Table S3), with tsRNAs showing the highest rate. The detected mismatch sites may represent modified nucleotides that caused nucleotide misincorporation during the RT process. The relatively high mismatch rate detected in tsRNA sequences is consistent with their highly modified nature. The mismatch sites detected by SPORTS1.0 could therefore serve as a resource for further analyses of RNA modifications within sRNAs.
Finally, SPORTS1.0 can analyze sRNAs from a wide range of species, depending on the availability of their reference genomes and sRNA sequences (Figure 6 and Table S1). The supported species and their associated sRNA references are subject to update in future versions and can also be customized by end users.
Conclusion
SPORTS1.0 is an easy-to-use and flexible pipeline for analyzing sRNA-seq data across a wide range of species. Using mouse samples as an example, SPORTS1.0 reveals a far more complex sRNA landscape than previously appreciated, highlighting tissue-specific dynamic regulation of tsRNAs and rsRNAs. SPORTS1.0 can also predict potential RNA modification sites based on nucleotide mismatches within sRNAs, and these sites show distinct patterns among sRNA types. SPORTS1.0 may serve as a platform for new discoveries in biomedical and evolutionary studies related to sRNAs.
The real voyage of discovery consists not in seeking new landscapes, but in looking with new eyes.
Marcel Proust
Authors’ contributions
JS, TZ, and QC conceived the idea and wrote the manuscript. JS and TZ developed the SPORTS1.0 software and analyzed the RNA-seq data. JS, EK, KMS, QC, and TZ contributed to the interpretation of the results. All authors read and approved the final manuscript.
Competing interests
The authors have declared no competing interests.
Acknowledgments
We would like to thank Songjia Fan and Tin Nguyen for their constructive suggestions on the manuscript. This work is supported by start-up funds for the Zhou and Chen labs from the University of Nevada, Reno School of Medicine, and by the National Institutes of Health, United States (Grant Nos. R01DK091336 and P01DK041315 to KMS; Grant Nos. R01HD092431 and P30GM110767-03 to QC).
Handled by Fangqing Zhao
Footnotes
Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.
Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.1016/j.gpb.2018.04.004.
Contributor Information
Junchao Shi, Email: junchaoshi@nevada.unr.edu.
Qi Chen, Email: cqi@med.unr.edu.
Tong Zhou, Email: tongz@med.unr.edu.
References
- 1. Cech T.R., Steitz J.A. The noncoding RNA revolution-trashing old rules to forge new ones. Cell. 2014;157:77–94. doi: 10.1016/j.cell.2014.03.008.
- 2. Chen Q., Yan W., Duan E. Epigenetic inheritance of acquired traits through sperm RNAs and sperm RNA modifications. Nat Rev Genet. 2016;17:733–743. doi: 10.1038/nrg.2016.106.
- 3. Kumar P., Kuscu C., Dutta A. Biogenesis and function of transfer RNA-related fragments (tRFs). Trends Biochem Sci. 2016;41:679–689. doi: 10.1016/j.tibs.2016.05.004.
- 4. Lambertz U., Oviedo Ovando M.E., Vasconcelos E.J., Unrau P.J., Myler P.J., Reiner N.E. Small RNAs derived from tRNAs and rRNAs are highly enriched in exosomes from both old and new world Leishmania providing evidence for conserved exosomal RNA Packaging. BMC Genomics. 2015;16:151. doi: 10.1186/s12864-015-1260-7.
- 5. Garcia-Silva M.R., das Neves R.F., Cabrera-Cabrera F., Sanguinetti J., Medeiros L.C., Robello C. Extracellular vesicles shed by Trypanosoma cruzi are linked to small RNA pathways, life cycle regulation, and susceptibility to infection of mammalian cells. Parasitol Res. 2014;113:285–304. doi: 10.1007/s00436-013-3655-1.
- 6. Liao J.Y., Guo Y.H., Zheng L.L., Li Y., Xu W.L., Zhang Y.C. Both endo-siRNAs and tRNA-derived small RNAs are involved in the differentiation of primitive eukaryote Giardia lamblia. Proc Natl Acad Sci U S A. 2014;111:14159–14164. doi: 10.1073/pnas.1414394111.
- 7. Szempruch A.J., Dennison L., Kieft R., Harrington J.M., Hajduk S.L. Sending a message: extracellular vesicles of pathogenic protozoan parasites. Nat Rev Microbiol. 2016;14:669–675. doi: 10.1038/nrmicro.2016.110.
- 8. Chen Q., Yan M., Cao Z., Li X., Zhang Y., Shi J. Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science. 2016;351:397–400. doi: 10.1126/science.aad7977.
- 9. Schorn A.J., Gutbrod M.J., LeBlanc C., Martienssen R. LTR-retrotransposon control by tRNA-derived small RNAs. Cell. 2017;170:61–71. doi: 10.1016/j.cell.2017.06.013.
- 10. Anderson P., Ivanov P. tRNA fragments in human health and disease. FEBS Lett. 2014;588:4297–4304. doi: 10.1016/j.febslet.2014.09.001.
- 11. Kim H.K., Fuchs G., Wang S., Wei W., Zhang Y., Park H. A transfer-RNA-derived small RNA regulates ribosome biogenesis. Nature. 2017;552:57–62. doi: 10.1038/nature25005.
- 12. Gebetsberger J., Wyss L., Mleczko A.M., Reuther J., Polacek N. A tRNA-derived fragment competes with mRNA for ribosome binding and regulates translation during stress. RNA Biol. 2017;14:1364–1373. doi: 10.1080/15476286.2016.1257470.
- 13. Martinez G., Choudury S.G., Slotkin R.K. tRNA-derived small RNAs target transposable element transcripts. Nucleic Acids Res. 2017;45:5142–5152. doi: 10.1093/nar/gkx103.
- 14. Ivanov P., Emara M.M., Villen J., Gygi S.P., Anderson P. Angiogenin-induced tRNA fragments inhibit translation initiation. Mol Cell. 2011;43:613–623. doi: 10.1016/j.molcel.2011.06.022.
- 15. Schimmel P. The emerging complexity of the tRNA world: mammalian tRNAs beyond protein synthesis. Nat Rev Mol Cell Biol. 2018;19:45–58. doi: 10.1038/nrm.2017.77.
- 16. Luo S., He F., Luo J., Dou S., Wang Y., Guo A. Drosophila tsRNAs preferentially suppress general translation machinery via antisense pairing and participate in cellular starvation response. Nucleic Acids Res. 2018. doi: 10.1093/nar/gky189.
- 17. Wei H., Zhou B., Zhang F., Tu Y., Hu Y., Zhang B. Profiling and identification of small rDNA-derived RNAs and their potential biological functions. PLoS One. 2013;8:e56842. doi: 10.1371/journal.pone.0056842.
- 18. Chu C., Yu L., Wu B., Ma L., Gou L.T., He M. A sequence of 28S rRNA-derived small RNAs is enriched in mature sperm and various somatic tissues and possibly associates with inflammation. J Mol Cell Biol. 2017;9:256–259. doi: 10.1093/jmcb/mjx016.
- 19. Zhang Y., Zhang X., Shi J., Tuorto F., Li X., Liu Y. Dnmt2 mediates intergenerational transmission of paternally acquired metabolic disorders through sperm small non-coding RNAs. Nat Cell Biol. 2018;20:535–540. doi: 10.1038/s41556-018-0087-2.
- 20. Wu X., Kim T.K., Baxter D., Scherler K., Gordon A., Fong O. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline. Nucleic Acids Res. 2017;45:12140–12151. doi: 10.1093/nar/gkx999.
- 21. Mohorianu I., Stocks M.B., Applegate C.S., Folkes L., Moulton V. The UEA small RNA workbench: a suite of computational tools for small RNA analysis. Methods Mol Biol. 2017;1580:193–224. doi: 10.1007/978-1-4939-6866-4_14.
- 22. Rueda A., Barturen G., Lebron R., Gomez-Martin C., Alganza A., Oliver J.L. sRNAtoolbox: an integrated collection of small RNA research tools. Nucleic Acids Res. 2015;43:W467–W473. doi: 10.1093/nar/gkv555.
- 23. Axtell M.J. ShortStack: comprehensive annotation and quantification of small RNA genes. RNA. 2013;19:740–751. doi: 10.1261/rna.035279.112.
- 24. Fasold M., Langenberger D., Binder H., Stadler P.F., Hoffmann S. DARIO: a ncRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res. 2011;39:W112–W117. doi: 10.1093/nar/gkr357.
- 25. Thompson A., Zielezinski A., Plewka P., Szymanski M., Nuc P., Szweykowska-Kulinska Z. tRex: a web portal for exploration of tRNA-derived fragments in Arabidopsis thaliana. Plant Cell Physiol. 2018;59:e1. doi: 10.1093/pcp/pcx173.
- 26. Zheng L.L., Xu W.L., Liu S., Sun W.J., Li J.H., Wu J. tRF2Cancer: a web server to detect tRNA-derived small RNA fragments (tRFs) and their expression in multiple cancers. Nucleic Acids Res. 2016;44:W185–W193. doi: 10.1093/nar/gkw414.
- 27. Selitsky S.R., Sethupathy P. tDRmapper: challenges and solutions to mapping, naming, and quantifying tRNA-derived RNAs from human small RNA-sequencing data. BMC Bioinformatics. 2015;16:354. doi: 10.1186/s12859-015-0800-0.
- 28. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:3.
- 29. Friedlander M.R., Chen W., Adamidi C., Maaskola J., Einspanier R., Knespel S. Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol. 2008;26:407–415. doi: 10.1038/nbt1394.
- 30. Kozomara A., Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42:D68–D73. doi: 10.1093/nar/gkt1181.
- 31. Chan P.P., Lowe T.M. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 2016;44:D184–D189. doi: 10.1093/nar/gkv1309.
- 32. Zhang P., Si X., Skogerbo G., Wang J., Cui D., Li Y. piRBase: a web resource assisting piRNA functional study. Database. 2014;2014:bau110. doi: 10.1093/database/bau110.
- 33. Sai Lakshmi S., Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res. 2008;36:D173–D177. doi: 10.1093/nar/gkm696.
- 34. Yates A., Akanni W., Amode M.R., Barrell D., Billis K., Carvalho-Silva D. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–D716. doi: 10.1093/nar/gkv1157.
- 35. Nawrocki E.P., Burge S.W., Bateman A., Daub J., Eberhardt R.Y., Eddy S.R. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43:D130–D137. doi: 10.1093/nar/gku1063.
- 36. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 37. Ryvkin P., Leung Y.Y., Silverman I.M., Childress M., Valladares O., Dragomir I. HAMR: high-throughput annotation of modified ribonucleotides. RNA. 2013;19:1684–1692. doi: 10.1261/rna.036806.112.
- 38. Yang Q., Lin J., Liu M., Li R., Tian B., Zhang X. Highly sensitive sequencing reveals dynamic modifications and activities of small RNAs in mouse oocytes and early embryos. Sci Adv. 2016;2:e1501482. doi: 10.1126/sciadv.1501482.
- 39. Tuorto F., Herbst F., Alerasool N., Bender S., Popp O., Federico G. The tRNA methyltransferase Dnmt2 is required for accurate polypeptide synthesis during haematopoiesis. EMBO J. 2015;34:2350–2362. doi: 10.15252/embj.201591382.
- 40. Peck B.C., Mah A.T., Pitman W.A., Ding S., Lund P.K., Sethupathy P. Functional transcriptomics in diverse intestinal epithelial cell types reveals robust microRNA sensitivity in intestinal stem cells to microbial status. J Biol Chem. 2017;292:2586–2600. doi: 10.1074/jbc.M116.770099.
- 41. McStay B., Grummt I. The epigenetics of rRNA genes: from molecular to chromosome biology. Annu Rev Cell Dev Biol. 2008;24:131–157. doi: 10.1146/annurev.cellbio.24.110707.175259.