The discovery potential of RNA processing profiles

Amadís Pagès; Ivan Dotu; Joan Pallarès-Albanell; Eulàlia Martí; Roderic Guigó; Eduardo Eyras

doi:10.1093/nar/gkx1115

Nucleic Acids Res. 2017 Nov 16;46(3):e15. doi: 10.1093/nar/gkx1115

PMCID: PMC5814818 PMID: 29155959

Abstract

Small non-coding RNAs (sncRNAs) are highly abundant molecules that regulate essential cellular processes and are classified according to sequence and structure. Here we argue that read profiles from size-selected RNA sequencing capture the post-transcriptional processing specific to each RNA family, thereby providing functional information independently of sequence and structure. We developed SeRPeNT, a new computational method that exploits reproducibility across replicates and uses dynamic time-warping and density-based clustering algorithms to identify, characterize and compare sncRNAs by harnessing the power of read profiles. We applied SeRPeNT to: (i) generate an extended human annotation with 671 new sncRNAs from known classes and 131 from new potential classes, (ii) show pervasive differential processing of sncRNAs between cell compartments and (iii) predict new molecules with miRNA-like behaviour from snoRNA, tRNA and long non-coding RNA precursors, potentially dependent on the miRNA biogenesis pathway. Furthermore, we validated experimentally four predicted novel non-coding RNAs: a miRNA, a snoRNA-derived miRNA, a processed tRNA and a new uncharacterized sncRNA. SeRPeNT facilitates fast and accurate discovery and characterization of sncRNAs at an unprecedented scale. SeRPeNT code is available under the MIT license at https://github.com/comprna/SeRPeNT.

INTRODUCTION

Small non-coding RNAs (sncRNAs) are highly abundant functional transcription products that regulate essential cellular processes, from splicing or protein synthesis to the catalysis of post-transcriptional modifications or gene expression regulation (1). Major classes include micro-RNAs (miRNAs), small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs) and transfer RNAs (tRNAs). Developments in high-throughput approaches have facilitated their characterization in terms of sequence and structure (2–4) and have led to the discovery of new molecules in diverse physiological and pathological contexts. However, the function of many of them remains unknown (5,6); hence their characterization is essential to understand multiple cellular processes in health and disease.

Sequence and structure are traditionally used to identify and characterize sncRNAs (7,8). Although sequence is a direct product of the sequencing technology, structure determination is still of limited accuracy and requires specialized protocols (3,4,9). On the other hand, extensive processing is a general characteristic of non-coding RNAs (10–12). The best-characterized cases are miRNAs, which are processed from precursors and preferentially express one arm over the other depending on the cellular conditions (13,14). Furthermore, snoRNAs and tRNAs can be processed into smaller RNAs, whose function is often independent of their precursor (10,15–18). These findings suggest that a new path to systematically characterize RNA molecules emerges through the genome-wide analysis of their sequencing read profiles.

Here we argue that sequencing profiles can be used to directly characterize the function of sncRNAs, in the same way that sequence and structure have been used in the past. We report here on SeRPeNT, a fast and memory-efficient software tool that facilitates the discovery and characterization of known and novel classes of sncRNAs by exploiting their processing patterns in small RNA sequencing (sncRNA-seq) experiments. As opposed to previous supervised methods that necessarily rely on known annotations, SeRPeNT groups sncRNAs into families without the need for previous annotation and therefore has the potential to discover new classes of sncRNAs. We applied SeRPeNT to generate an extended human annotation with 671 new RNAs from known classes and 131 from new potential classes. We further showed that these sncRNAs undergo pervasive differential processing between cell compartments, and predicted new miRNA-like molecules potentially processed from different RNA precursors: snoRNAs, tRNAs and long non-coding RNAs. Finally, we validated experimentally four novel non-coding RNAs predicted by SeRPeNT, highlighting the power of SeRPeNT for the discovery and characterization of sncRNAs.

MATERIALS AND METHODS

Using multiple size-selected small (<200 nt) RNA sequencing (sncRNA-seq) experiments mapped to a genome reference, SeRPeNT enables the discovery and characterization of known and novel sncRNAs through three operations: profiler, annotator and diffproc, which can be used independently or together in a pipeline (Figure 1). Initially, sncRNA read profiles are calculated from the mapped sncRNA-seq reads, and filtered according to the reproducibility between replicates, and to the length and expression constraints given as input (Figure 1A). Pairwise distances between profiles are calculated as a normalized cross-correlation of their alignment calculated using a time-warping algorithm (Figure 1B). Profiles are clustered into families according to pairwise distances using an improved density-based clustering algorithm (Figure 1B). Novel profiles are annotated using the class label from known profiles in the same cluster if available by majority voting (Figure 1C). Additionally, SeRPeNT allows the identification of differential processing of sncRNAs between two conditions, independently of their expression change (Figure 1D).

Figure 1. — Overview of SeRPeNT. Operations performed by SeRPeNT: (A) Building of profiles from short RNA-Seq reads mapped to the genome using reproducibility across replicates. A profile is a collection of reads overlapping over a given genomic locus and can be regarded as a vector where each component contains the number of reads at each nucleotide. (B) Density-based clustering of profiles based on pairwise distances calculated with a dynamic time-warping algorithm. (C) Annotation of novel profiles using majority vote in clusters. (D) Differential processing calculation. The distributions of distances between a profile and its cluster sisters in one condition (C₁) and across conditions (C₂) are compared (panel below). Differential processing is determined in terms of a Mann–Whitney U test and a fold-enrichment.

Profile building from aligned short RNA-Seq reads

The profiler tool takes as input one or more sncRNA-seq replicates in BAM format. Consensus read contigs are built by pooling reads that overlap on a genomic region and that lie at a distance smaller than a user-defined threshold. Each contig is scored per individual replicate by counting the number of reads mapped within its boundaries, and reproducibility is measured across all the biological replicates using either a non-parametric irreproducible discovery rate (NP-IDR) (19) or the simple error ratio estimate (SERE) (20). NP-IDR determines the reproducibility of a contig in one or more replicates with similar sequencing depths. SERE, on the other hand, compares the observed variation in the raw number of reads of a contig to an expected value, accounting for the variation in read depth across replicates. For all analyses of reproducibility in this work we used NP-IDR with a cut-off of 0.01. Contigs that do not pass the user-defined reproducibility cut-off are discarded from further analysis. For each of the remaining contigs, a profile is built by counting the number of reads per nucleotide in the genomic region delimited by the contig boundaries (Figure 1A). SeRPeNT defines each sncRNA as a genomic region and a vector of raw read counts, or heights, of length equal to the number of nucleotides spanned by this genomic region. Profiles are additionally trimmed at 3′-end positions whose heights are either below 5 reads or below 10% of the highest position, unless those positions have more than 20 reads. Only profiles of lengths between 50 and 200 nt, and of minimum height 100 in pooled replicates, were considered. All these parameters can be configured on the SeRPeNT command line. The consistency of sncRNA profiles across multiple experiments is determined by calculating the normalized entropy of the different labels for the same sncRNA locus across experiments (Supplementary Materials and Methods).
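As an illustration of the profile definition and the 3′-end trimming rule, the following Python sketch builds a height vector from read intervals and trims it. It is not SeRPeNT's C implementation: the function names are hypothetical, reads are assumed to be given as inclusive (start, end) coordinates, heights are assumed to be ordered 5′ to 3′, and the reading of the "more than 20 reads" exception is one plausible interpretation of the rule.

```python
from typing import List, Tuple


def build_profile(reads: List[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Count the reads covering each nucleotide of the contig [start, end] (inclusive)."""
    heights = [0] * (end - start + 1)
    for r_start, r_end in reads:
        # Clip each read to the contig boundaries before counting.
        for pos in range(max(r_start, start), min(r_end, end) + 1):
            heights[pos - start] += 1
    return heights


def trim_three_prime(heights: List[int], min_reads: int = 5,
                     min_fraction: float = 0.10, keep_above: int = 20) -> List[int]:
    """Drop trailing (3'-end) positions that are low, unless they exceed keep_above reads.

    A position is 'low' if it has fewer than min_reads reads or less than
    min_fraction of the highest position (assumed interpretation of the rule).
    """
    peak = max(heights, default=0)
    end = len(heights)
    while end > 0:
        h = heights[end - 1]
        low = h < min_reads or h < min_fraction * peak
        if not low or h > keep_above:
            break
        end -= 1
    return heights[:end]
```

For example, `trim_three_prime([200, 150, 30, 12, 3, 0])` keeps the first three positions, because 12, 3 and 0 all fall below 10% of the peak and none exceeds 20 reads.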

sncRNA profile clustering

SeRPeNT assigns a distance between each possible pair of profiles resulting from the previous step. This distance is computed with a novel algorithm (described in Supplementary Figure S1) based on dynamic time-warping (21,22) (Figure 1B). This algorithm finds the optimal alignment between two profiles by placing the heights of a pair of profiles along the axes of a grid, representing alignments as paths through the grid cells, and finding the path with maximum normalized cross-correlation score across them. Given a pair of profiles of the same length A = (a₁, …, a_n) and B = (b₁, …, b_n), where a_i and b_i are the heights of nucleotide i in profile A and B, respectively, the cross-correlation score between A and B is defined as:

(1)

and the normalized cross-correlation score as:

(2)
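Assuming that formulas 1 and 2 take the conventional form (the sum of position-wise products, normalized by the geometric mean of the two self-scores), the two scores can be sketched in Python as follows. The function names `cc` and `ncc` are illustrative and are not taken from the SeRPeNT source.

```python
import math
from typing import Sequence


def cc(a: Sequence[float], b: Sequence[float]) -> float:
    """Cross-correlation score (assumed form): sum of position-wise products."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x * y for x, y in zip(a, b))


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation (assumed form): cc(A,B) / sqrt(cc(A,A) * cc(B,B))."""
    denom = math.sqrt(cc(a, a) * cc(b, b))
    # An all-zero profile has no defined shape; return 0.0 by convention.
    return cc(a, b) / denom if denom > 0 else 0.0
```

Under this definition two profiles with the same shape but different expression levels, such as (1, 2, 3) and (2, 4, 6), score 1, which is the property that makes the measure sensitive to processing pattern rather than abundance.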

The optimal alignment maximizes the normalized cross-correlation score between the two profiles. Given two profiles S = (s₁, …, s_n) and Q = (q₁, …, q_m) of length n and m nucleotides respectively, each position (i, j) in the dynamic programming matrix D stores a vector of three values D(i,j) = (x, y, z) such that they maximize the normalized cross-correlation score of formula 2 amongst all the possible partial alignments between S_i and Q_j, where S_i = (s₁, …, s_i) and Q_j = (q₁, …, q_j) are the profiles spanning the first i and j nucleotides of the profiles S and Q. The dynamic programming equation is then defined as:

(3)

where ϕ represents a negative Gaussian white noise function used to penalize an expansion or contraction in the alignment. When applied to a profile S, ϕ(S) returns a random negative value drawn from a Gaussian distribution with mean and standard deviation defined by S.
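The recursion can be illustrated with a simplified Python sketch of a time-warping alignment that keeps, in every cell, the triple (x, y, z) with the best ratio x / sqrt(y·z). Several points are assumptions made for illustration only: the triple is taken to hold the running sums of s·q, s² and q²; an expansion or contraction re-uses a position as in standard dynamic time-warping; and the random penalty ϕ is replaced by a fixed constant `gap_penalty` so that the output is deterministic. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Triple = Tuple[float, float, float]


def _score(t: Triple) -> float:
    """Normalized score x / sqrt(y * z) of a partial alignment (0.0 if undefined)."""
    x, y, z = t
    return x / math.sqrt(y * z) if y > 0 and z > 0 else 0.0


def _extend(prev: Triple, step: Triple, penalty: float = 0.0) -> Triple:
    """Add one aligned pair to a partial alignment, subtracting a penalty from x."""
    return (prev[0] + step[0] - penalty, prev[1] + step[1], prev[2] + step[2])


def warp_similarity(s: Sequence[float], q: Sequence[float],
                    gap_penalty: float = 0.0) -> float:
    """Best normalized cross-correlation found by a time-warping style alignment."""
    n, m = len(s), len(q)
    D = [[(0.0, 0.0, 0.0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            step = (s[i] * q[j], s[i] ** 2, q[j] ** 2)
            candidates = []
            if i == 0 and j == 0:
                candidates.append(step)
            if i > 0 and j > 0:                      # match: advance in both profiles
                candidates.append(_extend(D[i - 1][j - 1], step))
            if i > 0:                                # q[j] is re-used (expansion)
                candidates.append(_extend(D[i - 1][j], step, gap_penalty))
            if j > 0:                                # s[i] is re-used (contraction)
                candidates.append(_extend(D[i][j - 1], step, gap_penalty))
            D[i][j] = max(candidates, key=_score)
    return _score(D[n - 1][m - 1])


def warp_distance(s: Sequence[float], q: Sequence[float],
                  gap_penalty: float = 0.0) -> float:
    """Assumed conversion of the similarity into a distance."""
    return 1.0 - warp_similarity(s, q, gap_penalty)
```

With `gap_penalty=0`, the profiles (1, 2, 3) and (1, 2, 2, 3) align perfectly because the repeated height is absorbed by one contraction step; a positive penalty lowers the score of any alignment that needs such a step.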

Once all the pairwise distances are calculated, profiles are clustered using a modified version of a density-based clustering algorithm (23) (described in Supplementary Figure S2A). The clustering algorithm is based on the assumption that cluster centers are profiles surrounded by a high density of profiles of lower local density and that lie at a large distance from other profiles of high local density. For each profile i we defined its local density ρ_i as follows:

(4)

where d_ij is the distance between profiles i and j, and d_c is an optimal distance that determines the size of the neighborhood of a profile. The optimal distance d_c is derived from a data field computed over all profiles (24,25) (algorithm described in Supplementary Figure S2B). Once the optimal d_c is obtained, the profile with the highest local density is identified, and this profile and all the profiles within distance d_c of it are assigned to a cluster. We introduced a novel step in the clustering in which all the profiles that have already been clustered are removed before the next iteration. In the next iteration, a new d_c value is calculated from the remaining profiles and new local densities are calculated to identify the profile with the highest local density, and so on. The algorithm stops when only singletons are produced or when the calculated optimal value for d_c is higher than 0.02. This value represents the maximum distance we allow to start building a cluster from a profile with the highest local density.
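The iterative procedure can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the published algorithm: local density is taken to be the number of remaining profiles within d_c (a cut-off kernel), and the data-field estimation of d_c is not reproduced but passed in as a hypothetical callable `estimate_dc`.

```python
from typing import Callable, List, Sequence, Set, Tuple


def peel_clusters(dist: Sequence[Sequence[float]],
                  estimate_dc: Callable[[Sequence[Sequence[float]], Set[int]], float],
                  max_dc: float = 0.02) -> Tuple[List[List[int]], List[int]]:
    """Repeatedly extract the densest neighborhood and remove it from the pool.

    Returns (clusters, singletons), both as lists of profile indices.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while len(remaining) > 1:
        dc = estimate_dc(dist, remaining)       # recomputed on the remaining profiles
        if dc > max_dc:
            break                               # neighborhoods would be too loose
        # Assumed density: number of remaining profiles within distance dc.
        density = {i: sum(1 for j in remaining if j != i and dist[i][j] <= dc)
                   for i in remaining}
        center = max(sorted(remaining), key=density.get)
        if density[center] == 0:
            break                               # only singletons would be produced
        members = {center} | {j for j in remaining if dist[center][j] <= dc}
        clusters.append(sorted(members))
        remaining -= members                    # clustered profiles leave the pool
    return clusters, sorted(remaining)
```

Removing clustered profiles before re-estimating d_c lets later, sparser groups be found with a neighborhood size suited to them, rather than one global value.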

Profile annotation

The annotator tool performs the sncRNA profile annotation. Every detected profile that overlaps an annotated short non-coding RNA is marked as known and labeled with the corresponding class label (e.g. H/ACA snoRNA). The minimum overlap amount required between the sncRNA profile and the annotated RNA can be defined by the user. Profiles that do not overlap with any annotation or do not satisfy the overlapping requirements are marked as unknown. For each cluster with two or more profiles, the different labels from all the known profiles are counted, and all the unknown profiles within the cluster are labeled by majority vote with the most abundant label (Figure 1C). In case of a tie, the label of the closest profile is assigned. All the remaining profiles are denoted as unlabeled. Clustered unlabeled profiles represent a coherent group of multiple profiles, and hence potentially indicate a novel sncRNA class.
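The labeling rule for a single cluster can be sketched in Python as follows. The data layout (a dictionary from profile identifier to label, with `None` for unknown profiles) and the function name are assumptions for illustration; in the tie case the sketch takes "closest profile" to mean the closest known profile carrying one of the tied labels.

```python
from collections import Counter
from typing import Callable, Dict, Optional


def annotate_cluster(labels: Dict[str, Optional[str]],
                     dist: Callable[[str, str], float]) -> Dict[str, str]:
    """Label unknown profiles in one cluster by majority vote over known profiles."""
    known = {p: lab for p, lab in labels.items() if lab is not None}
    if len(labels) < 2 or not known:
        # Singletons and clusters without known members stay unlabeled.
        return {p: (lab if lab is not None else "unlabeled") for p, lab in labels.items()}
    counts = Counter(known.values())
    top = max(counts.values())
    winners = {lab for lab, c in counts.items() if c == top}
    result = dict(known)
    for p, lab in labels.items():
        if lab is not None:
            continue
        if len(winners) == 1:
            result[p] = next(iter(winners))
        else:
            # Tie: take the label of the closest known profile among the tied labels.
            tied = [k for k in known if known[k] in winners]
            result[p] = known[min(tied, key=lambda k: dist(p, k))]
    return result
```

A cluster in which every profile maps to "unlabeled" corresponds to a candidate novel class when it contains several profiles.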

Differential processing analysis

Differential processing is calculated for each sncRNA from the pairwise distance distributions with sister sncRNAs from the same cluster in either condition. Profiles are considered as differentially processed according to the fold-change and significance of the change. The diffproc tool assesses if a profile P_a in a particular condition C₁ shows a different processing pattern P_b in another condition C₂ (Figure 1D). A pair of profiles P_a and P_b from conditions C₁ and C₂, respectively, such that their reference coordinates overlap as described above, are compared as follows. Given K_a the cluster in condition C₁ that contains the profile P_a and K_b the cluster in condition C₂ that contains the profile P_b, diffproc calculates all the pairwise distances D_ab between P_a and all the profiles in K_b, and the pairwise distances D_b between profiles in K_b (Figure 1). These two distance distributions are then compared using a one-sided Mann-Whitney U test and a fold-change is calculated as the ratio of the medians between both distributions. The same method is applied to profile P_b and cluster K_a. P_a and P_b are then reported as differentially processed if both tests are significant according to the P-value and fold-change cutoffs defined by the user. When there are not enough cases to perform a Mann-Whitney U test, only the fold-change is taken into account.
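One direction of this comparison (profile P_a against cluster K_b) can be sketched in Python using only the standard library. The sketch makes several assumptions: the one-sided test asks whether the cross-condition distances D_ab are larger than the within-cluster distances D_b; the P-value uses the normal approximation to the Mann–Whitney U statistic without tie or continuity correction; and `min_n`, the sample size below which only the fold-change is used, is a hypothetical parameter. The default cut-offs mirror the values used in the analyses reported here.

```python
import math
from statistics import median
from typing import Sequence


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> float:
    """One-sided P-value that x tends to be larger than y (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0             # mean of ranks i+1 .. j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return 0.5 * math.erfc((u - mu) / sigma / math.sqrt(2.0))


def differs(d_cross: Sequence[float], d_within: Sequence[float],
            p_cutoff: float = 0.01, fc_cutoff: float = 2.5, min_n: int = 3) -> bool:
    """Test one direction: are cross-condition distances larger than within-cluster ones?"""
    m_within = median(d_within)
    fold = median(d_cross) / m_within if m_within > 0 else float("inf")
    if len(d_cross) < min_n or len(d_within) < min_n:
        return fold >= fc_cutoff                 # too few cases: fold-change only
    return fold >= fc_cutoff and mann_whitney_greater(d_cross, d_within) < p_cutoff
```

A pair of profiles would then be reported only if the call succeeds in both directions, for example `differs(D_ab, D_b) and differs(D_ba, D_a)`, where the four distance lists are built as described above.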

Accuracy analysis and experimental validation

Details about the accuracy analysis and the experimental validations are available in the Supplementary Material.

Software

SeRPeNT is written in C. The source code is available at https://github.com/comprna/SeRPeNT.

Code and make files to reproduce the analyses described in this manuscript are available at https://github.com/comprna/SeRPeNT-analysis.

RESULTS

Fast and accurate discovery of small non-coding RNAs

We assessed the accuracy of SeRPeNT by performing a comparison against BlockClust (26), an unsupervised method that predicts known sncRNA families from sncRNA-seq data. We evaluated the accuracy to detect known miRNAs, tRNAs and snoRNAs from the GENCODE annotation (27) using the same procedure and dataset used by Videm et al. (26) (Supplementary Materials and Methods). SeRPeNT shows overall similar precision for miRNAs (0.858) and tRNAs (0.855), and a marked improvement in precision for snoRNAs (0.922) (Supplementary Table S1). Of note, although BlockClust was benchmarked in (26) using C/D-box snoRNAs only, we benchmarked SeRPeNT using also H/ACA-box snoRNAs. Notably, SeRPeNT analysis took ∼3 min and <200 MB of RAM on a single AMD Opteron 64 core with 4 GB of memory. In contrast, the same analysis with BlockClust, which included the execution of Blockbuster (28), took ∼15 min and used nearly 30 GB of memory. Additionally, we compared the performance of SeRPeNT against the supervised version of BlockClust and against DARIO (29), using a cross-fold validation approach (Supplementary Figure S3). Using small RNA-Seq data from MCF-7 cells (8) (GSM769510) for the three methods, SeRPeNT shows overall higher precision in all tested sncRNA families (Supplementary Table S1). Importantly, as opposed to the supervised methods, SeRPeNT did not use the annotation to group sncRNA profiles.

We also assessed the accuracy of SeRPeNT's differential processing operation, diffproc, by analyzing the differential expression of miRNA arms and arm-switching events in miRNAs between normal and tumor liver tissues (30). From the 49 miRNAs tested, 41 passed our filters of reproducibility and clustered with other sncRNAs. Imposing a significance threshold of P-value < 0.01 and a fold-change of at least 2.5 (Supplementary Figure S4), SeRPeNT identified as differentially processed 10 out of 24 miRNAs described to exhibit a different 5′-arm to 3′-arm expression ratio (30), including 4 out of 5 arm-switching events (Supplementary Figure S5). Moreover, only 1 out of the remaining 17 miRNAs that did not exhibit a difference in 5′-arm to 3′-arm expression ratio was identified as differentially processed by SeRPeNT. We further compared SeRPeNT against RPA (31), a recent method for differential processing analysis, using sncRNA-seq data from nine cell lines (32). SeRPeNT detected many more differentially processed events, with a moderate overlap with RPA predictions (Supplementary Figure S6). Notably, for this analysis SeRPeNT took 2 h on a single AMD Opteron 64 core with 4 GB of memory, whereas RPA took about 10 h on a cluster of 32 cores each having 8 GB of RAM.

An extended annotation of small non-coding RNAs in human

We decided to exploit SeRPeNT to produce an extended annotation of sncRNAs in human. We applied the SeRPeNT profiler and annotator tools to sncRNA-seq data from nine cell lines (32) (Supplementary Table S2 and Figure S7). We observed a higher proportion of known compared to novel sncRNAs, with an increase of novel sncRNAs in samples sequenced at a higher depth: A549, IMR90, MCF-7 and SK-N-SH (Figure 2A). We further measured the accuracy of SeRPeNT in recovering known sncRNA classes using cross-fold validation in these datasets and found an overall high accuracy consistently across all cell lines (Supplementary Table S3), except for snRNAs, probably due to their broad differences in structural features and processing patterns (12). Additionally, in the cross-fold validation SeRPeNT did not annotate on average about 30% of all the profiles detected in known sncRNAs from GENCODE, either because they were in clusters with only unlabeled profiles or because they were singletons. Importantly, the accuracy values were robust when running SeRPeNT with different parameters for minimum expression, reproducibility value between replicates, minimum length of sncRNA profiles or spacing between profiles, and using different sequencing depths (Supplementary Table S4).

Figure 2. — Extended annotation derived from ENCODE cell lines. (A) Number of known and novel sncRNAs across 9 ENCODE cell lines. (B) Hierarchical clustering representation of the clusters obtained for the NHEK cell line. Distance between clusters is calculated by averaging all the distances between profiles from both clusters. Colored circles represent clusters of sncRNAs at the leaves of the tree labeled by class. Empty circles represent internal nodes of the tree. The read profiles in clusters 5 and 11 are for one of its members, for which we plot the number of reads per nucleotide in the sncRNA. (C) Genomic loci and graphical representation of the hairpins for four predicted novel miRNAs. The predicted mature miRNAs are highlighted in blue in the corresponding gene locus: miRNA chr17:57228820–57228919:− (upper left) at the *SKA2* locus, miRNA chr2:29352292–29352349:− (upper right) at the *CLIP4* locus, miRNA chr13:76258915–76258974:+ (lower left) at the *LMO7* locus, and miRNA chr6:142308575–142308638:− (lower right) at an intergenic region.
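The between-cluster distance used for the tree in panel B (the average of all distances between profiles from the two clusters) corresponds to average linkage. A minimal Python sketch, assuming the pairwise profile distances are held in a square matrix indexed by profile (the function name is illustrative):

```python
from typing import Sequence


def average_linkage(cluster_a: Sequence[int], cluster_b: Sequence[int],
                    dist: Sequence[Sequence[float]]) -> float:
    """Mean of all pairwise distances between profiles of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    if not pairs:
        raise ValueError("both clusters must contain at least one profile")
    return sum(dist[a][b] for a, b in pairs) / len(pairs)
```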

We annotated new sncRNAs with SeRPeNT and obtained a total of 4673 non-unique sncRNAs across all tested cell lines that were not in the GENCODE annotation (Supplementary Table S5). We were able to assign a label to 2140 of them. From the remaining 2533 unlabeled sncRNAs, 323 formed 92 clusters with three or more unlabeled profiles per cluster, suggesting possible new classes of non-coding RNAs with a coherent processing pattern. We called these clustered uncharacterized RNAs (cuRNAs) and kept them for further study. Interestingly, some known and predicted sncRNAs with the same class labels were grouped into different clusters, indicating subfamilies. For instance, SeRPeNT separated C/D-box and H/ACA-box snoRNAs according to their processing profiles (clusters 1 and 2 in Figure 2B), and separated miRNAs into subtypes according to their different arm-processing patterns (clusters 5 and 11 in Figure 2B). Thus SeRPeNT identifies functional families and subfamilies of non-coding RNAs in a scalable and robust way, independently of the granularity of the available annotation.

We established the consistency of the sncRNAs across the multiple experiments using an entropy measure of the label assignment across cell lines (Supplementary Materials and Methods), producing a total of 929 unique novel sncRNAs (Supplementary Table S6): 787 from the major classes (79 miRNAs, 475 snoRNAs, 82 snRNAs and 151 tRNAs) plus 142 cuRNAs, the majority of them being expressed in only one cell line (Supplementary Figure S8). These, together with the sncRNAs annotated in GENCODE, formed an extended catalog of sncRNAs in the human genome reference. The novel sncRNAs are available in Supplementary Table S6 and in GTF format as a Supplementary File.
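The exact entropy measure is defined in the Supplementary Materials and Methods. As a hedged illustration, the Python sketch below computes a Shannon entropy of the labels assigned to one locus and divides it by the logarithm of the number of classes, so that 0 means fully consistent labels and 1 means labels spread evenly. The normalization by the number of observed (or user-supplied) classes is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def normalized_entropy(labels: Sequence[str], n_classes: Optional[int] = None) -> float:
    """Shannon entropy of a label list, scaled to [0, 1] by log(number of classes)."""
    counts = Counter(labels)
    k = n_classes if n_classes is not None else len(counts)
    if not labels or k <= 1:
        return 0.0
    total = len(labels)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)
```

For a locus labeled miRNA in three cell lines and snoRNA in one, the value with two classes is about 0.81, indicating an inconsistent assignment.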

From the 79 newly predicted miRNAs, 37 (46.8%) were confirmed as potential miRNA precursors using FOMmiR (33) (Supplementary Table S6). Moreover, 39 (49.3%) of these novel miRNAs overlapped with AGO2-loaded small RNAs from HEK293 cells (34). In contrast, from 3109 annotated miRNAs from GENCODE, 951 (30.59%) overlapped with AGO2-loaded small RNAs (Fisher's exact test P-value = 1.14e-3, odds-ratio = 2.01) (Supplementary Table S6). To further characterize these miRNAs, we searched for sequence and secondary structure similarities in Rfam using Infernal (35,36), with threshold e-value < 0.01 (Supplementary Materials and Methods). We found that 23 of them had a hit to a known miRNA family (Supplementary Table S6). Repeating these analyses for the other new sncRNAs, we found 47 snoRNAs and 15 tRNAs with a hit to an Rfam family, from which 3 snoRNAs and 4 tRNAs had a hit to a family of the same class predicted by SeRPeNT (Supplementary Table S6). The rest of the predicted sncRNAs did not have any hit to Rfam. We further compared the predicted sncRNAs from our extended annotation with DASHR (6), the most recently published database of human sncRNAs, and with a compendium of human miRNAs from a recent study using multiple samples (37). We found that 802 out of the 929 predicted sncRNAs (51 miRNAs, 430 snoRNAs, 69 snRNAs, 121 tRNAs and 131 cuRNAs) (Supplementary Table S6) were not present in those catalogs. In particular, four of the newly predicted miRNAs that had a hit to an Rfam miRNA family and were confirmed as potential miRNA precursors with FOMmiR were not present in these previous catalogs (6,37) (Figure 2C). We further checked the overlap of cuRNAs with CAGE data from the FANTOM5 project (38) (‘Materials and Methods’ section). From the 142 cuRNAs in the extended annotation, 32 overlapped with CAGE profiles in the same strand. Moreover, for 27 of these 32 (84.3%) the 5′ end of the cuRNA overlaps with the CAGE profile (Supplementary Table S7).

SeRPeNT uncovers new RNAs with potential miRNA-like function

SeRPeNT analysis on individual cell lines identified a cluster that grouped together snoRNA SCARNA15 (ACA45) with two miRNAs in NHEK, and a cluster that grouped snoRNA SCARNA3 with several miRNAs and a tRNA in A549 (Supplementary Table S8) in agreement with a previous study showing that these snoRNAs can function as miRNAs (15). The clusters obtained with SeRPeNT in cell lines provided additional evidence of six other snoRNAs that grouped with miRNAs: SNORD116, SNORA57, SNORD14C, SNORD26, SNORD60 and SNORA3 (Supplementary Table S8), suggesting new snoRNAs with miRNA-like function. Interestingly, we also found seven clusters with a majority of miRNAs that included annotated tRNAs: tRNA-Ile-GAT, tRNA-Glu-GAA, tRNA-Gly-CCC, tRNA-Ala-AGC and tRNA-Leu-AAG, with tRNA-Ile-GAT clustering with miRNAs in three different cell lines, MCF-7, A549 and SK-N-SH. This suggests new tRNAs with miRNA-like function (10,39). These results support the notion that sncRNA read-profiles facilitate the direct identification of functional similarities without the need to analyze sequence or structure.

To search for new cases of miRNA-like non-coding RNAs in the extended annotation, we tested their potential association with components of the canonical miRNA biogenesis pathway, using sncRNA-seq data from controls and individual knockouts of DICER1, DROSHA and XPO5 (40) (Supplementary Materials and Methods). We validated the dependence of a number of known and predicted miRNAs on these three factors (Figure 3A; Supplementary Figures S9 and S10) and recovered the previously described dependence of ACA45 and SCARNA3 on DICER1 (15). Additionally, we found 18 sncRNAs predicted as snoRNAs with similar behaviour upon DICER1 knockout (Figure 3B). Interestingly, 14 out of 20 DICER1-dependent snoRNAs did not show dependence on DROSHA, including ACA45 and SCARNA3, in agreement with previous findings (15,40) (Supplementary Figure S9). We also found a strong dependence on DICER1 for 128 tRNAs, 82 of which changed expression in the direction opposite to most miRNAs, suggesting that they may be repressed by DICER1 (Figure 3C). Further, four cuRNAs showed similar results to miRNAs, suggesting some association with the miRNA biogenesis machinery (Supplementary Figure S11 and Table S6). Although they were not confirmed as potential miRNA precursors using FOMmiR, two of these miRNA-like cuRNAs overlapped with the protein-coding genes SEC24C and DHFR (Supplementary Figure S11).

Figure 3. — Detection of miRNA-like sncRNAs. Differentially expressed sncRNAs (blue) from the extended annotation in the comparison between *DICER1* knockout and control experiments in human HCT116 cell lines for (A) miRNAs, (B) snoRNAs and (C) tRNAs. The analyses for the knockout of *DROSHA* and *XPO5* are available as Supplementary Figures. (D) Representation of a novel miRNA detected by SeRPeNT (depicted as a read profile) whose precursor is the lncRNA RP11–141B14.1 (depicted as a green line). Profiles for both replicates are included. (E) Secondary structure prediction of the predicted miRNA precursor by FOMmiR.

Certain long non-coding RNAs (lncRNAs) are known to act as precursors of miRNAs (41,42) and tRNAs (43). We thus analyzed whether the new sncRNAs could originate from lncRNAs. We found that 8 miRNAs, 16 snoRNAs, 7 tRNAs and 4 cuRNAs overlapped annotated lncRNAs (Supplementary Table S6). These lncRNAs included MALAT1, which we predicted to produce 2 miRNAs, 2 tRNAs and 1 cuRNA. Additionally, three of the miRNAs predicted and confirmed with FOMmiR were found on the lncRNAs MIR100HG, CTD-23C24–1 and RP11–141B14.1. From these, the new miRNA in RP11–141B14.1 is not present in recent miRNA catalogs (Figure 3D and E). As the processing from lncRNAs is a recognized biogenesis mechanism for certain sncRNAs, these results provide further support for the relevance of the newly predicted sncRNAs in our extended annotation.

Pervasive differential processing of non-coding RNAs between cell compartments

To further characterize the extended sncRNA annotation defined above, we studied their differential processing between four different cell compartments: chromatin, nucleoplasm, nucleolus and cytosol for the cell line K562 using replicated data (32) (Supplementary Table S2). The majority of sncRNAs from the extended annotation showed expression in one or more cell compartments: 599 in chromatin, 763 in cytosol, 554 in nucleolus and 651 in nucleoplasm. The largest fraction of sncRNAs in the cytosol are tRNAs (45%), followed by miRNAs (15%). Although tRNAs were enriched in the cytosol (Fisher's one-sided test P-value < 0.001), they were abundant in all four cell compartments (Supplementary Table S9). This is compatible with tRNA biogenesis, which comprises early processing in the nucleolus and later processing in the nucleoplasm before export to the cytoplasm (44). In contrast, miRNA clusters appeared almost exclusively in the cytosol (Fisher's one-sided test P-value < 0.001) and were coherently grouped into large clusters (Figure 4A and Supplementary Table S9). On the other hand, snoRNAs were enriched in the nucleolus (Fisher's one-sided test P-value < 0.01), accounting for 38% of the profiles found there. Interestingly, snoRNAs were also enriched in the chromatin compartment (Fisher's one-sided test P-value < 0.001), accounting for 23% of the sncRNAs found there, suggesting new candidates for their recognized role in establishing open chromatin domains (45). Finally, snRNAs and cuRNAs appeared at low frequency in most compartments (Supplementary Table S9). We applied the SeRPeNT diffproc operation to each pair of compartments, using fold-change ≥ 2.5 and P-value < 0.01. A large proportion of snoRNAs showed differential processing between the nucleus and nucleolus, where they exert their function, and the rest of the cellular compartments (Figure 4B). On the other hand, only four of the cuRNAs identified showed expression in at least two compartments, nucleolus and cytosol, and three of them showed differential processing. Overall, tRNAs showed the largest proportion of differentially processed profiles between the cytosol and the different nuclear compartments (Figure 4B and Supplementary Table S10). Many of these tRNAs showed more prominent processing in the cytosol of the 30–35 nt at their 3′ end (Figure 4C and Supplementary Figure S12), a fragment also called a tRNA half (46,47).

Figure 4. — Differential processing across ENCODE cell compartments. (A) Representation of clusters containing five or more sncRNAs across all four ENCODE cell compartments. The size of the points represents the number of sncRNAs from the extended annotation contained in the cluster. The normalized entropy (y-axis) represents the purity of a cluster (Supplementary Materials and Methods), the lower the entropy, the higher the purity of the cluster. (B) Proportion of profiles from the extended annotation that are differentially processed between cellular compartments separated by non-coding RNA family (y-axis). Numbers at the top of the bars represent the total number of profiles detected in both compartments. (C) Representation of the read profiles for the tRNA-Leu-AAG transfer RNA showing abundant processing of the 3′-half in the cytosol compared to the chromatin compartment. The plot represents the number of reads per nucleotide in the same scale for each compartment.

Experimental validation of novel short non-coding RNAs

To validate our findings, we decided to test experimentally two of the newly predicted sncRNAs, and two known sncRNAs with new predicted processing patterns. From the 79 predicted miRNAs not present in GENCODE, 51 were not present in previous sncRNA compendia (6,37); and from these, 5 had a hit to an Rfam family, with 4 confirmed as potential miRNA precursors using FOMmiR (Figure 2 and Supplementary Table S6). Of these four predicted miRNAs, three were intronic (see Figure 2), and only one of them was near an annotated miRNA (MIR301A). Since miRNAs often appear in tandem, we considered this new miRNA to be a good candidate for testing. The predicted miRNA, chr17:57228820–57228919:-, had a match to the Rfam family RF00906 and was located in an intron of SKA2, a gene relevant for chromosome segregation during mitosis (48). We used four different cell lines for validation: SH-SY5Y, MCF-7, MCF-10A and HeLa-S3 (Supplementary Tables S11 and S12; Supplementary Methods). Using sequence-specific primers we detected expression of this miRNA by qPCR in HeLa-S3 and SH-SY5Y cells (Figure 5 and Supplementary Figure S13). Additionally, we detected this miRNA with SeRPeNT using sncRNA-seq from the same SH-SY5Y and MCF-7 cells used for experimental validation (49), as well as using sncRNA-seq from ENCODE for HeLa-S3 (Figure 5).

Figure 5. Experimental validation of novel sncRNAs. Experimental validation of four predicted sncRNAs in the cell lines HeLa-S3, MCF-7, MCF-10A and SH-SY5Y. We tested a predicted miRNA (miRNA), a miRNA predicted to derive from the H/ACA snoRNA SNORA3 (sno-miRNA), a clustered uncharacterized RNA (cuRNA) and a processed tRNA (p-tRNA). For each sncRNA and each cell line, we indicate whether it was detected by SeRPeNT (black circle), whether its measured expression was above 1 RPM (reads per million) (black square), and whether it was validated by qPCR (black star); symbols are shown in gray otherwise. RPM values were calculated as the average of two small RNA-seq replicates from the same SH-SY5Y, MCF-10A and MCF-7 cells, and from ENCODE HeLa-S3 cells. RPM values and qPCR values in ΔCt scale are given in Supplementary Tables S11 and S12. The qPCR experiment was evaluated by comparing the expression of each RNA with that of the endogenous control U6 snRNA in each sample.
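The two quantities reported in this figure can be sketched as follows. This is a minimal illustration with hypothetical numbers, not the code used in the study: it assumes that RPM is computed per replicate from the read count and the total number of mapped reads and then averaged, and that ΔCt follows the usual convention of target Ct minus control (U6 snRNA) Ct. All function names are hypothetical.

```python
def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads for one replicate."""
    return read_count * 1_000_000 / total_mapped_reads


def mean_rpm(replicates):
    """Average RPM over replicates given as (read_count, total_mapped_reads)."""
    values = [rpm(count, total) for count, total in replicates]
    return sum(values) / len(values)


def delta_ct(ct_target, ct_control):
    """Delta-Ct of a target RNA relative to an endogenous control.

    Assumption: the common convention Ct(target) - Ct(control); lower
    values indicate higher expression relative to the control.
    """
    return ct_target - ct_control


# Hypothetical values for one sncRNA in one cell line.
expression = mean_rpm([(30, 10_000_000), (50, 20_000_000)])
passes_rpm_threshold = expression > 1          # the "black square" criterion
dct = delta_ct(ct_target=28.0, ct_control=20.0)
```

With these made-up inputs the two replicates give 3.0 and 2.5 RPM, so the average is 2.75 RPM and the profile would pass the RPM > 1 criterion.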

We also tested experimentally a clustered uncharacterized RNA (cuRNA). We predicted 142 cuRNAs based on clusters with 3 or more uncharacterized sncRNAs, expressed in at least one ENCODE cell line and not clustering with any labeled profile (Supplementary Table S6). Of these 142 cuRNAs, only one (chr10:75526203–75526253:+), which we had detected in SK-N-SH and IMR90 cells, showed a significant decrease in expression upon the independent knock-outs of DROSHA, DICER and XPO5. We thus decided to test this one experimentally. This cuRNA was predicted by SeRPeNT to be expressed at low levels in HeLa-S3 and SH-SY5Y, but it was only detected by qPCR in HeLa-S3 (Figure 5; Supplementary Figure S13 and Tables S11 and S12).
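The selection rule for cuRNAs described above can be sketched as a simple filter over clusters. This is an illustrative reading of the criteria, not SeRPeNT's implementation: the `Profile` record, its fields and the function `select_curnas` are hypothetical, and we assume that a cluster qualifies when it contains no labeled profile and at least three uncharacterized ones, after which members expressed in at least one cell line are kept.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Profile:
    name: str
    label: Optional[str]        # None means the profile has no annotation label
    cell_lines: FrozenSet[str]  # cell lines in which the profile is expressed


def select_curnas(clusters: List[List[Profile]], min_unlabeled: int = 3) -> List[Profile]:
    """Return unlabeled profiles from clusters that contain no labeled profile."""
    selected = []
    for members in clusters:
        # Discard clusters that include any annotated (labeled) profile.
        if any(p.label is not None for p in members):
            continue
        # Require at least `min_unlabeled` uncharacterized profiles.
        if len(members) < min_unlabeled:
            continue
        # Keep only members expressed in at least one cell line.
        selected.extend(p for p in members if p.cell_lines)
    return selected


# Hypothetical clusters.
clusters = [
    [Profile("u1", None, frozenset({"K562"})),
     Profile("u2", None, frozenset({"HeLa-S3"})),
     Profile("u3", None, frozenset())],
    [Profile("u4", None, frozenset({"K562"})),
     Profile("m1", "miRNA", frozenset({"K562"})),
     Profile("u5", None, frozenset({"K562"})),
     Profile("u6", None, frozenset({"K562"}))],
    [Profile("u7", None, frozenset({"A549"})),
     Profile("u8", None, frozenset({"A549"}))],
]
curna_names = [p.name for p in select_curnas(clusters)]
```

In this toy input only the first cluster qualifies: the second contains a labeled miRNA and the third has fewer than three members.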

Furthermore, we wanted to validate SeRPeNT’s ability to discover new processing patterns of known sncRNAs. Using the SeRPeNT clustering method, we were able to detect eight snoRNAs annotated by GENCODE with miRNA-like processing profiles (Supplementary Table S8). Of these snoRNAs, six had not been reported before in the literature: SNORD116, SNORA57, SNORD14C, SNORD26, SNORD60 and SNORA3, and only SNORA3 had a precursor confirmed with FOMmiR. We chose the miRNA-like profile from this H/ACA snoRNA SNORA3 (chr16:2846409–2846473:-) (sno-miRNA in Figure 5) for experimental validation. Using sequence-specific primers, we detected expression of this miRNA by qPCR in HeLa-S3 and SH-SY5Y cell lines (Figure 5; Supplementary Figure S13 and Tables S11 and S12). This miRNA was also detected with SeRPeNT using sncRNA-seq data for the same SH-SY5Y cells and from ENCODE HeLa-S3 cells (Figure 5).

Finally, we decided to test the ability of SeRPeNT to identify known sncRNAs with differential processing. Our analysis across cell compartments showed pervasive differential processing for sncRNAs, and especially for tRNAs between cytosol and nuclear compartments (Supplementary Tables S9 and S10). Moreover, we identified 3′ end differential processing in multiple tRNAs, including tRNA-Lys, tRNA-His and tRNA-Leu, across all compartments (Supplementary Figure S12). Since processed tRNAs, also known as tRNA fragments, are potentially relevant for disease (10,50), we decided to validate one of these cases. We chose tRNA-His (chr1:145396847–145396952:-) (p-tRNA), which was predicted to be cytosol-specific in K562 cells and had differential processing with respect to the nucleolus and chromatin compartments (Supplementary Table S10 and Figure S12). We validated this p-tRNA in all cell lines used for testing. Moreover, we detected the p-tRNA with SeRPeNT using sncRNA-seq from the same SH-SY5Y, MCF-7 and MCF-10A cells used for experimental testing (49) and from ENCODE HeLa-S3 cells (Figure 5; Supplementary Figures S13 and S14).

DISCUSSION

SeRPeNT provides a fast and accurate method to identify known and novel sncRNAs by exploiting read profiles from stranded size-selected RNA sequencing data. SeRPeNT does not depend on the annotation granularity of databases and avoids many drawbacks inherent to sequence- and secondary-structure-based methods, which may be affected by post-transcriptional modifications or limited by the reliability of structure determination. Here we have shown that by capturing the post-transcriptional processing that is specific to each sncRNA family, read profiles provide functional information independently of sequence or structure. In particular, a number of known snoRNAs and tRNAs clustered with miRNAs according to their profiles. Beyond the known cases, we detected new candidates for this dual behaviour. It remains to be determined whether these new sncRNAs can indeed function as miRNAs and associate with AGO2 (51). It is possible that they compete with more abundant miRNAs to be loaded on the RNA-induced silencing complex; hence they might become more prominent in specific cellular conditions. Incidentally, many sncRNAs increase in expression, as measured from the sequencing of AGO2-associated reads, in DICER1 knocked-down cells (data not shown), suggesting a repression by DICER1 (34) or an association with alternative biogenesis pathways (40). We also expect that our dynamic time-warping algorithm can account for the heterogeneity in the processing of miRNAs (52) and other sncRNAs. Although we have used known sncRNAs to label the clustered profiles, SeRPeNT does not need any annotation to cluster profiles and is therefore capable of deriving sncRNA families from newly sequenced organisms for which no phylogenetically close annotation exists.
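To illustrate why dynamic time-warping can absorb heterogeneity in processing, the following minimal sketch implements the textbook dynamic-programming form of the algorithm (21) on two read profiles represented as lists of per-nucleotide values. It is not SeRPeNT's implementation, which may normalize profiles or constrain the warping path differently; the function name `dtw_distance` and the use of the absolute difference as local cost are assumptions made for illustration.

```python
def dtw_distance(profile_a, profile_b):
    """Classic dynamic time-warping distance between two numeric profiles.

    Local cost is the absolute difference between aligned positions
    (an illustrative choice); positions may be stretched or compressed.
    """
    n, m = len(profile_a), len(profile_b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning the first i and j positions.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(profile_a[i - 1] - profile_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch profile_b
                cost[i][j - 1],      # stretch profile_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Hypothetical profiles: the second is the first with one position repeated,
# mimicking a product that is one nucleotide longer.
same_shape = dtw_distance([0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0])
different = dtw_distance([0, 0], [1, 1])
```

In this toy example the two profiles with the same shape but different lengths are at distance 0, whereas a position-by-position comparison would not even be defined for them.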

We have generated an extended annotation for human that includes hundreds of previously unannotated sncRNAs from known classes. These included new miRNAs, which we validated by comparing them to known families, by confirming the structure of the precursor, and by measuring the dependence of their expression on the miRNA biogenesis machinery. We further observed frequent differential processing of sncRNAs across cell compartments, especially for tRNAs. As differential processing of tRNAs has been associated with disease (53–55), the observed patterns may be indicative of relevant cellular processes that are worth investigating further.

We also detected 131 new sncRNAs that could not be labeled, which we named clustered uncharacterized RNAs (cuRNAs), and which are not present in current sncRNA catalogs and hence could correspond to novel sncRNA species. Although cuRNAs did not show frequent differential processing across cell compartments, they showed dependencies on the miRNA processing machinery and overlapped with CAGE tags or lncRNAs, suggesting similar mechanisms of biogenesis. The role of lncRNAs as possible general precursors of multiple types of sncRNAs in fact suggests new possible ways to classify lncRNAs beyond the currently proposed frameworks (56). A subset of lncRNAs may act as precursors of a wide variety of sncRNAs, including those from known families. Another possibility is that cuRNAs, and perhaps some of the known sncRNAs, are processed through other mechanisms like back-splicing to give rise to circular RNAs (57). In any case, cuRNAs constitute a small fraction relative to all the known classes of sncRNAs, indicating that there might be a very limited number of new sncRNA species.

We validated our approach by obtaining experimental evidence for the expression of four predicted sncRNAs from four different classes: one intronic miRNA, a snoRNA-derived miRNA, a processed tRNA and a cuRNA. Although we could experimentally validate the specific expression of these sncRNAs, we did not always find agreement between the experimental validation and the detection by SeRPeNT in the same cells. Some of the filters used for SeRPeNT might have been too strict, thereby limiting our level of detection. Nonetheless, the validation of these new sncRNAs demonstrates SeRPeNT’s ability to detect RNA species that are experimentally reproducible. Further analyses and validations will be required to capture the extent and variability of the processing of these RNAs across multiple conditions.

We envision a wide variety of future applications of SeRPeNT, including the fast identification and differential processing analysis of non-coding RNAs from size-selected RNA-sequencing from tumor biopsies, circulating tumor cells, or exosomes, as well as the rapid discovery and characterization of non-coding RNA families in multiple organisms. SeRPeNT’s differential processing operation could also be useful, for instance, for discovering RNAs that are differentially processed in tumor cells, thus generating candidate biomarkers and potential drug targets. In summary, SeRPeNT provides fast, easy-to-use and memory-efficient software for the discovery and characterization of known and novel classes of sncRNAs.

ACKNOWLEDGEMENTS

We are thankful to D. Gautheret for useful discussions.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

MINECO (to E.E., A.P.); FEDER [BIO2014–52566-R to E.E., A.P.]; Consolider RNAREG [CSD2009–00080 to E.E., A.P.]; AGAUR [SGR2014–1121 to E.E., A.P.]; European ITN Network RNP-Net [ID:289007 to E.E., A.P.]; Sandra Ibarra Foundation for Cancer [FSI2013 to E.E., A.P.]; MINECO [BIO2011–26205 to R.G., A.P., SAF2014–60551-R to E.M., J.P.A.]; National Human Genome Research Institute of the National Institutes of Health [U54HG007004 to R.G., A.P.]; MINECO Centro de Excelencia Severo Ochoa 2013–2017 [SEV-2012–0208 to R.G., A.P.]; Research Programme on Biomedical Informatics (GRIB), which is a member of ELIXIR-Excelerate of the European Union Horizon 2020 Programme 2014–2020 [676559 to E.E., A.P., I.D.]; ELIXIR-Excelerate, European Union Horizon 2020 Spanish National Bioinformatics Institute (INB) [PT13/0001/0023 to E.E., A.P., I.D.]. Funding for open access charge: European ITN Network RNP-Net [ID:289007] and MINECO-FEDER [BIO2014–52566-R].

Conflict of interest statement. None declared.

REFERENCES

1. Morris K.V., Mattick J.S. The rise of regulatory RNA. Nat. Rev. Genet. 2014; 15:423–437.
2. Agirre E., Eyras E. Databases and resources for human small non-coding RNAs. Human Genomics. 2011; 5:192–199.
3. Ge P., Zhang S. Computational analysis of RNA structures with chemical probing data. Methods. 2015; 79:60–66.
4. Ding Y., Kwok C.K., Tang Y., Bevilacqua P.C., Assmann S.M. Genome-wide profiling of in vivo RNA structure at single-nucleotide resolution using structure-seq. Nat. Protoc. 2015; 10:1050–1066.
5. Vickers K.C., Roteta L.A., Hucheson-Dilks H., Han L., Guo Y. Mining diverse small RNA species in the deep transcriptome. Trends Biochem. Sci. 2015; 40:4–7.
6. Leung Y.Y., Kuksa P.P., Amlie-Wolf A., Valladares O., Ungar H., Kannan S., Gregory B.D., Wang L. DASHR: database of small human noncoding RNAs. Nucleic Acids Res. 2015; 44:1–7.
7. Ritchie W., Legendre M., Gautheret D. RNA stem-loops: to be or not to be cleaved by RNAse III. RNA. 2007; 13:457–462.
8. Friedländer M.R., MacKowiak S.D., Li N., Chen W., Rajewsky N. MiRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012; 40:37–52.
9. Siegfried N.A., Busan S., Rice G.M., Nelson J.A.E., Weeks K.M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods. 2014; 11:959–965.
10. Lee Y.S., Shibata Y., Malhotra A., Dutta A. A novel class of small RNAs: tRNA-derived RNA fragments (tRFs). Genes Dev. 2009; 23:2639–2649.
11. Macias S., Plass M., Stajuda A., Michlewski G., Eyras E., Cáceres J.F. DGCR8 HITS-CLIP reveals novel functions for the Microprocessor. Nat. Struct. Mol. Biol. 2012; 19:760–766.
12. Chen C.J., Heard E. Small RNAs derived from structural non-coding RNAs. Methods. 2013; 63:76–84.
13. Griffiths-Jones S., Hui J.H.L., Marco A., Ronshaugen M. MicroRNA evolution by arm switching. EMBO Rep. 2011; 12:172–177.
14. Ha M., Kim V.N. Regulation of microRNA biogenesis. Nat. Rev. Mol. Cell Biol. 2014; 15:509–524.
15. Ender C., Krek A., Friedländer M.R., Beitzinger M., Weinmann L., Chen W., Pfeffer S., Rajewsky N., Meister G. A Human snoRNA with MicroRNA-Like Functions. Mol. Cell. 2008; 32:519–528.
16. Kishore S., Khanna A., Zhang Z., Hui J., Balwierz P.J., Stefan M., Beach C., Nicholls R.D., Zavolan M., Stamm S. The snoRNA MBII-52 (SNORD 115) is processed into smaller RNAs and regulates alternative splicing. Hum. Mol. Genet. 2010; 19:1153–1164.
17. Brameier M., Herwig A., Reinhardt R., Walter L., Gruber J. Human box C/D snoRNAs with miRNA like functions: expanding the range of regulatory RNAs. Nucleic Acids Res. 2011; 39:675–686.
18. Li Z., Ender C., Meister G., Moore P.S., Chang Y., John B. Extensive terminal and asymmetric processing of small RNAs from rRNAs, snoRNAs, snRNAs, and tRNAs. Nucleic Acids Res. 2012; 40:6787–6799.
19. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
20. Schulze S.K., Kanwar R., Gölzenleuchter M., Therneau T.M., Beutler A.S. SERE: single-parameter quality control and sample comparison for RNA-Seq. BMC Genomics. 2012; 13:524.
21. Sakoe H., Chiba S. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Trans. Acoust. 1978; 26:43–49.
22. Kruskal J.B., Liberman M. The symmetric time-warping problem: from continuous to discrete. Time Warps, String Ed. Macromol. Theory Pract. Seq. Com. 1983; Addison-Wesley.
23. Rodriguez A., Laio A. Clustering by fast search and find of density peaks. Science. 2014; 344:1492–1496.
24. Cheng Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1995; 17:790–799.
25. Wang S., Wang D., Li C., Li Y., Ding G. Clustering by fast search and find of density peaks with data field. Chin. J. Electron. 2016; 25:397–402.
26. Videm P., Rose D., Costa F., Backofen R. BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles. Bioinformatics. 2014; 30:274–282.
27. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012; 22:1760–1774.
28. Langenberger D., Bermudez-Santana C., Hertel J., Hoffmann S., Khaitovich P., Stadler P.F. Evidence for human microRNA-offset RNAs in small RNA sequencing data. Bioinformatics. 2009; 25:2298–2301.
29. Fasold M., Langenberger D., Binder H., Stadler P.F., Hoffmann S. DARIO: a ncRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res. 2011; 39:W112–W117.
30. Li S.C., Tsai K.W., Pan H.W., Jeng Y.M., Ho M.R., Li W.H. MicroRNA 3′ end nucleotide modification patterns and arm selection preference in liver tissues. BMC Syst. Biol. 2012; 6:S14.
31. Pundhir S., Gorodkin J. Differential and coherent processing patterns from small RNAs. Sci. Rep. 2015; 5:12062.
32. ENCODE Project Consortium, Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C., Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74.
33. Shen W., Chen M., Wei G., Li Y. MicroRNA prediction using a fixed-order markov model based on the secondary structure pattern. PLoS One. 2012; 7:e48236.
34. Rybak-Wolf A., Jens M., Murakawa Y., Herzog M., Landthaler M., Rajewsky N. A variety of dicer substrates in human and C. elegans. Cell. 2014; 159:1153–1167.
35. Nawrocki E.P., Burge S.W., Bateman A., Daub J., Eberhardt R.Y., Eddy S.R., Floden E.W., Gardner P.P., Jones T.A., Tate J. et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015; 43:D130–D137.
36. Nawrocki E.P., Eddy S.R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013; 29:2933–2935.
37. Friedländer M.R., Lizano E., Houben A.J.S., Bezdan D., Báñez-Coronel M., Kudla G., Mateu-Huertas E., Kagerbauer B., González J., Chen K.C. et al. Evidence for the biogenesis of more than 1,000 novel human microRNAs. Genome Biol. 2014; 15:R57.
38. FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest A.R.R., Kawaji H., Rehli M., Baillie J.K., de Hoon M.J.L., Haberle V., Lassmann T., Kulakovskiy I.V., Lizio M. et al. A promoter-level mammalian expression atlas. Nature. 2014; 507:462–470.
39. Venkatesh T., Suresh P.S., Tsutsumi R. TRFs: miRNAs in disguise. Gene. 2016; 579:133–138.
40. Kim Y.-K., Kim B., Kim V.N. Re-evaluation of the roles of DROSHA, Exportin 5, and DICER in microRNA biogenesis. Proc. Natl. Acad. Sci. U.S.A. 2016; 113:E1881–E1889.
41. Bevilacqua V., Gioia U., Di Carlo V., Tortorelli A.F., Colombo T., Bozzoni I., Laneve P., Caffarelli E. Identification of linc-NeD125, a novel long non coding RNA that hosts miR-125b-1 and negatively controls proliferation of human neuroblastoma cells. RNA Biol. 2015; 12:1323–1337.
42. Ballarino M., Cazzella V., D’Andrea D., Grassi L., Bisceglie L., Cipriano A., Santini T., Pinnarò C., Morlando M., Tramontano A. et al. Novel long noncoding RNAs (lncRNAs) in myogenesis: a miR-31 overlapping lncRNA transcript controls myoblast differentiation. Mol. Cell. Biol. 2015; 35:728–736.
43. Wilusz J.E., Freier S.M., Spector D.L. 3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell. 2008; 135:919–932.
44. Kirchner S., Ignatova Z. Emerging roles of tRNA in adaptive translation, signalling dynamics and disease. Nat. Rev. Genet. 2014; 16:98–112.
45. Schubert T., Pusch M.C., Diermeier S., Benes V., Kremmer E., Imhof A., Längst G. Df31 protein and snoRNAs maintain accessible higher-order structures of chromatin. Mol. Cell. 2012; 48:434–444.
46. Cole C., Sobala A., Lu C., Thatcher S.R., Bowman A., Brown J.W.S., Green P.J., Barton G.J., Hutvagner G. Filtering of deep sequencing data reveals the existence of abundant Dicer-dependent small RNAs derived from tRNAs. RNA. 2009; 15:2147–2160.
47. Telonis A.G., Loher P., Honda S., Jing Y., Palazzo J., Kirino Y., Rigoutsos I. Dissecting tRNA-derived fragment complexities using personalized transcriptomes reveals novel fragment classes and unexpected dependencies. Oncotarget. 2015; 6:24797–24822.
48. Hanisch A., Silljé H.H.W., Nigg E.A. Timely anaphase onset requires a novel spindle and kinetochore complex comprising Ska1 and Ska2. EMBO J. 2006; 25:5504–5515.
49. Allo M., Agirre E., Bessonov S., Bertucci P., Gomez Acuna L., Buggiano V., Bellora N., Singh B., Petrillo E., Blaustein M. et al. Argonaute-1 binds transcriptional enhancers and controls constitutive and alternative splicing in human cells. Proc. Natl. Acad. Sci. U.S.A. 2014; 111:15622–15629.
50. Anderson P., Ivanov P. tRNA fragments in human health and disease. FEBS Lett. 2014; 588:4297–4304.
51. Kishore S., Gruber A.R., Jedlinski D.J., Syed A.P., Jorjani H., Zavolan M. Insights into snoRNA biogenesis and processing from PAR-CLIP of snoRNA core proteins and small RNA sequencing. Genome Biol. 2013; 14:R45.
52. Starega-Roslan J., Krol J., Koscianska E., Kozlowski P., Szlachcic W.J., Sobczak K., Krzyzosiak W.J. Structural basis of microRNA length variety. Nucleic Acids Res. 2011; 39:257–268.
53. Honda S., Loher P., Shigematsu M., Palazzo J.P., Suzuki R., Imoto I., Rigoutsos I., Kirino Y. Sex hormone-dependent tRNA halves enhance cell proliferation in breast and prostate cancers. Proc. Natl. Acad. Sci. U.S.A. 2015; 112:E3816–E3825.
54. Chen Q.Q., Yan M., Cao Z., Li X., Zhang Y.Y., Shi J., Feng G.-H.G.-H., Peng H., Zhang X., Zhang Y.Y. et al. Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science. 2015; 7977:1–8.
55. Sharma U., Conine C.C., Shea J.M., Boskovic A., Derr A.G., Bing X.Y., Belleannee C., Kucukural A., Serra R.W., Sun F. et al. Biogenesis and function of tRNA fragments during sperm maturation and fertilization in mammals. Science. 2015; 351:391–396.
56. St. Laurent G., Wahlestedt C., Kapranov P. The landscape of long noncoding RNA classification. Trends Genet. 2015; 31:249–251.
57. Lasda E., Parker R. Circular RNAs: diversity of form and function. RNA. 2014; 20:1829–1842.


[B51] 51. Kishore S., Gruber A.R., Jedlinski D.J., Syed A.P., Jorjani H., Zavolan M.. Insights into snoRNA biogenesis and processing from PAR-CLIP of snoRNA core proteins and small RNA sequencing. Genome Biol. 2013; 14:R45. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B52] 52. Starega-Roslan J., Krol J., Koscianska E., Kozlowski P., Szlachcic W.J., Sobczak K., Krzyzosiak W.J.. Structural basis of microRNA length variety. Nucleic Acids Res. 2011; 39:257–268. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B53] 53. Honda S., Loher P., Shigematsu M., Palazzo J.P., Suzuki R., Imoto I., Rigoutsos I., Kirino Y.. Sex hormone-dependent tRNA halves enhance cell proliferation in breast and prostate cancers. Proc. Natl. Acad. Sci. U.S.A. 2015; 112:E3816–E3825. [DOI] [PMC free article] [PubMed] [Google Scholar]

54. Chen Q., Yan M., Cao Z., Li X., Zhang Y., Shi J., Feng G.-H., Peng H., Zhang X., Zhang Y. et al. Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science. 2015; 7977:1–8.

55. Sharma U., Conine C.C., Shea J.M., Boskovic A., Derr A.G., Bing X.Y., Belleannee C., Kucukural A., Serra R.W., Sun F. et al. Biogenesis and function of tRNA fragments during sperm maturation and fertilization in mammals. Science. 2015; 351:391–396.

56. St. Laurent G., Wahlestedt C., Kapranov P. The landscape of long noncoding RNA classification. Trends Genet. 2015; 31:249–251.

57. Lasda E., Parker R. Circular RNAs: diversity of form and function. RNA. 2014; 20:1829–1842.
