Bioinformatics. 2014 Aug 26;30(23):3412–3413. doi: 10.1093/bioinformatics/btu573

MIRPIPE: quantification of microRNAs in niche model organisms

Carsten Kuenne 1, Jens Preussner 1, Mario Herzog 1, Thomas Braun 2, Mario Looso 1,*
PMCID: PMC4816158  PMID: 25165094

Abstract

Summary: MicroRNAs (miRNAs) represent an important class of small non-coding RNAs regulating gene expression in eukaryotes. Present algorithms typically rely on genomic data to identify miRNAs and require extensive installation procedures. Niche model organisms lacking genomic sequences cannot be analyzed by such tools. Here we introduce the MIRPIPE application enabling rapid and simple browser-based miRNA homology detection and quantification. MIRPIPE features automatic trimming of raw RNA-Seq reads originating from various sequencing instruments, processing of isomiRs and quantification of detected miRNAs versus public- or user-uploaded reference databases.

Availability and implementation: The Web service is freely available at http://bioinformatics.mpi-bn.mpg.de. MIRPIPE was implemented in Perl and integrated into Galaxy. An offline version for local execution is also available from our Web site.

Contact: Mario.Looso@mpi-bn.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

MicroRNAs (miRNAs) are ∼22 nucleotides long and belong to the class of small non-coding RNAs. miRNAs serve numerous roles in the downregulation of gene expression (transcript degradation and sequestering, translational suppression). In general, miRNAs are assumed to regulate multiple targets, although effects on most targets are relatively mild (Ameres and Zamore, 2013). Isoforms of miRNAs resulting from imperfect digestion by Drosha and Dicer or from RNA editing by specialized enzymes make it difficult to determine correct read counts following RNA-Seq. miRNA variants might be ‘silent’ (3′ modification = isomiR) or target different mRNAs when changes occur in the 5′ regions responsible for complementary binding. Sequence differences between taxa hamper quantification, especially if no genomic or miRNA data for the studied organism are available, as is the case for niche model organisms. Sequencing errors can further complicate the identification of miRNAs. These effects should ideally be addressed on multiple levels, including (i) isomiR handling, (ii) enforcement of a minimum read copy number, (iii) clustering of similar miRNAs, (iv) removal of relatively low abundance reads and (v) optional fallback to the miRNA family level. Several applications in the field cover subsets of these features, but a Web-based tool that unifies all of them and can be applied to any organism is missing (An et al., 2013; Giurato et al., 2013; Wen et al., 2012).

2 WORKFLOW AND FEATURES

MIRPIPE uses open-source binary tools including the FASTX-Toolkit (Pearson et al., 1997), Cutadapt (Martin, 2011) and BLASTN (Boratyn et al., 2013) for data processing. The pipeline was integrated into a Galaxy-based Web platform (Goecks et al., 2010) but is also available for download and local execution. A detailed explanation of the algorithm can be found in Supplementary File S1.

The workflow starts with the upload of a compressed FASTQ/FASTA read file using the Web interface or the MIRPIPE FTP server. MIRPIPE can fully process raw reads originating from Illumina, 454, IonTorrent or Sanger sequencing instruments, including adapter trimming. A reference FASTA database of mature target miRNAs can either be selected from the current miRBase release (Griffiths-Jones et al., 2006) or be uploaded by the user.

The raw reads are processed to optionally remove an adapter sequence and to trim bases below a minimum quality (default Q20). Only reads of the desired size range are selected to limit the pool to mature miRNAs. Duplicate reads are collapsed to decrease the number of necessary homology searches, and only those sequences represented by a minimum count are kept for further analyses. This measure is intended to remove unique reads, which frequently denote sequencing errors or miRNA variations that are expressed near the detection limit, preventing reliable quantification.
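The size selection, duplicate collapsing and minimum-count filter described above can be sketched in a few lines of Python. This is a minimal illustration, not MIRPIPE's Perl implementation: the function name `collapse_reads` and the default values for `min_len`, `max_len` and `min_count` are assumptions chosen for the example, and adapter and quality trimming are assumed to have been done already.

```python
from collections import Counter

def collapse_reads(reads, min_len=18, max_len=28, min_count=5):
    """Keep reads inside a length window, collapse duplicates, drop rare sequences.

    All default thresholds are hypothetical example values.
    """
    # Size selection: limit the pool to reads in the expected mature miRNA range.
    sized = (r for r in reads if min_len <= len(r) <= max_len)
    # Collapse identical reads into one entry with a copy number.
    counts = Counter(sized)
    # Enforce a minimum copy number to discard likely sequencing errors.
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Toy input: one abundant read, one singleton variant, one read that is too short.
reads = (["TGAGGTAGTAGGTTGTATAGTT"] * 6
         + ["TGAGGTAGTAGGTTGTATAGTA"]
         + ["ACGT"] * 10)
print(collapse_reads(reads))
```

Collapsing before the homology search means each distinct sequence is searched only once, while its copy number is carried along for quantification.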

Read counts from isomiRs of the same miRNA are combined. These isomiR read sequences may differ only at the 3′ end and are thus putatively encoded by the same gene. Only one nucleotide may differ between two sequences for them to be counted as isoforms of the same miRNA, and only the longest sequence is used in the next step to further reduce the number of homology searches.
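One possible way to picture the isomiR merging step is shown below. The sketch makes a simplifying assumption that is not taken from the paper: a read is treated as an isomiR of a longer read only if it is an exact 5′-anchored prefix of it (a pure 3′ truncation). The one-nucleotide tolerance mentioned above is not modeled, and the name `merge_isomirs` is hypothetical. The exact rules are given in Supplementary File S1.

```python
def merge_isomirs(counts):
    """Merge reads that are 5'-anchored prefixes of a longer read into that read.

    `counts` maps read sequence -> copy number. The longest sequence of each
    group is kept as the representative and receives the combined count.
    """
    merged = {}
    # Visit the longest sequences first so every representative is seen
    # before its shorter 3' variants.
    for seq in sorted(counts, key=lambda s: (-len(s), s)):
        rep = next((r for r in merged if r.startswith(seq)), None)
        if rep is None:
            merged[seq] = counts[seq]      # new representative
        else:
            merged[rep] += counts[seq]     # 3' variant of an existing one
    return merged

# Illustrative sequences and counts only.
counts = {
    "TGAGGTAGTAGGTTGTATAGTT": 50,
    "TGAGGTAGTAGGTTGTATAGT": 12,
    "TGAGGTAGTAGGTTGTATAG": 3,
    "TAGCTTATCAGACTGATGTTGA": 20,
}
print(merge_isomirs(counts))
```

Only the representatives returned here would need to be passed on to the similarity search.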

The remaining read sequences are used for a sequence similarity search versus the chosen reference database of miRNAs.

Mature reference miRNAs and their precursors are optionally collated by name on the family level to remove redundancy introduced by organism prefixes and precursor suffixes (e.g. bta-miR-200a, oan-miR-200a-3p > miR-200a).
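The name collation can be illustrated with two regular expressions. This is a sketch under assumptions: organism codes are taken to be three or four lowercase letters followed by a hyphen, only the arm suffixes `-3p` and `-5p` are stripped, and the lowercase precursor spelling `mir-` is normalized to `miR-`. The function name `family_name` is hypothetical, and MIRPIPE's actual rules may handle more cases.

```python
import re

# Organism code such as "bta-" or "oan-"; the lookahead prevents names that
# already lack a prefix ("mir-200a", "let-7a") from being truncated.
PREFIX = re.compile(r"^(?!mir-|let-)[a-z]{3,4}-")
# Arm suffix of a mature miRNA.
ARM = re.compile(r"-(3p|5p)$")

def family_name(name):
    """Reduce a miRBase-style identifier to a family-level name."""
    name = PREFIX.sub("", name)
    name = ARM.sub("", name)
    # Precursors are written "mir-"; use the mature spelling for the family.
    return re.sub(r"^mir-", "miR-", name)

for raw in ["bta-miR-200a", "oan-miR-200a-3p", "hsa-let-7a-5p", "hsa-mir-200a"]:
    print(raw, ">", family_name(raw))
```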

For each read, the detected reference miRNA families are scored based on the minimum number of mismatches. If a read matches equally well against multiple miRNA families, the respective families are joined by single linkage clustering. This permits the inclusion of reads that cannot be matched uniquely, as well as the exact measurement of the fraction of ambiguously matching reads and thereby the reliability of the match. By default, only those read sequences that are at least 5% as abundant as the most abundant sequence per miRNA family cluster are reported, which reduces the impact of sequencing errors and increases robustness.
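The clustering and abundance filter can be sketched as follows. Single linkage here means that two families end up in the same cluster as soon as one read matches both equally well, which a union-find structure expresses directly. The data layout (`read_hits` mapping each read to its best-scoring families) and the function names are assumptions made for this example; only the 5% default comes from the text.

```python
def cluster_families(read_hits):
    """Single linkage: families sharing any ambiguous read land in one cluster.

    `read_hits` maps a read ID to the non-empty list of families it matched
    with the minimum number of mismatches.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for families in read_hits.values():
        first = find(families[0])
        for fam in families[1:]:
            parent[find(fam)] = first      # union with the first family's root

    clusters = {}
    for fam in parent:
        clusters.setdefault(find(fam), set()).add(fam)
    return list(clusters.values())

def filter_low_abundance(seq_counts, min_fraction=0.05):
    """Keep sequences at least `min_fraction` as abundant as the top sequence."""
    top = max(seq_counts.values())
    return {s: n for s, n in seq_counts.items() if n >= min_fraction * top}

read_hits = {
    "r1": ["miR-200a"],
    "r2": ["miR-200a", "miR-141"],   # ambiguous read links two families
    "r3": ["miR-21"],
    "r4": ["miR-141", "miR-200c"],   # and chains a third one onto the cluster
}
print(cluster_families(read_hits))
print(filter_low_abundance({"seqA": 1000, "seqB": 60, "seqC": 49}))
```

Because linkage is transitive, miR-200a and miR-200c share a cluster in this toy input even though no single read matches both.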

Counts per miRNA family and cluster are presented for download. Currently, MIRPIPE can complete a job within 0.5–2 h, depending on the file size and the selected reference database. MIRPIPE quantification results can be directly used for differential expression analysis using other tools on our Web site (Supplementary File S1).

3 BENCHMARK

To demonstrate congruent results for MIRPIPE, we compared its output with an miRNA analysis based on a genomic mapping of Illumina HiSeq reads (Lawless et al., 2013). We identified 96% of the published miRNAs (Supplementary File S2). Furthermore, we compared our tool with a similar approach that does not require a genome sequence by analyzing a public dataset (Zhang et al., 2013) with the CLC Genomics Workbench. In this case, 84% of the miRNAs were identical (Supplementary File S2).

Finally, we checked the predictive efficiency of our tool for niche models based on a human RNA-Seq dataset (Lappalainen et al., 2013). Here, we ran MIRPIPE against a reference database containing (i) the complete miRBase, (ii) miRBase excluding human miRNAs and (iii) miRBase excluding miRNAs of all primates. The absence of closely related reference sequences resulted in only a marginal loss of sensitivity for MIRPIPE, indicating its aptitude for the analysis of niche model organisms (Fig. 1, Supplementary File S2).
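A reduced reference of this kind could be derived by filtering entries on the organism code that prefixes each miRBase identifier. The sketch below is an assumption about how such a set might be built, not the procedure used for the benchmark: `exclude_species` is a hypothetical helper, the sequences are placeholders, and the excluded codes shown (`hsa` for human, `ptr` for chimpanzee) are only a partial illustration of a primate list.

```python
def exclude_species(records, excluded_prefixes):
    """Drop (name, sequence) reference entries whose organism code is excluded."""
    excluded = set(excluded_prefixes)
    return [(name, seq) for name, seq in records
            if name.split("-", 1)[0] not in excluded]

# Placeholder records; real entries would be parsed from a miRBase FASTA file.
records = [
    ("hsa-miR-21-5p", "UAGCUUAUCAGACUGAUGUUGA"),
    ("ptr-miR-21", "UAGCUUAUCAGACUGAUGUUGA"),
    ("mmu-miR-21a-5p", "UAGCUUAUCAGACUGAUGUUGA"),
]

no_human = exclude_species(records, ["hsa"])
no_listed_primates = exclude_species(records, ["hsa", "ptr"])
print([name for name, _ in no_human])
print([name for name, _ in no_listed_primates])
```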

Fig. 1.

(A) Comparison of MIRPIPE prediction on two gold standard (GS) datasets using full miRBase and reduced miRBase as reference set. (B) Spearman correlation of absolute counts of GS and MIRPIPE. (C) The large number of GS-specific miRNA identifications is caused by low counts, filtered out by MIRPIPE default parameters.

Funding: Excellence Cluster Cardio-Pulmonary System (ECCPS); MPI for Heart and Lung Research.

Conflict of interest: none declared.

Supplementary Material

Supplementary Data

REFERENCES

  1. Ameres SL, Zamore PD. Diversifying microRNA sequence and function. Nat. Rev. Mol. Cell Biol. 2013;14:475–488. doi: 10.1038/nrm3611.
  2. An J, et al. miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res. 2013;41:727–737. doi: 10.1093/nar/gks1187.
  3. Boratyn GM, et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res. 2013;41:W29–W33. doi: 10.1093/nar/gkt282.
  4. Giurato G, et al. iMir: an integrated pipeline for high-throughput analysis of small non-coding RNA data obtained by smallRNA-Seq. BMC Bioinformatics. 2013;14:362. doi: 10.1186/1471-2105-14-362.
  5. Goecks J, et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86. doi: 10.1186/gb-2010-11-8-r86.
  6. Griffiths-Jones S, et al. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144. doi: 10.1093/nar/gkj112.
  7. Lappalainen T, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
  8. Lawless N, et al. Next generation sequencing reveals the expression of a unique miRNA profile in response to a gram-positive bacterial infection. PLoS One. 2013;8:e57543. doi: 10.1371/journal.pone.0057543.
  9. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12.
  10. Pearson WR, et al. Comparison of DNA sequences with protein sequences. Genomics. 1997;46:24–36. doi: 10.1006/geno.1997.4995.
  11. Wen M, et al. miREvo: an integrative microRNA evolutionary analysis platform for next-generation sequencing experiments. BMC Bioinformatics. 2012;13:140. doi: 10.1186/1471-2105-13-140.
  12. Zhang Z, et al. High-efficiency RNA cloning enables accurate quantification of miRNA expression by deep sequencing. Genome Biol. 2013;14:R109. doi: 10.1186/gb-2013-14-10-r109.
