Abstract
Next-generation sequencing technologies generate vast catalogs of short RNA sequences from which to mine microRNAs. However, such data must be vetted to appropriately categorize microRNA precursors and interpret their evolution. A recent study annotated hundreds of microRNAs in three Drosophila species on the basis of singleton reads of heterogeneous length1. Our multi-million read datasets indicated that most of these were not substrates of RNAse III cleavage, and comprised many mRNA degradation fragments. We instead identified a distinct and smaller set of novel microRNAs supported by confident cloning signatures, including a high proportion of evolutionarily nascent mirtrons. Our data support a much lower rate in the emergence of lineage-specific microRNAs than previously inferred1, with a net flux of ~1 microRNA/million years of Drosophilid evolution.
RESULTS AND DISCUSSION
microRNAs (miRNAs) are ~21–24 nucleotide (nt) regulatory RNAs derived from RNAse III-mediated cleavages of hairpin transcripts. Conserved miRNA genes are differentiated from bulk hairpins in that their terminal loops diverge more quickly than their stems2. However, species-specific miRNAs are not confidently identified solely using computational methods, since hundreds of thousands of Drosophila1,3–5 and human loci6 are plausible as miRNA hairpins. Instead, next-generation sequencing has become the preferred method to discern recently evolved miRNAs (e.g. Supplementary Table 1). Such data often reveal heterogeneous size and read patterns with respect to predicted hairpins (Fig. 1 and Supplementary Fig. 1), indicating that only a subset of hairpins with reads are substrates of Dicer-driven biogenesis pathways.
Figure 1.
Putative miRNA loci annotated on the basis of single reads and plausible hairpin structures (center box), exhibit distinct patterns when more reads are available. Reads may be distributed throughout the inferred hairpin, have heterogeneous sizes, and/or pair as duplexes lacking 3′ overhangs (top left and right); these cannot be annotated as miRNAs. Confident miRNAs exhibit multiply cloned 21–24nt reads with relatively fixed 5′ ends (bottom left). With sufficient sequencing, it is usually possible to identify the partner miRNA* species, as well as other byproducts of miRNA biogenesis such as terminal loops or species flanking the pre-miRNA hairpin (bottom right).
Lu and colleagues reported ~900 novel miRNAs sequenced from three Drosophila species—D. melanogaster (Dme), D. simulans (Dsi), and D. pseudoobscura (Dps)—including ~400 annotated under “high stringency” criteria1. They concluded that evolutionarily transient miRNA genes are continually born and lost, with only a small proportion fixed across Drosophilid radiation. Inspection of these annotations showed that 35 Dme/47 Dsi/30 Dps “novel” miRNAs corresponded to orthologs of 50 distinct genes, whose cloning and evolutionary characteristics were previously described4,5,7 (miRBase 10.1 and Supplementary Tables 2–4). Another locus comprising multiple tandem hairpins corresponded to hairpin RNA hp-CG4068, which generates endo-siRNAs8. We sought to understand the nature of the remaining hundreds of miRNA candidates, whose abundant numbers were used to estimate a birthrate of ~12 miRNAs per million years (Myr) of Drosophilid evolution1.
We mapped ~15 million Dme reads from various stages, including ~1 million from adult heads4,9. Compared to their frequency in ~16,000 Dme adult head reads1, we expected our data to contain 50-fold more reads for genuine miRNAs, and likely more given that many are expressed in multiple stages and tissues. This was true for the 35 Dme miRBase 10.1 loci designated “novel” by Lu and colleagues1. These were represented by 1247 reads in their data (~34 reads/locus, although six loci were cloned only 2–3 times and twelve were singletons), but by ~320,000 reads in our data (~8,800 reads/locus). The remaining 23 non-miRBase loci were severely under-represented in our data, with nine cloned 1–6 times and nine that were not recovered at all (Supplementary Table 2).
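The depth argument above can be checked with simple arithmetic. The sketch below uses only the rounded figures quoted in the text. The assumption that read counts for a genuine miRNA scale in proportion to library depth is a simplification, and the function name `expected_reads` is illustrative.

```python
# Rounded figures from the text; proportional scaling is an assumed model.
SMALL_LIBRARY_READS = 16_000     # ~16,000 Dme adult head reads (earlier study)
DEEP_HEAD_READS = 1_000_000      # ~1 million Dme adult head reads (this study)


def expected_reads(observed: int, small_depth: int, deep_depth: int) -> float:
    """Scale a read count from a small library to a deeper one,
    assuming recovery proportional to sequencing depth."""
    return observed * deep_depth / small_depth


# Depth ratio for head libraries alone, same order as the ~50-fold expectation.
depth_fold = DEEP_HEAD_READS / SMALL_LIBRARY_READS

# A locus seen once in the small library would be expected this many times.
singleton_expectation = expected_reads(1, SMALL_LIBRARY_READS, DEEP_HEAD_READS)

# Observed enrichment for the 35 miRBase 10.1 loci: 1247 reads vs ~320,000 reads.
observed_fold = 320_000 / 1247

print(f"depth ratio: {depth_fold:.1f}x")
print(f"expected reads for a former singleton: {singleton_expectation:.1f}")
print(f"observed enrichment for miRBase loci: {observed_fold:.0f}x")
```

Under this model, a genuine miRNA cloned even once in the smaller library should appear dozens of times in the deeper head data, so loci recovered only 0–6 times across ~15 million reads fall well short of expectation.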
For non-miRBase loci cloned in our dataset, the reads mapped incoherently across the predicted hairpin and/or across adjacent genomic regions (Fig. 1 and Supplementary Fig. 1). They also exhibited broadly heterogeneous sizes, contrasting with the restricted lengths of genuine Drosophila miRNAs (Fig. 2). Although some loci were conserved, the most abundant reads mapped to an rRNA (Lu-mir-2018) and two snoRNAs (Lu-mir-2324 and Lu-mir-2213); 16/20 remaining loci derived from mRNAs (Supplementary Table 2). Therefore, instances of conservation were attributable to protein-coding or functional RNA status, and not to evolutionary dynamics characteristic of genuine miRNAs (Supplementary Fig. 1A, B). Similar analysis revealed that hundreds of novel Dsi and Dps miRNA candidates1 mapped to syntenic exons of Dme protein-coding transcripts (Supplementary Tables 3–6), with reads spanning the 18–28 nt window used for cloning (Fig. 2). We conclude that the prior miRNA annotations1 include a high proportion of RNA fragments derived from degradation of diverse mRNAs and some ncRNAs.
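The contrast between coherent and incoherent read patterns can be illustrated with a small heuristic. This is a minimal sketch, not the annotation procedure used in this study: the input format (a list of `(five_prime_position, length)` tuples for reads on one hairpin arm), the function names, and all thresholds are hypothetical. Only the 21–24 nt window and the emphasis on multiply cloned reads with fixed 5′ ends come from the text.

```python
from collections import Counter


def summarize_locus(reads, min_len=21, max_len=24):
    """Summarize reads mapped to one hairpin arm.

    reads: list of (five_prime_position, length) tuples.
    Returns (read count, fraction of reads in the size window,
    fraction of reads sharing the most common 5' end).
    """
    if not reads:
        return 0, 0.0, 0.0
    n = len(reads)
    in_window = sum(1 for _, length in reads if min_len <= length <= max_len)
    modal_count = Counter(pos for pos, _ in reads).most_common(1)[0][1]
    return n, in_window / n, modal_count / n


def looks_like_mirna(reads, min_reads=10, min_size_frac=0.8, min_5p_frac=0.8):
    """Hypothetical thresholds: multiply cloned, mostly 21-24 nt, fixed 5' end."""
    n, size_frac, five_p_frac = summarize_locus(reads)
    return n >= min_reads and size_frac >= min_size_frac and five_p_frac >= min_5p_frac


# Toy data: a coherent locus versus reads scattered across a hairpin.
coherent = [(10, 22)] * 18 + [(10, 23)] * 3 + [(11, 22)]
scattered = [(3, 18), (9, 27), (15, 20), (22, 25), (30, 19), (41, 28)] * 3

print(looks_like_mirna(coherent))   # True
print(looks_like_mirna(scattered))  # False
```

A fuller check would also look for the partner miRNA* species and a duplex with 3′ overhangs, which this sketch does not attempt.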
Figure 2.
Size analysis of Dme/Dsi/Dps miRNAs annotated as novel by Lu and colleagues1. We used Solexa data from diverse Dme samples and Dsi or Dps embryos to assess the distribution of read sizes from annotated “novel” loci that were orthologous to miRBase 10.1 genes (A–C) or lacked miRBase orthologs (D–F). The top panels indicate that genuine Drosophila miRNAs produce a characteristic range of 21–24 nt RNAs, with preference for 22 nt. The other candidate miRNAs, nearly all of which were annotated on the basis of single reads1, exhibited broadly heterogeneous sizes in our larger datasets; note that we did not recover any reads for many of these loci.
We therefore wished to gauge miRNA flux using independent small RNA data. We and others annotated 147 miRNA loci (including 14 mirtrons) from ~1 million Dme reads4,5,7, but >17 million additional reads9,10 yielded only 14 novel miRNA loci and the confident antisense locus Dme-mir-307-as (Supplementary Tables 7 and 8). Because of this sequencing depth, we could assign confident miRNA cloning patterns to novel loci, and most had star reads despite their evolutionary transience (Supplementary Figs. 1C and 2). Curiously, 5/14 were mirtrons, a high proportion consistent with the proposition that mirtrons generally evolve more quickly than canonical miRNAs11,12. Four miRBase loci that did not meet confident read criteria are discussed in the Supplementary Text and Supplementary Fig. 3.
We next mapped 3,712,683 and 3,318,524 small RNAs from mixed embryos of Dsi and Dps, respectively, and 3,442,645 reads from Dps heads (Supplementary Table 1). These comprise 50–270 times the data earlier used to estimate miRNA diversity1. Our datasets contained abundant reads for previously rare or uncloned Dsi and Dps orthologs of miRBase 10.1 loci (Supplementary Tables 9 and 11), consistent with the expectation that genuine miRNAs are recovered proportionally to sequencing depth. These reads yielded 11 novel Dsi miRNAs including 5 mirtrons (two of which were orthologous to novel Dme mirtrons mir-2489 and mir-2494), and >88 distinct novel Dps miRNAs including 17 mirtrons (Supplementary Figs. 4 and 5, see also Supplementary Text for discussion of potentially duplicate Dps loci). Of these, the overlap with the annotations of Lu and colleagues was minimal, only 4/261 Dsi and 19/598 Dps loci1. Conversely, nearly 300 of their reported Dsi/Dps miRNAs had zero reads in our data, and ~100 had fewer than 5 reads (Supplementary Tables 3 and 4). Therefore, deep sequencing failed to validate most of the previously reported miRNAs1, and the minimal overlap in annotated loci highlights that the differences were not due to applying more “conservative” vs. more “lenient” cutoffs to a common set of hairpins.
Although rates of miRNA flux in different species of Drosophila are expected to be reasonably similar, Lu and colleagues annotated vastly different numbers of novel miRNAs in Dme, Dsi and Dps1. Our annotations from multi-million read datasets instead yielded numbers of novel genes that were consistent with the relative ancestries of these species. We recovered few novel miRNAs in the highly related Dme and Dsi sister species, but many more in the distant Dps species (Fig. 3); most novel Dps genes were conserved only in its related sister D. persimilis (Dper). The overall flux in miRNA repertoire was consistent: 45–47 miRNAs cloned from Dme or Dsi have no obscura group homologs, while 88 miRNAs were cloned from Dps for which no melanogaster group homologs exist. Assuming ~55 Myr divergence between these clades as before1, this puts the rate of Drosophilid miRNA flux at 0.82–1.6 genes/Myr, far less than the ~12 genes/Myr earlier proposed1. Notably, the tally of species-restricted mirtrons, relative to canonical miRNAs, was disproportionately high in all three species (Supplementary Figs. 2, 4 and 5). Therefore, mirtrons and canonical miRNAs exhibit distinct evolutionary dynamics for emergence and fixation, even though they generate functionally identical regulatory RNAs.
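The flux estimate follows from dividing the number of lineage-restricted miRNAs by the divergence time. The sketch below reproduces the two ends of the quoted range from the figures in the text. The function name `flux_per_myr` is illustrative, and the calculation treats the counts as a simple net rate without separating births from deaths.

```python
DIVERGENCE_MYR = 55  # ~55 Myr between melanogaster and obscura groups (from the text)


def flux_per_myr(lineage_restricted_genes: int,
                 divergence_myr: float = DIVERGENCE_MYR) -> float:
    """Net miRNA flux as lineage-restricted genes per million years."""
    return lineage_restricted_genes / divergence_myr


# 45-47 Dme/Dsi miRNAs lack obscura group homologs; 88 Dps miRNAs lack
# melanogaster group homologs.
low = flux_per_myr(45)
high = flux_per_myr(88)

print(f"flux range: {low:.2f}-{high:.1f} genes/Myr")
```

Both values sit roughly an order of magnitude below the ~12 genes/Myr proposed earlier.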
Figure 3.
Flux of Drosophilid miRNA genes assessed using multi-million read datasets in three species. Small RNAs were cloned from the species in dark green; detailed orthology of novel miRNAs annotated in this study was determined with respect to species in light green. Since not all loci are necessarily present in all of the species in a given branch, some values are designated as approximate (~). For example, the Dps and Dper genomes coordinately lack orthologs of 9 miRNA genes present in the Sophophoran and/or proto-Drosophilid ancestor (Supplementary Fig. 7); these are considered to have died in the obscura lineage. Amongst the dozen Dme- or Dsi-cloned miRNAs for which aligning sequences were found only in their closest sister species, only a few have cloned small RNA evidence from multiple species thus far (e.g. the highly species-restricted miR-2489 was cloned from both Dme and Dsi). We do not exclude that some of these miRNAs may actually prove to be “unique” to a single species. Note that mirtrons comprise a small fraction of the deeply conserved set of miRNAs, but a much higher fraction of lineage-restricted miRNAs in various Drosophilid genomes.
The net rate of miRNA flux is a combination of genes born and genes lost, but distinguishing birth from death is challenging. For example, the ~70 miRNAs shared by Dps and Dper, for which no orthologs exist in any melanogaster group genomes, might have been “born” in the ancestor to the obscura lineage or “died” in the ancestor to the melanogaster lineage. In addition, the poorer state of the Dsi genome obfuscates whether it truly lost some genes (nine pan-Drosophilid miRNAs have gaps or errors in DroSim1, Supplementary Fig. 6). However, we could confidently judge that 9 miRNAs distributed in 4 operons died in the obscura group, since they were ancestrally conserved but absent from both Dps and Dper (Fig. 3 and Supplementary Fig. 7). Conversely, the small number of Dme/Dsi/Dps miRNAs lacking aligned sequences in any other sequenced species are good candidates for “newly-born” miRNAs. Their identification supports the concept that substrates occasionally arise de novo from neutral evolution of transcripts with hairpin character1.
Nascent miRNAs might exhibit a cleavage register that is less precise than that of well-conserved miRNAs, but their biogenesis via RNAse III enzymes indicates that duplexes of appropriate size should be cloned with sufficient sequencing, as observed in our data (Fig. 2 and Supplementary Figs. 1–5). Similar to previous analyses1, we assigned singleton reads to hundreds of candidate hairpins (Supplementary HTML tables), and these evolve neutrally with respect to hairpin character. However, as few are likely to ever be validated as genuine substrates for miRNA biogenesis, their evolution is not generally germane to miRNA evolution. Since the majority of metazoan euchromatin is actively transcribed13,14, deep sequencing is expected to recover small RNAs constituting degradation fragments from many incidental hairpins. This is the case even when using protocols that select for 5′ phosphates (and presumably against degradation fragments), since endogenous kinases can phosphorylate arbitrary short RNAs15. The existence of exceptionally diverse populations of piRNAs and endo-siRNAs6 further highlights that non-miRNA reads can be abundant in total RNA libraries. In conclusion, confident annotation of miRNAs from deep sequencing data yields unified rates of canonical miRNA and mirtron evolution amongst the Drosophilids, and provides evidence for only a limited set of species-specific miRNAs in this genus.
Supplementary Material
Acknowledgments
Katsutomo Okamura assisted with library amplification. Dsi and Dps Solexa sequencing was performed at the BC Genome Sciences Centre. This work was supported by a VIDI grant and the European Commission Sixth Framework Programme Integrated Project SIROCCO (LSHG-CT-2006-037900) to EB, HHMI support to GJH, and grants from the V Foundation for Cancer Research, the Sidney Kimmel Cancer Foundation, the Alfred Bressler Scholars Fund and the National Institutes of Health (R01-GM083300 and U01-HG004261) to ECL. Competing financial interest: EB is an employee of InteRNA Genomics B.V.
Footnotes
Accession numbers. The three small RNA datasets from Dsi embryos and Dps embryos and heads were submitted to NCBI GEO under series GSE13677.
Author contributions. The study was designed by ECL and EB. ASF and ECL prepared small RNA libraries. EH, MR and GJH performed Dme sequencing. NL and EB analyzed small RNA data. ECL and EB wrote the paper.
See also the following Supplemental Files available online.
3 Supplementary HTML documents accessible at http://www.internagenomics.com/public/dros0811
References
- 1. Lu J, et al. Nat Genet. 2008;40:351–5. doi: 10.1038/ng.73.
- 2. Lai EC. Curr Biol. 2003;13:R925–36. doi: 10.1016/j.cub.2003.11.017.
- 3. Lai EC, Tomancak P, Williams RW, Rubin GM. Genome Biol. 2003;4:R42.1–R42.20. doi: 10.1186/gb-2003-4-7-r42.
- 4. Ruby JG, et al. Genome Res. 2007;17:1850–1864. doi: 10.1101/gr.6597907.
- 5. Stark A, et al. Genome Res. 2007;17:1865–1879. doi: 10.1101/gr.6593807.
- 6. Bentwich I, et al. Nat Genet. 2005;37:766–70. doi: 10.1038/ng1590.
- 7. Sandmann T, Cohen SM. PLoS ONE. 2007;2:e1265. doi: 10.1371/journal.pone.0001265.
- 8. Okamura K, Lai EC. Nat Rev Mol Cell Biol. 2008;9:673–8. doi: 10.1038/nrm2479.
- 9. Chung WJ, Okamura K, Martin R, Lai EC. Curr Biol. 2008;18:795–802. doi: 10.1016/j.cub.2008.05.006.
- 10. Seitz H, Ghildiyal M, Zamore PD. Curr Biol. 2008;18:147–51. doi: 10.1016/j.cub.2007.12.049.
- 11. Okamura K, et al. Cell. 2007;130:89–100. doi: 10.1016/j.cell.2007.06.028.
- 12. Ruby JG, Jan CH, Bartel DP. Nature. 2007;448:83–6. doi: 10.1038/nature05983.
- 13. Manak JR, et al. Nat Genet. 2006;38:1151–8. doi: 10.1038/ng1875.
- 14. Kapranov P, et al. Science. 2007;316:1484–8. doi: 10.1126/science.1138341.
- 15. Aravin A, et al. Dev Cell. 2003;5:337–350. doi: 10.1016/s1534-5807(03)00228-4.