Summary
MicroRNAs (miRNAs) are small regulatory RNAs processed from stem-loop regions of primary transcripts (pri-miRNAs), with the choice of stem-loops for initial processing largely determining what becomes a miRNA. To identify sequence and structural features influencing this choice, we determined cleavage efficiencies of >50,000 variants of three human pri-miRNAs, focusing on the regions intractable to previous high-throughput analyses. Our analyses revealed a mismatched motif in the basal stem region, a preference for maintaining or improving base-pairing throughout the remainder of the stem, and a narrow stem-length preference of 35±1 base pairs. Incorporating these features with previously identified features, including three primary-sequence motifs, yielded a unifying model defining mammalian pri-miRNAs, in which motifs help orient processing and increase efficiency, with the presence of more motifs compensating for structural defects. This model enables generation of artificial pri-miRNAs, designed de novo, without reference to any natural sequence, yet processed more efficiently than natural pri-miRNAs.
Introduction
MicroRNAs (miRNAs) are ~22 nucleotide (nt) RNAs that pair to sites within mRNAs to target these transcripts for post-transcriptional repression (Bartel, 2009). In the canonical biogenesis pathway, miRNA genes are transcribed as pri-miRNAs, which contain at least one region that folds back on itself to form a hairpin that is cleaved by the Microprocessor complex, a heterotrimeric complex consisting of one molecule of the Drosha endonuclease and two molecules of its cofactor, DGCR8 (Lee et al., 2003; Denli et al., 2004; Gregory et al., 2004; Han et al., 2004; Nguyen et al., 2015). Drosha-catalyzed cleavage releases the pre-miRNA hairpin, which is exported to the cytoplasm and cleaved by Dicer (Grishok et al., 2001; Hutvagner et al., 2001) to produce a ~20 bp RNA duplex with 2-nt 3′ overhangs on each end (Lee et al., 2003; Lim et al., 2003b). One strand of the RNA duplex is ultimately loaded into the Argonaute protein, forming the core of the silencing complex (Hutvagner and Zamore, 2002; Mourelatos et al., 2002; Liu et al., 2004; Song et al., 2004).
When considering the broad diversity of miRNA hairpins, the question arises as to what features Microprocessor and its associated proteins recognize to discriminate the pri-miRNAs from the many thousands of other hairpins encoded in the human genome (Lim et al., 2003a; Bentwich et al., 2005). Mutations of mammalian pri-miRNAs have shown the importance of an unstructured apical loop of ≥10 nt (Zeng et al., 2005), pairing at the base of the pri-miRNA hairpin (Lee et al., 2003; Zeng and Cullen, 2003), which optimally extends 11 bp beyond the pre-miRNA hairpin (Han et al., 2006), and unpaired segments flanking the basal stem (Zeng and Cullen, 2005; Han et al., 2006). However, these features do not in themselves impart much discrimination, since many non-pri-miRNA hairpins also have loops ≥10 nt and basal pairing flanked by unpaired segments.
A more recent study shows that the distance from both ends of the stem influences the site of cleavage (Ma et al., 2013), which implies that the Microprocessor could prefer stems with lengths falling within a specified window. Indeed, the mean stem lengths of human pri-miRNAs are reported to be in the range of 33–35 bp (Han et al., 2006), although considerable heterogeneity in predicted stem lengths is observed. In addition, simultaneous analysis of millions of functional pri-miRNA variants shows that primary-sequence features can also contribute to pri-miRNA recognition (Auyeung et al., 2013). These features include a UG motif at the base of the pri-miRNA hairpin, a UGU/GUG motif in the apical loop (hereafter called the UGU motif), and a CNNC motif downstream of the hairpin, positioned 16–18 nt from the Drosha cut (Auyeung et al., 2013). Because of technical challenges, however, the stem region of pri-miRNA hairpins has not been examined using high-throughput approaches, leaving open the possibility that sequence or structural features (such as optimally placed bulges, wobbles, or mismatches) within this region might provide the elusive determinants needed to explain the specificity of pri-miRNA recognition in vivo. Indeed, the three known motifs seem to exert their effects in some pri-miRNAs but not others, leading to a model in which the known determinants somehow interact with additional unknown features to yield an idiosyncratic outcome.
With such a complex and incomplete model and little indication of how many more features remain undiscovered, it is perhaps not surprising that the successful de novo design of an artificial miRNA gene has not been reported. In practice, this lack of knowledge is circumvented when building Drosha-dependent short-hairpin RNAs (shRNAs) to be used for gene knock-down experiments, by relying on the modification of natural pri-miRNAs, typically pri-miR-30 (Zeng et al., 2002; Silva et al., 2005; Bassik et al., 2013; Fellmann et al., 2013; Knott et al., 2014; Kampmann et al., 2015). The design of such reagents de novo, without reference to a known pri-miRNA sequence, would require a more complete understanding of what a miRNA gene is. Because the requirements for Dicer processing are well understood (Zhang et al., 2004; MacRae et al., 2006; Park et al., 2011), the remaining challenge is to understand what the Microprocessor looks for as it decides which transcripts are pri-miRNAs and which are not.
Here, we develop a high-throughput strategy that uses molecular barcodes to query the region of the pri-miRNA stem that had been intractable to previous high-throughput analyses. This strategy revealed a mismatched motif near the base of the hairpin, which enhances pri-miRNA processing, a preference for pairing throughout the remainder of the stem, and a narrow preference for stem length. Integrating these features with those that had been previously identified, we had all that was needed for the reliable de novo design of functional pri-miRNAs. Indeed, the processing of these artificial pri-miRNAs was more efficient than that of any natural pri-miRNAs assayed, including the one used as a scaffold for building shRNAs. Additional experiments revealed the ability of the mismatched motif and previously described primary-sequence motifs to compensate for structural defects in the hairpin as well as their partial redundancies with each other. These insights resolved many of the complexities and seeming discrepancies of earlier studies, leading to a simplified and unifying model of what it takes to be a miRNA gene.
Results
Barcoding strategy for the analysis of pri-miRNA variants
To interrogate the stem region of pri-miRNAs, we randomized blocks of nucleotides within human pri-miR-125a, pri-miR-16-1, and pri-miR-30a (hereafter called pri-miR-125, pri-miR-16, and pri-miR-30), whose apical and flanking regions had already been thoroughly studied (Auyeung et al., 2013). Nucleotide identities within each of the 3-bp sliding windows across the stem were completely randomized, resulting in 4096 variants for each window (Figure 1A). All three pri-miRNAs contained a bulge in the stem, which was also varied with respect to its sequence and length to generate all possible bulge sequences of all lengths spanning the length of the bulge (1 or 2 nt) to no bulge (0 nt) (Table S1). For each window (and each bulge length), a pool of DNA templates was synthesized by combinatorial synthesis, and then all template pools for each pri-miRNA were combined and transcribed into RNA, with the goal of achieving near-equal representation of each variant. In this way ~50,000 variants were generated for pri-miR-125, and ~80,000 variants were generated for pri-miR-16 and pri-miR-30, with overlap of the sliding windows generating greater diversity for the latter two pools (Figure 1A).
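The variant counts quoted above follow from simple enumeration. The sketch below is illustrative only (the function names are hypothetical and are not part of the published pipeline): randomizing three positions on each arm gives 4^6 = 4096 combinations per window, and a bulge of up to 2 nt contributes every sequence of length 0, 1, or 2.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

def window_variants(n_pairs=3):
    """Yield (5p, 3p) sequences for every variant of a window of n_pairs positions.

    Both arms are fully randomized, so there are 4 ** (2 * n_pairs) variants.
    """
    for seq in product(NUCLEOTIDES, repeat=2 * n_pairs):
        yield "".join(seq[:n_pairs]), "".join(seq[n_pairs:])

def bulge_variants(max_len):
    """Return all bulge sequences from no bulge ("") up to max_len nucleotides."""
    return [
        "".join(seq)
        for length in range(max_len + 1)
        for seq in product(NUCLEOTIDES, repeat=length)
    ]

print(len(list(window_variants())))  # 4096
print(len(bulge_variants(2)))        # 1 + 4 + 16 = 21
```

Multiplying these counts across the windows and bulge lengths of a given hairpin gives pool sizes of the order reported, although the exact totals depend on the window layout in Figure 1A and Table S1.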
Figure 1. Design of pri-miRNA pools and high-throughput analyses of variants. See also Figure S1.

(A) Secondary structure of the three parental pri-miRNAs. miRNA–miRNA* duplexes are colored red. Blue numbers above 5p sequences indicate randomized windows, most of which contained three nucleotides on the 5p arm and the corresponding nucleotides on the 3p arm.
(B) Schematic of the protocol for generating each dictionary of barcoded variants and quantifying the amount of each variant that was in the input and that was cleaved at each site.
(C) Distribution of unique barcodes per variant in each dictionary.
(D) Distributions of reads per variant in the input sequencing.
(E) Distributions of cleavage sites for the sequenced 5′-cleavage fragments.
(F) Distributions of cleavage scores.
In previous strategies for identifying variants cleaved by the Microprocessor, the mutagenized positions all resided in one cleavage product (either all in flanking regions joined through circular permutation or all in the distal loop region), which enabled the original variants to be identified by simply sequencing the relevant cleavage products (Auyeung et al., 2013). However, a different strategy was required for our variants because the mutagenized region spanned the cleavage sites, and thus for many variants, processing separated mutagenized residues from each other. Therefore, we devised a strategy in which each variant was linked to ~100 unique barcodes residing in the 5′ flanking region, which enabled the identity of a cleaved molecule to be inferred from the barcode sequence of its cleaved product.
The barcodes had 29–31 nt of random sequence and were appended to the hairpin regions in a primer-extension reaction that also added the T7 promoter to the pool of templates (Figure 1B). For each pool, a small fraction of extended product, containing ~10 million template molecules, was amplified, and a portion of the amplified material was sequenced to create a “dictionary” of ~10 million different barcode–variant linkages, while another portion was transcribed to generate the pool of RNA variants to be used in the experiment. The bottleneck of 10 million molecules was imposed to reduce the barcode complexity, so that most of the transcribed barcode sequences would also be in the dictionary. At the same time, the bottleneck was designed to be sufficiently large such that each of the hairpin sequences would be appended to multiple barcodes. Indeed, the dictionaries each had a median of ~120 barcodes per hairpin variant (Figures 1C and S1A), with >20 different barcodes observed for ≥99.92% of the hairpin variants. This large diversity of barcodes for nearly all hairpin variants minimized concern that an influence of certain barcode sequences on pri-miRNA processing might be misattributed to the hairpin sequence.
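The barcode-to-variant lookup can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and discarding any barcode observed with more than one variant is an assumption, because the text does not say how ambiguous barcodes were handled.

```python
from collections import Counter, defaultdict

def build_dictionary(linkage_reads):
    """Map each barcode to its hairpin variant.

    linkage_reads: iterable of (barcode, variant) pairs from sequencing the
    amplified templates. Barcodes seen with more than one variant are
    dropped (an assumption made for this sketch).
    """
    seen = defaultdict(set)
    for barcode, variant in linkage_reads:
        seen[barcode].add(variant)
    return {bc: next(iter(vs)) for bc, vs in seen.items() if len(vs) == 1}

def count_variants(barcode_reads, dictionary):
    """Count reads per variant; also return the number of unassigned reads."""
    counts = Counter()
    unassigned = 0
    for barcode in barcode_reads:
        variant = dictionary.get(barcode)
        if variant is None:
            unassigned += 1  # barcode absent from (or ambiguous in) the dictionary
        else:
            counts[variant] += 1
    return counts, unassigned
```

The same counting step can be applied to barcodes from the input pool and to barcodes from sequenced cleavage fragments, because each variant is identified through its barcode rather than through the mutagenized residues.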
After brief incubation with a cell lysate from HEK293T cells over-expressing the Microprocessor (Lee and Kim, 2007), <10% of each pri-miRNA pool was processed (Figure S1B), and the 5′ cleavage products were isolated, reverse transcribed and sequenced (Figure 1B). The sequences of these products revealed the precise site of cleavage, and for those with barcodes present in the dictionary (70.4–82.2% of the sequenced cleavage fragments), the identity of the pri-miRNA variant. Barcodes from each input pool were also reverse transcribed and sequenced, which provided quantification of each variant in the input and the ability to normalize for differences in the input (Figures 1B and 1D). Comparing for each variant the number of input reads with the number of cleavage-product reads, considering only those indicating cleavage at the proper site (which comprised a large majority of reads for each pool; Figure 1E), provided a measurement of its cleavage efficiency. These measurements were normalized to that of the wild-type pri-miRNA to generate a cleavage score, and these scores were reported on a log2 scale, such that variants with cleavage efficiencies better or worse than wild-type had positive and negative scores, respectively (Figure 1F). Scores for pri-miR-30 were determined using two time points, in which 1.6% and 9% of the pool was cleaved (Figure S1B). As expected, these scores were highly correlated (Figure S1C, Pearson’s r = 0.94), showing the robustness of the approach over the range of cleavage percentages used. Results of these two time points were treated as replicates and combined for subsequent analyses. Global distributions of cleavage scores showed that most variants were cleaved less efficiently than were their wild-type counterparts, and that among the three pri-miRNAs, pri-miR-125 was the least sensitive to mutations, whereas pri-miR-16 was most frequently improved by substitutions (Figure 1F).
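The cleavage score described above amounts to a log2 ratio of read-count ratios. The sketch below shows one way to compute it, assuming that the cleaved-read counts have already been restricted to the proper cleavage site; the function name and the choice to skip variants with zero counts are assumptions, since the text does not specify how such variants were treated.

```python
import math

def cleavage_scores(input_reads, cleaved_reads, wild_type):
    """Return log2 cleavage scores relative to the wild-type variant.

    input_reads, cleaved_reads: dicts mapping variant -> read count.
    Efficiency is cleaved/input; the score is log2(efficiency / WT efficiency).
    Variants with zero input or zero cleaved reads are skipped (assumption).
    """
    wt_efficiency = cleaved_reads[wild_type] / input_reads[wild_type]
    scores = {}
    for variant, n_input in input_reads.items():
        n_cleaved = cleaved_reads.get(variant, 0)
        if n_input == 0 or n_cleaved == 0:
            continue
        scores[variant] = math.log2((n_cleaved / n_input) / wt_efficiency)
    return scores

# Toy counts: variant "A" is cleaved twice as efficiently as wild type,
# variant "B" four times less efficiently.
toy_input = {"WT": 100, "A": 100, "B": 200}
toy_cleaved = {"WT": 10, "A": 20, "B": 5}
print(cleavage_scores(toy_input, toy_cleaved, "WT"))
```

By construction the wild-type variant scores 0, a variant cleaved twice as efficiently scores about +1, and one cleaved four times less efficiently scores about −2.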
Structural preferences across the stem
To visualize the results, we plotted the cleavage scores of all 4096 variants within each 3-bp window on a 64 × 64 grid. Figure 2A shows a typical pattern, in which the higher scores for combinations along the diagonals indicated a preference for base pairing. We also plotted cleavage scores of all 16 possible variants within each single-base-pair window across the stem (considering only variants that were wild-type at all other positions) on 4 × 4 grids, to see how all changes at each position across each stem affected cleavage in the wild-type background (Figure S2). Again, a preference for pairing (Watson–Crick and wobbles) was often observed, even at positions that were not paired in the wild-type sequence.
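The link between diagonals and base pairing can be made concrete with a small sketch. It assumes rows and columns are both ordered alphabetically (A, C, G, U), which may differ from the ordering used in the published figure; the names are hypothetical. With the 5p trinucleotide written 5′ to 3′ and the 3p trinucleotide written 3′ to 5′, every fully Watson–Crick-paired combination then falls on the anti-diagonal of the 64 × 64 grid.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed axis order for this sketch
TRIPLETS = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]
INDEX = {triplet: i for i, triplet in enumerate(TRIPLETS)}
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}

def grid_position(five_p, three_p_3to5):
    """Row (5p trinucleotide, 5'->3') and column (3p trinucleotide, 3'->5')."""
    return INDEX[five_p], INDEX[three_p_3to5]

def watson_crick_partner(five_p):
    """3p trinucleotide, written 3'->5', pairing with five_p position by position."""
    return "".join(COMPLEMENT[n] for n in five_p)

def fill_grid(scores):
    """Place scores keyed by (5p, 3p written 3'->5') into a 64 x 64 grid."""
    grid = [[None] * 64 for _ in range(64)]
    for (five_p, three_p), score in scores.items():
        row, col = grid_position(five_p, three_p)
        grid[row][col] = score
    return grid

# Each fully Watson-Crick-paired combination sits at column 63 - row.
print(all(
    grid_position(t, watson_crick_partner(t)) == (i, 63 - i)
    for i, t in enumerate(TRIPLETS)
))  # True
```

Combinations containing G–U or U–G wobbles fall on other, shorter diagonal runs, which is consistent with a pairing preference appearing as several diagonals rather than one.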
Figure 2. Structural preferences across the stem. See also Figure S2.
(A) Cleavage scores for all 4096 variants of pri-miR-16 within randomized window 2 (grey shaded). Each row shows the scores of the indicated 5p trinucleotide (written 5′ to 3′), and each column shows the scores of the indicated 3p trinucleotide (written 3′ to 5′), colored according to the key (right). The asterisk marks the wild-type sequence.
(B) Base-pairing scores at each position across each of the indicated stems.
(C) Detrimental effects of maintaining or strengthening the apical pairing of pri-miR-16. Each 4 × 4 heat map shows the scores of all 16 single-bp variants at the shaded position in the context of wild-type nucleotides at all other positions, colored according to the key (below). Each asterisk marks the wild-type sequence.
(D) Beneficial effects of pairing at position 2 and of maintaining the UGU motif in the apical region of pri-miR-30. Otherwise, this panel is as in (C).
(E) The modest effects of changing or eliminating each of the bulges normally found in each of the three pri-miRNAs. The heat map shows the cleavage scores of the indicated variants in the context of wild-type nucleotides at all other positions (−, no bulge), colored according to the key (below). Because pri-miR-16 had a single-nucleotide bulge, dinucleotide variants were not tested (grey). Each asterisk marks the wild-type sequence.
To summarize the preference for pairing at each stem position, we derived a simple base-pairing score, calculated as the difference between the average cleavage scores of the six paired variants (including G–U and U–G wobbles) and the average scores of the ten mismatch variants, all in the context of the wild-type background. At all but one of the first 35 positions of the stem, a preference for pairing was observed, as indicated by base-pairing scores >0 (Figure 2B). The exception was at position 8, which is one of four positions reported to be frequently mismatched in human pri-miRNAs (Han et al., 2006). Base-pairing scores were particularly high in the basal stem (positions 1–13) of pri-miR-16 and pri-miR-30—two pri-miRNAs with the most mismatches and wobbles in the basal stem—indicating that their cleavage scores were particularly sensitive to gain or loss of a pair (positive and negative changes, respectively) in this region (Figure 2B). Maintaining pairing in the last three base positions of the pri-miR-30 stem (positions 33–35) was also particularly important (Figure 2B).
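The base-pairing score defined above can be written out directly. The following is a minimal sketch with hypothetical names; it takes the 16 cleavage scores for one stem position (all other positions wild-type) and returns the mean of the six paired variants minus the mean of the ten mismatched variants.

```python
from itertools import product
from statistics import mean

# Four Watson-Crick pairs plus the G-U and U-G wobbles, written (5p nt, 3p nt).
PAIRED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def base_pairing_score(scores):
    """Mean score of the 6 paired variants minus mean score of the 10 mismatches.

    scores: dict mapping (5p nt, 3p nt) -> cleavage score, covering all 16
    combinations at one stem position.
    """
    paired = [s for pair, s in scores.items() if pair in PAIRED]
    mismatched = [s for pair, s in scores.items() if pair not in PAIRED]
    return mean(paired) - mean(mismatched)

# Toy position at which every mismatch costs one log2 unit of cleavage.
toy = {
    pair: (0.0 if pair in PAIRED else -1.0)
    for pair in product("ACGU", repeat=2)
}
print(base_pairing_score(toy))  # 1.0
```

A positive value indicates that pairing at the position favors cleavage, and a value at or below zero, as reported for position 8, indicates that mismatches are tolerated or preferred there.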
Overall, single-base-pair changes had relatively minor effects on pri-miR-125, suggesting redundancy in the features required for efficient processing of this hairpin (Figure S2). In contrast, many substitutions toward the apical end of the pri-miR-16 stem improved processing (Figure S2). In particular, disrupting any of the last three base pairs strongly enhanced processing, and replacing any of these U–A or A–U pairs with stronger pairs had the opposite effect (Figure 2C), suggesting that the length of the pri-miR-16 stem, which is 39 bp (counting Watson–Crick pairs, wobbles, and mismatches, but not the single-nt bulge), was too long. In agreement with this idea, the wild-type pri-miR-30 and pri-miR-125 had 35 bp stems, with either extension to 36 bp or shortening to less than 34 bp clearly disfavored (Figure S2). In contrast to the apical region of pri-miR-16, the apical region of pri-miR-30 was already near a local optimum on the fitness landscape of cleavage substrates, with preference for retention of both the pairing at the end of the stem and the previously described UGU motif (Auyeung et al., 2013) (Figure 2D). For this pri-miRNA, the most favorable single-base-pair changes repaired the mismatch in the basal stem at position 2 (Figure 2D).
Examining variants of the 1- or 2-nt bulges revealed a tendency for beneficial effects from changing their sizes (including eliminating the bulge altogether) or nucleotide composition (Figure 2E). However, these effects were modest, indicating that in the wild-type contexts of these three pri-miRNAs, small bulges were neither necessary for nor detrimental to Drosha cleavage.
A mismatched GHG motif enhances pri-miRNA processing
The consistently lower and even negative pairing scores at position 8 (Figure 2B) prompted a closer look at pairs and mismatches favored at this position and its two flanking positions. Combining results from all three pri-miRNAs, we ranked the 4096 variants at positions 7–9 based on their average cleavage scores and selected the top 1% (Table S2). Examination of the frequencies of pairs and mismatches showed that these 41 variants all had pairs at positions 7 and 9 but mostly mismatches (particularly U–C, C–U, and G–A) at position 8 (Figure 3A). This analysis, combined with nucleotide-composition analysis of these top variants, showed that in the 3′ (3p) arm of the hairpin, position 7 was enriched for a paired G (although the other Watson–Crick pairs and wobbles were also present), position 8 was never a G, and position 9 was enriched for a Watson–Crick paired G (although other Watson–Crick pairs were present) (Figures 3A and 3B). We therefore named this the “mismatched GHG” motif (in which H is any nucleotide except G), based on this primary-sequence preference in the 3p arm and the frequent mismatch at position 8.
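The ranking step can be sketched as below. The function name is hypothetical, and rounding the top 1% up to a whole number of variants is an assumption; it is consistent with the 41 variants reported, since 1% of 4096 is 40.96.

```python
import math
from statistics import mean

def top_variants(scores_by_pri_mirna, fraction=0.01):
    """Rank variants by their mean score across pri-miRNAs and keep the top fraction.

    scores_by_pri_mirna: list of dicts, one per pri-miRNA, each mapping
    variant -> cleavage score. Only variants scored in every pri-miRNA are
    ranked (an assumption made for this sketch).
    """
    shared = set.intersection(*(set(d) for d in scores_by_pri_mirna))
    averaged = {v: mean(d[v] for d in scores_by_pri_mirna) for v in shared}
    n_top = math.ceil(fraction * len(averaged))
    ranked = sorted(averaged, key=averaged.get, reverse=True)
    return ranked[:n_top]

print(math.ceil(0.01 * 4096))  # 41
```

Averaging across the three hairpins before ranking favors variants that perform well in every context over those that rescue a weakness specific to one parental sequence.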
Figure 3. A broadly conserved mismatched GHG motif enhances pri-miRNA processing. See also Figure S3 and Tables S2 and S3.
(A) Nucleotide pairs preferred at the three positions of the mismatched GHG motif. Shown is the relative fraction of each nucleotide pair observed in the top 1% of the variants generated from randomizing positions 7–9 of the three pri-miRNAs (Table S2). For each pair the first letter indicates the 5p nucleotide, and the second letter the 3p nucleotide.
(B) Primary-sequence preferences within the mismatched GHG motif. Shown is a pLogo, which represents the nucleotide enrichment and depletion observed at the indicated positions within the top 41 variants generated from randomizing positions 7–9 (Table S2), compared to the background of all 4096 possible variants at these positions (O’Shea et al., 2013). Red lines indicate P-value threshold of 0.05.
(C) Enrichment of the mismatched GHG motif in natural miRNAs. The mismatched GHG motif was defined as a 3-bp structural element in which the first pair could be C–G or U–G, the second could be one of the seven mismatches or pairs shown in panel A, and the third could be any Watson–Crick base pair. The heat map shows the frequency of the motif observed at the indicated position within the stems of representative pri-miRNA from the indicated species (Table S3). The asterisk indicates species with a significant enrichment at position 7 (P < 0.05, one-tailed binomial test with Bonferroni correction).
(D) Increased cleavage efficiency imparted by the mismatched GHG motif. The gel (center) shows results of competitive-cleavage assays that determined the relative cleavage of pri-miR-125 variants 1–5, which had the indicated substitutions within the mismatched GHG motif (center left table). The wild-type (WT) hairpin with mismatched GHG motif at positions 7–9 (blue shading) is shown for reference (upper left). As schematized (lower left), each assay included the query variant, which generated a 39 nt labeled product, and a longer pri-miR-125 wild-type reference substrate, which generated a 69 nt labeled product. The graph (right) shows the mean relative cleavage efficiency of each variant, normalized to that of the wild-type (blue bars; error bars, s.e.m., n = 3), compared to the value determined from the high-throughput sequencing experiment (orange bars).
(E) Increased miRNA accumulation imparted by the mismatched GHG motif in HEK293T cells. The mismatched GHG motif was tested in the context of pri-miR-44.3 (bottom left), a derivative of C. elegans pri-miR-44 with a U substitution (blue) that increases processing, presumably because it destabilizes pairing beyond the basal stem and introduces a basal UG motif (Auyeung et al., 2013). The variants (center table) introduced either the mismatched GHG motif (variant 1) or control sequences. RNA blots (top right) examined miR-44 accumulation in cells for each variant when expressed as a query pri-miRNA on the same primary transcript as pri-miR-1, as schematized (top left). The graph (bottom right) plots relative levels of mature miR-44 after normalizing to the miR-1 internal reference (mean ± s.e.m., n = 3; **, P ≤ 0.01, one-tailed Student’s t-test).
When examining the presence of this mismatched GHG motif within conserved human pri-miRNAs, we observed a significant position-specific enrichment (Figure 3C). Significant enrichment was also observed in other vertebrates, as well as in fruit fly (Figure 3C). Enrichment observed in other arthropods (mosquito and water flea) did not reach statistical significance, presumably because of the smaller sets of annotated pri-miRNAs in these species (Table S3).
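The structural definition of the motif used for this enrichment analysis (Figure 3C legend) can be expressed as a simple predicate. In the sketch below, the names are hypothetical and the set of allowed middle pairs is deliberately incomplete: it contains only the three mismatches named in the text (U–C, C–U, and G–A), whereas the published definition uses the seven pairs or mismatches shown in Figure 3A, which would need to be supplied through the `middle_allowed` parameter.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
FIRST_PAIR = {("C", "G"), ("U", "G")}  # position 7, written (5p nt, 3p nt)

# Incomplete illustrative set: only the mismatches named in the text.
EXAMPLE_MIDDLE = frozenset({("U", "C"), ("C", "U"), ("G", "A")})

def pairs_from_arms(five_p, three_p_3to5):
    """Align a 5p segment (5'->3') with its 3p partner written 3'->5'."""
    return list(zip(five_p, three_p_3to5))

def has_mismatched_ghg(pairs, middle_allowed=EXAMPLE_MIDDLE):
    """Test three consecutive stem positions against the motif definition."""
    if len(pairs) != 3:
        raise ValueError("expected exactly three positions")
    first, middle, third = pairs
    return (
        first in FIRST_PAIR
        and middle in middle_allowed
        and third in WATSON_CRICK
    )

# Wild-type pri-miR-125 positions 7-9: CUC on the 5p arm opposite GCG.
print(has_mismatched_ghg(pairs_from_arms("CUC", "GCG")))  # True
# Original pri-miR-44 sequence at the corresponding positions: GGC opposite CCG.
print(has_mismatched_ghg(pairs_from_arms("GGC", "CCG")))  # False
```

Scanning this predicate along each annotated stem and tallying matches by position would give per-position frequencies of the kind summarized in the heat map.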
The wild-type sequence at positions 7–9 of pri-miR-125, a CUC on the 5p arm imperfectly paired to GCG on the 3p arm, was chosen as a representative mismatched GHG motif for further study. It contained the pairs and mismatch most frequently observed in the top variants (Figure 3A) and was among the top three variants in the overall ranking (Table S2). To validate the high-throughput results and further characterize this motif, we tested engineered variants in a competitive cleavage assay, in which cleavage was measured relative to that of an internal reference. The internal reference was wild-type pri-miR-125 with a long 5′ cleavage product, designed to be easily distinguished from that of the variants (Figure 3D). When the C–G pair at position 7 or 9, or at both positions, was flipped, in vitro cleavage efficiency decreased to 67%, 74%, and 26% respectively, and when the U–C mismatch at position 8 was changed to either a U–A or a U–G pair, the in vitro processing decreased to 66% and 45% respectively, results consistent with those determined from the sequencing data (Figure 3D). When motifs were added sequentially to a hairpin without motifs, the mismatched GHG motif, which was added first, had the greatest effect (12 fold), suggesting some redundancy among the motifs (Figure S3A).
To test whether this mismatched GHG motif enhanced pri-miRNA processing in vivo, we incorporated it into the corresponding region of derivatives of pri-miR-44, a C. elegans pri-miRNA known to be sub-optimally processed in mammalian cells (Auyeung et al., 2013), and asked whether the motif conferred more efficient processing in HEK293T cells. In this assay, the pri-miR-44 variants were each expressed on a transcript that also contained human pri-miR-1-1 (hereafter called pri-miR-1), and the relative accumulation of each mature miRNA was measured on RNA blots, using accumulation of miR-1 as an internal normalization standard (Figure 3E) (Auyeung et al., 2013). When CUC–GCG replaced GGC–CCG in C. elegans pri-miR-44 to create a GHG motif with a U–C mismatch at position 8, the mature miR-44 increased >2 fold (Figure 3E). Flipping the C–G pairs at positions 7 and 9, or changing the U–C mismatch to a G–C pair diminished the enhancement, consistent with our in vitro results. Similar results were observed in the other pri-miRNA contexts examined (Figures S3B and S3C).
De novo design of artificial pri-miRNAs
Our high-throughput analyses of the stem regions complemented our previous analyses of the flanking and loop regions (Auyeung et al., 2013) to provide thorough analyses of three human pri-miRNAs. Based on these analyses, the preferred Microprocessor substrate is a 35 bp hairpin flanked by single-stranded sequences with a properly positioned GHG motif in mid-basal stem and the previously described basal UG, apical UGU, and flanking CNNC motifs. Many additional, more nuanced preferences were also observed (Figure S2), raising the possibility that many additional weak features or complex combinations of features might be needed to define a miRNA. Alternatively, we might have already identified a subset of elements sufficient to define a pri-miRNA, and the additional preferences might have reflected idiosyncratic vulnerabilities of the starting pri-miRNAs, analogous to the heightened sensitivity to either mismatches in the basal stem of pri-miRNAs starting with more mismatches in this region (Figure 2B, pri-miR-16 and pri-miR-30) or strengthened pairing at the distal end of a pri-miRNA stem that is already too long (Figure 2C).
To test if we knew enough to define a miRNA gene, we designed artificial pri-miRNAs using the features of the preferred Microprocessor substrate listed above—without reference to the sequence of any known miRNA, and asked whether these hairpins could be processed. To simplify the design, we used homopolymeric U segments at the single-stranded regions near the stem and perfect Watson–Crick pairs at all paired positions of the stem (Figure 4A). At most paired positions, the primary sequence was randomly generated, the exceptions being position 1, which included the G of the UG motif, positions 7–9, which comprised the mismatched GHG motif, position 35, which comprised the first U of the apical UGU motif, and positions 14–16. Although particular Watson–Crick pairs at positions 14–16 were not favored during Drosha processing (Figure S2), the possibilities at these positions were nonetheless constrained to facilitate loading of the mature miRNA into Argonaute, which is required for miRNA stability in vivo (Winter and Diederichs, 2011). Accordingly, pairs at positions 14–16 were constrained to be A–U or U–A pairs, so that Watson–Crick pairing of these positions in the miRNA duplex would be sufficiently weak to facilitate loading of the strand from the 5p arm into Argonaute (Khvorova et al., 2003; Schwarz et al., 2003), and the pair at position 14, which included the first nucleotide of the mature miRNA, was further constrained to be a U, the most common first nucleotide of conserved mammalian miRNAs.
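The stem constraints listed above can be sketched as a small generator. This is an illustration under stated assumptions rather than the procedure used to build A1, A2, and A3: the function names are hypothetical, only the 35 bp stem is generated (the polyU flanks, the remaining U and the CNNC motif in the flanks, and the apical loop with the rest of the UGU motif are omitted), CUC opposite GCG is assumed as the mismatched GHG motif because it was the representative studied above, and the motif and first-nucleotide residues are assumed to lie on the 5p arm.

```python
import random

WATSON_CRICK = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
WEAK_PAIRS = [("A", "U"), ("U", "A")]

def design_stem(rng, length=35):
    """Return {position: (5p nt, 3p nt)} for stem positions 1..length."""
    # Random Watson-Crick pairs at unconstrained positions.
    stem = {pos: rng.choice(WATSON_CRICK) for pos in range(1, length + 1)}
    stem[1] = ("G", "C")        # G of the basal UG motif (assumed on the 5p arm)
    stem[7] = ("C", "G")        # mismatched GHG motif, assumed CUC opposite GCG
    stem[8] = ("U", "C")
    stem[9] = ("C", "G")
    stem[14] = ("U", "A")       # first miRNA nucleotide constrained to U
    stem[15] = rng.choice(WEAK_PAIRS)
    stem[16] = rng.choice(WEAK_PAIRS)
    stem[length] = ("U", "A")   # first U of the apical UGU motif
    return stem

def arms(stem):
    """Return the 5p and 3p arm sequences, each written 5'->3'."""
    positions = sorted(stem)
    five_p = "".join(stem[p][0] for p in positions)
    three_p = "".join(stem[p][1] for p in reversed(positions))
    return five_p, three_p

five_p, three_p = arms(design_stem(random.Random(0)))
print(five_p)
print(three_p)
```

Passing a seeded `random.Random` instance keeps each design reproducible, and changing the seed yields an unrelated stem sequence that satisfies the same constraints.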
Figure 4. De novo designed pri-miRNAs are processed efficiently and accurately in vitro and in cells. See also Figure S4.
(A) Guidelines for de novo design of pri-miRNAs. Motif residues are highlighted (blue). PolyU segments, which disfavor pairing, and other constrained sequences, some of which favor loading into Argonaute, are purple (W = A or U), and randomly assigned residues or pairs are grey (N = A, C, G, or U).
(B) Sequences of three artificial pri-miRNAs (A1, A2 and A3) and their variants in which the motifs were disrupted (A1.1, A2.1 and A3.1, green substitutions). Motif residues are highlighted (blue), and residues of the miRNA duplex are red.
(C) In vitro cleavage efficiencies of artificial pri-miRNAs, comparing variants with and without all motifs and those with and without the mismatched GHG motif. Variants with and without all motifs are shown in (B); A1.2, A2.2, A3.2 each have the mismatched GHG motif as the only motif, and A1.3, A2.3, A3.3 each have all the motifs except the mismatched GHG motif. Plotted in blue are mean cleavage efficiencies at the correct site relative to the pri-miR-125 internal reference, determined as in Figure 3D (error bars, s.e.m., n = 3; ***, P ≤ 0.001; **, P ≤ 0.01; *, P ≤ 0.05; n.s., P > 0.05; one-tailed Student’s t-test). If miscleavage was detected, its efficiency was similarly plotted in grey. See Figure S4A for images of competitive-cleavage results.
(D) Accumulation of mature artificial miRNAs in HEK293T cells, comparing variants with and without all motifs and those with and without the mismatched GHG motif. Assays were as in Figure 3E; artificial pri-miRNA variants and evaluation of statistical significance were as in (C).
(E) miRNA yield from artificial pri-miRNAs relative to that from natural pri-miRNAs. As schematized (left), each artificial pri-miRNA was transcribed between the pri-miR-30 and pri-miR-1 internal references as the query pri-miRNA. Plotted are the relative levels of mature miRNAs, determined using quantitative RNA blots (mean ± s.e.m., n = 3). See Figure S4B for images of quantitative RNA blots.
Surprisingly, all three artificial pri-miRNAs designed according to these guidelines (Figure 4B, A1, A2, and A3) were well processed. In vitro competitive cleavage assays indicated that they were each processed 1.5–4 fold more efficiently than pri-miR-125 (Figures 4C and S4A). Removing all the motifs to yield pri-miRNAs that contained only structural features (Figure 4B, A1.1, A2.1, A3.1) reduced cleavage at the intended site ~6-fold, with appearance of miscleaved products observed for the A1 and A3 derivatives (Figure 4C), including products suggestive of unproductive cleavage (Han et al., 2006; Nguyen et al., 2015), in which the Microprocessor recognizes the stem in the opposite orientation (Figure S4A). Nonetheless, processing efficiency at the intended site was within 25–50% of that of pri-miR-125. Restoring the mismatched GHG motif (A1.2, A2.2 and A3.2) improved the processing of variants with no other motif by 2–3 fold, but removing this single motif from those with all motifs (A1.3, A2.3 and A3.3) had a statistically significant effect in only one of the three contexts (Figures 4C and S4A).
To assay processing in vivo, we expressed each of the artificial pri-miRNAs in HEK293T cells, using our bicistronic system in which accumulation of miR-1 served as an internal normalization standard (Figure 3E). The in vivo results corroborated the in vitro ones, with differences between variants being somewhat muted, although still statistically significant, compared to those observed in vitro (Figure 4D). We then used quantitative RNA blots to measure the absolute accumulation of artificial miRNAs A1, A2, and A3 in cells. When the artificial pri-miRNAs were inserted between pri-miR-30 and pri-miR-1, their mature miRNA levels accumulated to about twice that of either miR-30 or miR-1 (Figures 4E and S4B). These results indicated that our simple design produced pri-miRNAs that are processed at least as efficiently as natural pri-miRNAs, including pri-miR-30, which is commonly used as a platform for efficiently expressing shRNAs in vivo (Zeng et al., 2002; Silva et al., 2005; Bassik et al., 2013; Fellmann et al., 2013; Knott et al., 2014; Kampmann et al., 2015).
As observed for many natural pri-miRNAs, our artificial pri-miRNAs yielded mature miRNAs with some length heterogeneity (Figure 4D). Small-RNA sequencing showed that this heterogeneity was predominantly at the 3′ ends, as observed for natural miRNAs (Figure S4C). In addition, miRNAs from the 5p arm outnumbered those from the 3p arm by a factor of ~5 (Figure S4C). These results indicated that our de novo designed pri-miRNAs were each correctly processed into miRNAs, which were loaded into Argonaute with expected strand asymmetry.
Motifs rescue structural defects
Although the motifs enhanced processing of our artificial miRNAs, the versions without these motifs (A1.1, A2.1, A3.1) were processed with efficiency approaching that of natural pri-miRNAs (Figure 4), which showed that a hairpin with a 35 bp perfect stem was sufficient for recognition and cleavage, thereby illustrating the key role that structure can play in defining pri-miRNAs. However, sequences with the potential to form hairpins with perfectly paired stems of precisely 35 bp are rare in the genome, and natural pri-miRNA hairpins are mostly of other lengths and typically have mismatches and bulges in the stem. To understand better the features that define pri-miRNAs, we incorporated these structural “defects” into the A1 and A2 artificial pri-miRNAs, and asked how they were processed, with and without sequence motifs.
When the pri-miRNA stems were extended by 5 bp in the apical region (Figure 5A), the pri-miRNAs were still processed in vitro, albeit at ~50% efficiency (Figures 5B and S5). Without the four motifs, however, cleavage was much less accurate (Figure S5), with efficiency reduced to <9% of that of the original A1 and A2 pri-miRNAs (Figure 5B). These differences observed in vitro translated to more striking differences in mature miRNA levels in vivo. With sequence motifs, miRNA levels dropped nearly 4 fold but were still within range of the miR-1 internal standard, and without sequence motifs, they dropped another 50 fold (Figure 5C). When the pri-miRNA stems were shortened by 5 bp (Figure 5A), some cleavage at the proper position occurred for the pri-miRNAs with the sequence motifs, but ~70% of the 5′ cleavage fragments were of a smaller size (Figure S5). Without the motifs, cleavage at the proper position was no longer detected, although some of the miscleaved fragment was observed for the A2 derivative (Figure S5). In vivo, very little if any mature miRNA was observed from the shortened derivatives (Figure 5C), which presumably reflected reduced complementarity to the probe and the unsuitability of the properly cleaved product for Dicer cleavage, in addition to reduced cleavage by the Microprocessor. The miscleavage observed with extended and shortened stems resembled that observed for analogous derivatives of natural miRNAs, which first revealed that measurement from the ends of the stem influences the site of cleavage (Zeng et al., 2005; Han et al., 2006; Ma et al., 2013). Our results added key insight with respect to the sequence motifs, showing that these motifs are important for positioning the cleavage site of pri-miRNA hairpins that are not the optimal length, and this positioning effect, combined with enhanced processing efficiency, can increase mature miRNA levels from nearly undetectable to within range of the miR-1 internal standard.
Figure 5. Sequence motifs rescue suboptimal stem lengths.

(A) Diagrams of extension and deletion variants of A1 and A2. Otherwise, this panel is as in Figure 4B.
(B) In vitro cleavage efficiencies of the extension (40 bp) and deletion (30 bp) variants with or without motifs. Plotted are mean cleavage efficiencies relative to the pri-miR-125 internal reference, determined as in Figure 3D (error bars, s.e.m., n = 2). See Figure S5 for images of competitive-cleavage results.
(C) Accumulation of mature miRNAs from the extension and deletion variants, with or without motifs, in HEK293T cells. Assays were as in Figure 3E. Mature miRNA levels relative to co-transcribed miR-1 are indicated below each lane, reporting the mean from two biological replicates. Results of Figure 4E were used to infer the ratio of A1 and A2 accumulation relative to that of miR-1, and the other values were calculated based on this ratio.
With the goal of incorporating mismatches and bulges into the design of our artificial miRNAs, we developed a metric to score these structural defects among human pri-miRNAs. We first surveyed representative members of 186 conserved human pri-miRNA families, and tallied the occurrences of all 16 possible single base pairs, wobbles and mismatches, all four single-nucleotide bulges, and other less-frequent bulges and internal loops within their 35-bp stem regions (Table S4). On average, human pri-miRNA stems had 3.7 G–U or U–G wobbles, 2.7 1-bp mismatches, 0.9 single-nucleotide bulges, and 0.5 2-bp mismatches. The influence of each of these structural defects was then scored by taking the log2 of its frequency, relative to that of the most frequent base pair, G–C (G on the 5p arm, C on the 3p arm), which was assigned a score of 0. These scores agreed well with the analogous cleavage scores calculated from single-base-pair variants of pri-miR-125, pri-miR-16, and pri-miR-30 (Figure 6A), suggesting that the forces of natural selection acting on pri-miRNA structure were accurately reflected in the cleavage preferences observed in vitro. Summing the frequency-based scores along the 35-bp stem region of each representative pri-miRNA yielded “structure scores”, which ranged between −56 and −18, with a median of −37 (Figure S6A).
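The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the study: the counts below are hypothetical placeholders rather than the values in Table S4, and the names `ILLUSTRATIVE_COUNTS`, `frequency_scores`, and `structure_score` are introduced here for illustration only.

```python
import math

# Hypothetical tallies of stem-position types (NOT the values in Table S4).
# Keys are (5p-arm nucleotide, 3p-arm nucleotide).
ILLUSTRATIVE_COUNTS = {
    ("G", "C"): 1600,  # most frequent pair; serves as the reference (score 0)
    ("C", "G"): 1400,
    ("A", "U"): 1000,
    ("U", "A"): 1100,
    ("G", "U"): 400,   # wobble
    ("U", "G"): 300,   # wobble
    ("A", "C"): 100,   # mismatch
    ("C", "U"): 50,    # mismatch
}


def frequency_scores(counts, reference=("G", "C")):
    """Return log2 of each feature's frequency relative to the reference pair."""
    ref = counts[reference]
    return {feature: math.log2(n / ref) for feature, n in counts.items()}


def structure_score(stem, scores):
    """Sum the per-position scores along a stem, given as a list of feature keys."""
    return sum(scores[feature] for feature in stem)


scores = frequency_scores(ILLUSTRATIVE_COUNTS)

# Toy 35-position stem: 30 G-C pairs, 3 G-U wobbles, and 2 A-C mismatches.
toy_stem = [("G", "C")] * 30 + [("G", "U")] * 3 + [("A", "C")] * 2
print(structure_score(toy_stem, scores))
```

Because every score is taken relative to the most frequent pair, each non-reference position contributes a negative value, so a more negative sum corresponds to a stem built from rarer features.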
Figure 6. Motifs rescue structural defects. See also Figure S6 and Table S4.
(A) The average effects of each pair, wobble or mismatch possibility on cleavage, compared to the frequency of that possibility in natural pri-miRNAs. Cleavage effects were determined from the high-throughput results, averaging the cleavage scores calculated from single-bp variants of pri-miR-125, pri-miR-16, and pri-miR-30 (orange bars, left axis). The frequency of each possibility was tallied across the 35-bp stems of representative members of 186 conserved human pri-miRNA families (Table S4; purple bars, right axis).
(B) Diagrams of structural variants of A1 and A2. Motifs are highlighted (blue); miRNA duplexes are red; substituted residues are dark blue, and structure scores are in parentheses.
(C) In vitro cleavage efficiencies of structural variants of A1 and A2, with or without motifs. Plotted are mean cleavage efficiencies relative to the pri-miR-125 internal reference, determined as in Figure 3D (error bars, s.e.m., n = 2). See Figure S6B for images of competitive-cleavage results.
(D) Accumulation of mature miRNAs from structural variants of A1 and A2, with or without motifs, in HEK293T cells. Assays were as in Figure 3E. Mature miRNA levels relative to co-transcribed miR-1 are indicated below each lane, as in Figure 5C.
(E) Diagram of extension variants of A1.12 and A2.8.
(F) In vitro cleavage efficiencies of extension variants of A1.12 and A2.8, with or without motifs. Otherwise, this panel is as in (C). See Figure S6D for images of competitive-cleavage results.
(G) Accumulation of mature miRNAs from extension variants of A1.12 and A2.8, with or without motifs, in HEK293T cells. Otherwise, this panel is as in (D).
We arbitrarily incorporated wobble pairs, mismatches and bulges into A1 and A2, keeping the mature miRNA sequence unchanged to facilitate comparisons on RNA blots, and allowing the structure score to range from −27 to −39 (Figure 6B). When retaining the sequence motifs, all of these A1 and A2 derivatives were cleaved in vitro, with efficiencies similar to or greater than that of pri-miR-125, and those with the most favorable structure scores (A1.12 and A2.10) had cleavage efficiencies similar to those of their respective parental pri-miRNAs (Figures 6C and S6B). Without motifs, these hairpins with defects were processed much less efficiently than pri-miR-125, with the exception of A1.12 and A2.10, which had the most favorable structure scores (Figures 6C and S6B). In vivo, mature miRNAs from A1.12 and A2.10 accumulated to levels similar to those from the parental A1 or A2 pri-miRNAs (Figures 6D and S6C). Mature miRNAs from the remaining variants accumulated to ≥4-fold lower levels, with those from hairpins with 2-nt bulges (A1.4, A1.6, A1.8, and A2.4) accumulating at >10-fold lower levels and showing enhanced dependence on the sequence motifs (Figures 6D and S6C). The variant with an internal loop spanning 3 nt on both arms (A2.6) failed to produce detectable miRNA in vivo, even when containing the sequence motifs (Figure 6D), which can be reconciled with its efficient cleavage by the Microprocessor (Figure 6C) if its internal loop prevented subsequent cleavage by Dicer.
Finally, we extended the stems of A1.12 and A2.8, two derivatives that produced miRNAs with high intracellular accumulation (Figures 6C and 6D). The extensions by one base pair had little effect in vitro and in cells, but extensions by ≥3 bp were only tolerated in variants containing the sequence motifs (Figures 6E–6G and S6D). These results reinforced our conclusion that the sequence motifs can rescue structural defects in pri-miRNAs to confer efficient processing and can contribute most in a window in which the structural defects are sufficiently severe to substantially influence processing but not so severe that they eliminate processing in the presence of the motifs.
In vivo activity of artificial miRNAs
To test further our understanding of the features required to define pri-miRNAs, we designed three new artificial pri-miRNAs and asked whether they generated mature miRNAs that function to mediate gene repression when expressed in cells. These hairpins each had a 35-bp stem, 2–4 wobble pairs, and 2–3 mismatches, and one had a single-nucleotide bulge (Figure 7A). Sequence identity within each stem was arbitrary at most positions, the exceptions being 1) the four sequence motifs, 2) the U as the first nucleotide of the mature miRNA, and 3) the use of sequences satisfying pairing-stability constraints that favor loading of the 5p species into Argonaute. To confirm that the homopolymeric U segments at the single-stranded regions near the stems of our previous artificial pri-miRNAs were not required for efficient processing, we included other nucleotides in these regions.
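When drafting a candidate hairpin of this kind, it can help to tally its structural features before ordering oligonucleotides. The following is a minimal sketch of such a tally, assuming the two arms are supplied as pre-aligned strings with `-` marking the gap opposite a bulged nucleotide; the function name, the input convention, and the toy sequences are hypothetical and are not taken from Figure 7A. Stem length is counted as defined in this study: pairs, wobbles and mismatches are included, and bulged nucleotides are not.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def tally_stem(arm_5p, arm_3p):
    """Tally structural features of a stem from two aligned arms.

    arm_5p is written 5'->3' and arm_3p is written 3'->5', so that index i of
    one arm faces index i of the other. A '-' marks the gap opposite a bulged
    nucleotide. (This alignment convention is an assumption of the sketch.)
    """
    if len(arm_5p) != len(arm_3p):
        raise ValueError("aligned arms must have equal length")
    tally = {"pairs": 0, "wobbles": 0, "mismatches": 0, "bulged_nt": 0}
    for a, b in zip(arm_5p.upper(), arm_3p.upper()):
        if a == "-" and b == "-":
            raise ValueError("a gap cannot face another gap")
        if a == "-" or b == "-":
            tally["bulged_nt"] += 1
        elif (a, b) in WATSON_CRICK:
            tally["pairs"] += 1
        elif (a, b) in WOBBLE:
            tally["wobbles"] += 1
        else:
            tally["mismatches"] += 1
    # Stem length includes wobbles and mismatches but not bulged nucleotides.
    tally["stem_length"] = tally["pairs"] + tally["wobbles"] + tally["mismatches"]
    return tally


# Toy 8-position stem fragment with one bulged nucleotide on the 3p arm.
example = tally_stem("GGCAU-GAC", "CUGUAAUUA")
print(example)
```

A full-length design could be passed through the same function and compared against the ranges described above (a 35-bp stem with a few wobble pairs and mismatches); checks for the sequence motifs would need their positions, which this sketch does not encode.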
Figure 7. Artificial miRNAs mediate repression. See also Figure S7.
(A) Sequences of artificial pri-miRNAs A4, A5 and A6. Otherwise, this panel is as in Figure 4B.
(B) Response of cellular mRNAs upon co-expression of the indicated artificial miRNA and miR-1. Plotted are cumulative distributions of fold changes for mRNAs with the indicated sites in their 3′ UTRs. mRNAs with 3′-UTR sites to both miR-1 and the artificial miRNA were not considered. For each set of mRNAs, the number of reliably quantified distinct mRNAs is shown in parentheses, and for sets containing sites, the P value reports the significance of the difference in the fold-change distribution compared to that of the corresponding set of mRNAs without sites (one-tailed Mann–Whitney test).
When expressing these pri-miRNAs in HEK293T cells, levels of the mature artificial miRNAs were 1.5–2.5 fold greater than that of the miR-1 internal standard (Figure S7A). Moreover, small-RNA sequencing indicated that processing occurred predominantly at the designed positions (Figure S7B). After sorting transfected cells based on GFP expressed from a co-transfected plasmid, we performed RNA-seq analyses, comparing RNA from cells transfected with a bicistronic pri-miRNA plasmid to that of cells transfected with only the GFP-expression plasmid. These analyses revealed the expected miRNA-targeting effects for each of the three artificial miRNAs (Figure 7B) (Grimson et al., 2007). Indeed, the repression mediated by the artificial miRNAs appeared at least as strong as that mediated by the co-expressed miR-1 (Figure 7B).
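The comparison of fold-change distributions reported in Figure 7B relies on a one-tailed Mann–Whitney test. A minimal pure-Python sketch of that test is given below for readers who want to see the arithmetic. It assumes the normal approximation without tie or continuity correction, which is a simplification chosen here and not a statement of how the published P values were computed; the function name and the toy fold-change values are hypothetical.

```python
import math


def mann_whitney_less(x, y):
    """One-tailed test that values in x tend to be lower than values in y.

    Returns (U, p), where U is the Mann-Whitney statistic for x and p comes
    from the normal approximation (no tie or continuity correction), so p is
    only approximate for small samples or samples with many ties.
    """
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return u, p


# Toy log2 fold changes: mRNAs with a site vs. mRNAs without a site.
with_site = [-0.5, -0.4, -0.3]
no_site = [0.0, 0.1, 0.2, 0.3]
u_stat, p_value = mann_whitney_less(with_site, no_site)
print(u_stat, p_value)
```

A small P value here indicates that mRNAs with sites tend to have lower fold changes than those without sites, which is the direction expected for miRNA-mediated repression.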
Discussion
Our results from tens of thousands of stem variants of three pri-miRNA hairpins revealed that pairing was favored over mismatches at all but one position of the stem (Figure 2), which implied that the three human pri-miRNAs each benefitted from more pairing and were sensitive to less pairing. The benefit from more pairing had diminishing returns, however, as indicated by our analysis of the artificial pri-miRNAs. Artificial pri-miRNAs with 4 wobble pairs and 3 single-bp mismatches in their stem regions (A1.12 and A2.10) were at least as efficiently processed as either their parental pri-miRNAs with one mismatch (A1 and A2) or their derivatives with perfectly paired stems (A1.3 and A2.3) (Figures 4 and 6). In practice, the pri-miRNAs with wobbles and mismatches were much easier to clone than were those with perfectly paired stems. They might also be less likely to trigger an interferon response, although examination of RNA-seq data obtained with and without expression of artificial pri-miRNAs with more or less extensive pairing showed no evidence of an interferon response (data not shown). The advantage of greater genomic stability without compromising cleavage efficiency helps explain why all conserved pri-miRNAs have some wobbles and mismatches.
The length of the stem region, with a narrow preference for 35 ± 1 bp (including wobbles and mismatches, but not bulged nucleotides), was found to be a second important structural feature of pri-miRNAs. In addition to contributing specificity to the first step in the miRNA biogenesis pathway, the preference for a stem length of 35 bp ensures that most products of the first step have the two helical turns favored for subsequent Dicer cleavage (MacRae et al., 2006; Gu et al., 2012). The basal UG and apical UGU primary-sequence preferences at the junctions of single-stranded and double-stranded RNA regions imply that the Microprocessor recognizes either or both of these junctions at the ends of the pri-miRNA stem (Han et al., 2006; Auyeung et al., 2013). Indeed, biochemical analyses show that Drosha recognizes the basal junction and the DGCR8 dimer recognizes the apical junction (Nguyen et al., 2015). Also supporting the recognition of both junctions, results from inserting or deleting pairs within pri-miRNA stems indicate that measurements from both ends of the stem influence the site of cleavage (Ma et al., 2013). Our results showing that the efficiency of cleavage depended on a specific length of stem support the conclusion that recognition of both junctions is simultaneous and indicate that this recognition is performed by a protein complex too rigid to efficiently accommodate stems of other lengths, suggesting that the Drosha–DGCR8 heterotrimer acts as molecular calipers to measure the length of the pri-miRNA stem.
For pri-miRNAs predicted to have stems with suboptimal lengths, a stem of 35 ± 1 bp might still be achieved through disruption of extra pairing or creation of additional pairing to accommodate the Microprocessor as it binds. Indeed, we found that weakened pairing at positions 37–39 favored the cleavage of pri-miR-16, presumably because this pairing must be disrupted to accommodate Microprocessor binding. Also supporting this idea were our results at position 1 of pri-miR-30. Flanked by mismatches on both sides, this single base pair would form only transiently in the context of free RNA, yet a 3p C opposite the 5p G of the basal UG motif was preferred over the other possibilities (Figure S2), presumably because forming this lone pair at position 1 to extend the stem to 35 bp favored accommodation within the Microprocessor. The benefit of pairing at position 1 of pri-miR-30 was further supported by the benefit of creating a pair at position 2, which extended contiguous pairing to position 1 (Figure 2D).
Understanding the preference for a 35-bp stem and how some pri-miRNA derivatives might accommodate this structural feature better than others helps to reconcile seemingly contradictory results in the literature. For example, a 4-bp shift in the cleavage site observed after deleting 4 bp from the basal stem of pri-miR-16 has been interpreted as evidence that the distance from the base of the stem is more important for determining the cleavage site than is the distance from the loop (Han et al., 2006), which seems at odds with the conclusion from a study of pri-miR-30 derivatives (Zeng et al., 2005). We now realize, however, that the 4-bp deletion within the pri-miR-16 basal stem would favor a stem that incorporates rather than excludes the pairing at wild-type positions 36–39 to achieve an optimal length of 35 bp, and that the repositioned cleavage site would fall at an optimal distance from not only the base of the stem but also the loop.
The newly identified mismatched GHG motif is basal to the region that produces the miRNA duplex. Indeed, no substantial nucleotide preferences were detected in the region that produced the miRNA duplex, which, from an evolutionary perspective, would benefit the emergence of new miRNAs with any primary sequence.
The mismatched GHG motif and the three previously identified primary-sequence motifs augmented the structural features to increase both efficiency and accuracy of cleavage. As suggested by our results (Figure S4A) and also recently demonstrated for the basal UG and apical UGU (Nguyen et al., 2015), one way that the motifs increase accuracy is to break the symmetry of the single-stranded–double-stranded–single-stranded Microprocessor substrate, preventing unproductive cleavage that occurs when the substrate binds in the opposite orientation (Han et al., 2006). These four motifs exerted their greatest influence in hairpins that were suboptimal with respect to either pairing or stem length, and imparted less benefit to pri-miRNAs that already had more optimal structural features (Figure 6). Likewise, the benefit of adding an additional motif diminished if more motifs were already present (e.g., Figure 4C and Figure S3A). These diminishing returns implied some functional redundancy among the sequence motifs and between the structural and sequence features.
Knowing these features that define pri-miRNAs with awareness of their potential redundancies explains why most natural pri-miRNAs have only a subset of these features and why the primary-sequence motifs have much more impact in the context of some natural pri-miRNAs than they do in others. Pri-miR-125, which is much less reliant on the primary-sequence features than is either pri-miR-16 or pri-miR-30 (Auyeung et al., 2013), differs from the other two pri-miRNAs in having an unambiguously demarcated stem of 35 bp and in having the mismatched GHG motif—two beneficial features that appear to lower the functional impact of the other features (Figure S3A). Knowing these features that define pri-miRNAs also helps explain why pri-miR-16 and pri-miR-30 respond differently to perturbations in basal and apical regions (Figures 2 and S2) (Zeng et al., 2005; Han et al., 2006; Ma et al., 2013). Pri-miR-16 appears to have a good basal stem and a less optimal apical region, whereas pri-miR-30 appears to have an optimal apical region and a less optimal basal stem. Perhaps the more optimal regions are initially recognized and provide the primary guidance for determining the cleavage site, while the other regions accommodate Microprocessor binding.
Once we had identified structural and sequence features that define pri-miRNAs, designing artificial pri-miRNAs that were processed more efficiently than natural human pri-miRNAs was surprisingly straightforward and reliable. This accomplishment was not overstated by comparison to human miRNAs that were processed with unusually poor efficiency. Indeed, pri-miR-125, our internal standard for the competitive cleavage assays, has been the most efficiently processed of all natural pri-miRNAs that we have assayed in vitro, and pri-miR-1, our internal standard for accumulation of processed miRNA in vivo, accumulates to a level matching or exceeding that of any ectopically expressed miRNA that we have assessed using quantitative RNA blots (including miR-125 and miR-30). The de novo design of functional pri-miRNAs, particularly pri-miRNAs that were so efficiently processed, achieved a key milestone in the understanding of miRNA biogenesis.
The ease with which we were able to surpass processing efficiencies of natural pri-miRNAs implies that over the course of evolution natural pri-miRNAs have not acquired the most efficient possible processing. Processing efficiency of natural pri-miRNAs might not have been optimized for several reasons. First, some natural pri-miRNAs might be constitutively inefficiently processed to enable post-transcriptional regulation through the action of differentially expressed factors that enhance processing, presumably through recognition of features beyond those characterized here (Ha and Kim, 2014). Second, mutations favoring additional production of a mature miRNA can act at any step of miRNA production, and increasing transcriptional production might be more accessible than improving post-transcriptional processing, particularly when considering the diminishing returns of adding and maintaining each additional feature that favors pri-miRNA processing. Third, pri-miRNAs that share primary transcripts with either mRNAs or other pri-miRNAs might not be optimized for processing efficiency if rapid processing compromises expression of the co-transcribed RNA. For example, Drosha processing of a pri-miRNA from a pre-mRNA intron prior to splice-site definition would preclude production of the mature mRNA (Kim and Kim, 2007). Likewise, because 5′-to-3′ Xrn2-mediated exonucleolytic degradation of the cleavage product downstream of Drosha processing promotes RNA pol II release through a torpedo-like mechanism (Ballarino et al., 2009), rapid Drosha processing of an upstream pri-miRNA could compromise transcription or stability of a downstream pri-miRNA.
For the most part, shRNAs do not face the obstacles that prevent natural pri-miRNAs from achieving more efficient processing, the exception being unwanted Drosha cleavage of retroviral RNA during packaging of shRNA libraries (Liu et al., 2010), which can be controlled by inhibiting DGCR8 (Knott et al., 2014). Accordingly, we expect that applying our design principles to improve or replace the pri-miR-30 backbone will impart advantages to future generations of shRNA libraries. In addition, our high-throughput approach for identifying generic features that define human pri-miRNAs can be modified to reveal specialized features required for regulated processing of certain mammalian pri-miRNAs as well as the enigmatic features defining pri-miRNAs of other lineages, such as nematodes and plants.
Experimental Procedures
Pools of Pri-miRNA Variants
For each pri-miRNA, subpools of DNA oligonucleotides with all possible sequences at each of the mutagenized windows were synthesized and mixed prior to extension with primers that added the T7 promoter, barcodes and Illumina adapter sequences. All synthetic oligonucleotide sequences are provided (Table S1). The extended DNA pool for each pri-miRNA was purified, and a small fraction (10–16 million molecules) was amplified in a 1 ml PCR reaction. Some of the amplified DNA was sequenced on a HiSeq2000 (Illumina) to generate the dictionary, and some was transcribed to generate the RNA pool. To quantify the amount of each variant in the input, the barcode region of a portion of each RNA pool was sequenced. For additional details, see Extended Experimental Procedures.
In Vitro Cleavage and Analyses
5′-end-labeled pools were incubated in Microprocessor lysate, which was prepared from cells overexpressing Drosha and DGCR8 as described (Lee and Kim, 2007; Auyeung et al., 2013). After a brief incubation at 37°C, each reaction was stopped, and 5′ cleavage products were gel-purified and ligated at their 3′ ends to a pre-adenylated adapter using T4 RNA ligase 2, truncated KQ (NEB). Ligated cleavage products were then gel-purified, reverse transcribed, and sequenced. At each cleavage site, the cleavage score for each variant was calculated as
in which input(var) and cleaved(var) were the sum of the counts from all barcodes that were linked to the variant in the input or cleavage-product sequencing, respectively, with a pseudocount of 1 added to each; and input(wt) and cleaved(wt) were the analogous sums for the wild-type sequence, including the pseudocounts. For in vitro assays of designed variants, query and reference in vitro transcribed, gel-purified and cap-labeled pri-miRNAs were mixed and added to Microprocessor lysate. After either 2 minutes (assays of pri-miR-125 variants) or 5 minutes (assays of other pri-miRNA variants) at 37°C, reactions were phenol extracted, and RNA was precipitated and resolved on urea-acrylamide gels. For additional details, see Extended Experimental Procedures.
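A minimal sketch of how a score built from these four quantities might be computed is shown below. It assumes that the score is the log2 of the variant's cleaved-to-input ratio divided by the same ratio for the wild type; this log2-ratio form is an assumption of the sketch, chosen by analogy with the log2 frequency-based scores used elsewhere in this study. The function names and the toy barcode counts are hypothetical.

```python
import math


def summed_count(counts_by_barcode, barcodes, pseudocount=1):
    """Sum read counts over all barcodes linked to one sequence, plus a pseudocount."""
    return sum(counts_by_barcode.get(bc, 0) for bc in barcodes) + pseudocount


def cleavage_score(input_var, cleaved_var, input_wt, cleaved_wt):
    """log2 enrichment of a variant in the cleaved pool, relative to wild type.

    All four arguments are summed counts that already include the pseudocount.
    The log2-ratio form is an assumption of this sketch.
    """
    var_ratio = cleaved_var / input_var
    wt_ratio = cleaved_wt / input_wt
    return math.log2(var_ratio / wt_ratio)


# Toy barcode-level counts (hypothetical).
input_counts = {"bc1": 60, "bc2": 39, "bc3": 99}
cleaved_counts = {"bc1": 30, "bc2": 19, "bc3": 199}

variant_barcodes = ["bc1", "bc2"]
wildtype_barcodes = ["bc3"]

score = cleavage_score(
    summed_count(input_counts, variant_barcodes),      # 100
    summed_count(cleaved_counts, variant_barcodes),    # 50
    summed_count(input_counts, wildtype_barcodes),     # 100
    summed_count(cleaved_counts, wildtype_barcodes),   # 200
)
print(score)
```

Under this form, the wild-type sequence scores 0 by construction, negative scores mark variants cleaved less efficiently than the wild type, and the pseudocount keeps the ratio defined for variants with no reads in one of the pools.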
Pri-miRNA Processing and Mature miRNA Activity in Cells
Constructs that co-expressed the query pri-miRNA, pri-miR-1, and sometimes also pri-miR-30 were transfected into HEK293T cells using Lipofectamine 2000 (Life Technologies). After 36–48 hrs, total RNA was extracted, and miRNA accumulation was analyzed using RNA blots and small-RNA sequencing, whereas miRNA activity was analyzed by RNA-seq, using a NEXTflex Rapid Illumina Directional RNA-Seq Library Prep Kit (Bioo Scientific). For additional details, see Extended Experimental Procedures.
Acknowledgments
We thank X. Wu for the analysis of Figure 3C; V. Auyeung, S. McGeary, B. Kleaveland, I. Ulitsky, S. Eichhorn, V. Agarwal and L. Li for valuable discussions; S. McGeary for cell lysate; J. Stefano for technical assistance; V. N. Kim and T. Tuschl for plasmids; the Whitehead Institute Genome Technology Core for sequencing; and the Whitehead Institute FACS facility for cell sorting. This work is supported by NIH grant GM067031. W.F. is an HHMI Fellow of the Damon Runyon Cancer Research Foundation (DRG-2174-13). D.B. is an Investigator of the Howard Hughes Medical Institute.
Footnotes
Accession Numbers
Sequencing data and processed data are available at the Gene Expression Omnibus under accession number GSE67937.
References
- Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, Ha I, Baillie DL, Fire A, Ruvkun G, Mello CC. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell. 2001;106:23–34. doi: 10.1016/s0092-8674(01)00431-7.
- Hutvagner G, McLachlan J, Pasquinelli AE, Balint E, Tuschl T, Zamore PD. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science. 2001;293:834–838. doi: 10.1126/science.1062961.
- Hutvagner G, Zamore PD. A microRNA in a multiple-turnover RNAi enzyme complex. Science. 2002;297:2056–2060. doi: 10.1126/science.1073827.
- Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, Abel L, Rappsilber J, Mann M, Dreyfuss G. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 2002;16:720–728. doi: 10.1101/gad.974702.
- Zeng Y, Wagner EJ, Cullen BR. Both natural and designed micro RNAs can inhibit the expression of cognate mRNAs when expressed in human cells. Mol Cell. 2002;9:1327–1333. doi: 10.1016/s1097-2765(02)00541-5.
- Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003;115:209–216. doi: 10.1016/s0092-8674(03)00801-8.
- Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, et al. The nuclear RNase III Drosha initiates microRNA processing. Nature. 2003;425:415–419. doi: 10.1038/nature01957.
- Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP. Vertebrate microRNA genes. Science. 2003a;299:1540. doi: 10.1126/science.1080372.
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP. The microRNAs of Caenorhabditis elegans. Genes Dev. 2003b;17:991–1008. doi: 10.1101/gad.1074403.
- Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the assembly of the RNAi enzyme complex. Cell. 2003;115:199–208. doi: 10.1016/s0092-8674(03)00759-1.
- Zeng Y, Cullen BR. Sequence requirements for microRNA processing and function in human cells. RNA. 2003;9:112–123. doi: 10.1261/rna.2780503.
- Denli AM, Tops BB, Plasterk RH, Ketting RF, Hannon GJ. Processing of primary microRNAs by the Microprocessor complex. Nature. 2004;432:231–235. doi: 10.1038/nature03049.
- Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N, Shiekhattar R. The Microprocessor complex mediates the genesis of microRNAs. Nature. 2004;432:235–240. doi: 10.1038/nature03120.
- Han J, Lee Y, Yeom KH, Kim YK, Jin H, Kim VN. The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 2004;18:3016–3027. doi: 10.1101/gad.1262504.
- Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, Song JJ, Hammond SM, Joshua-Tor L, Hannon GJ. Argonaute2 is the catalytic engine of mammalian RNAi. Science. 2004;305:1437–1441. doi: 10.1126/science.1102513.
- Song JJ, Smith SK, Hannon GJ, Joshua-Tor L. Crystal structure of Argonaute and its implications for RISC slicer activity. Science. 2004;305:1434–1437. doi: 10.1126/science.1102514.
- Zhang H, Kolb FA, Jaskiewicz L, Westhof E, Filipowicz W. Single processing center models for human Dicer and bacterial RNase III. Cell. 2004;118:57–68. doi: 10.1016/j.cell.2004.06.017.
- Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, Barzilai A, Einat P, Einav U, Meiri E, et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet. 2005;37:766–770. doi: 10.1038/ng1590.
- Silva JM, Li MZ, Chang K, Ge W, Golding MC, Rickles RJ, Siolas D, Hu G, Paddison PJ, Schlabach MR, et al. Second-generation shRNA libraries covering the mouse and human genomes. Nat Genet. 2005;37:1281–1288. doi: 10.1038/ng1650.
- Zeng Y, Cullen BR. Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem. 2005;280:27595–27603. doi: 10.1074/jbc.M504714200.
- Zeng Y, Yi R, Cullen BR. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J. 2005;24:138–148. doi: 10.1038/sj.emboj.7600491.
- Han J, Lee Y, Yeom KH, Nam JW, Heo I, Rhee JK, Sohn SY, Cho Y, Zhang BT, Kim VN. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell. 2006;125:887–901. doi: 10.1016/j.cell.2006.03.043.
- MacRae IJ, Zhou K, Li F, Repic A, Brooks AN, Cande WZ, Adams PD, Doudna JA. Structural basis for double-stranded RNA processing by Dicer. Science. 2006;311:195–198. doi: 10.1126/science.1121638.
- Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007;27:91–105. doi: 10.1016/j.molcel.2007.06.017.
- Kim YK, Kim VN. Processing of intronic microRNAs. EMBO J. 2007;26:775–783. doi: 10.1038/sj.emboj.7601512.
- Lee Y, Kim VN. In vitro and in vivo assays for the activity of Drosha complex. Methods Enzymol. 2007;427:89–106. doi: 10.1016/S0076-6879(07)27005-3.
- Ballarino M, Pagano F, Girardi E, Morlando M, Cacchiarelli D, Marchioni M, Proudfoot NJ, Bozzoni I. Coupled RNA processing and transcription of intergenic primary microRNAs. Mol Cell Biol. 2009;29:5632–5638. doi: 10.1128/MCB.00664-09.
- Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–233. doi: 10.1016/j.cell.2009.01.002.
- Liu YP, Vink MA, Westerink JT, Ramirez de Arellano E, Konstantinova P, Ter Brake O, Berkhout B. Titers of lentiviral vectors encoding shRNAs and miRNAs are reduced by different mechanisms that require distinct repair strategies. RNA. 2010;16:1328–1339. doi: 10.1261/rna.1887910.
- Park JE, Heo I, Tian Y, Simanshu DK, Chang H, Jee D, Patel DJ, Kim VN. Dicer recognizes the 5′ end of RNA for efficient and accurate processing. Nature. 2011;475:201–205. doi: 10.1038/nature10198.
- Winter J, Diederichs S. Argonaute proteins regulate microRNA stability: Increased microRNA abundance by Argonaute proteins is due to microRNA stabilization. RNA Biol. 2011;8:1149–1157. doi: 10.4161/rna.8.6.17665.
- Gu S, Jin L, Zhang Y, Huang Y, Zhang F, Valdmanis PN, Kay MA. The loop position of shRNAs and pre-miRNAs is critical for the accuracy of dicer processing in vivo. Cell. 2012;151:900–911. doi: 10.1016/j.cell.2012.09.042.
- Auyeung VC, Ulitsky I, McGeary SE, Bartel DP. Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing. Cell. 2013;152:844–858. doi: 10.1016/j.cell.2013.01.031.
- Bassik MC, Kampmann M, Lebbink RJ, Wang S, Hein MY, Poser I, Weibezahn J, Horlbeck MA, Chen S, Mann M, et al. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell. 2013;152:909–922. doi: 10.1016/j.cell.2013.01.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fellmann C, Hoffmann T, Sridhar V, Hopfgartner B, Muhar M, Roth M, Lai DY, Barbosa IA, Kwon JS, Guan Y, et al. An optimized microRNA backbone for effective single-copy RNAi. Cell Rep. 2013;5:1704–1713. doi: 10.1016/j.celrep.2013.11.020. [DOI] [PubMed] [Google Scholar]
- Ma H, Wu Y, Choi JG, Wu H. Lower and upper stem-single-stranded RNA junctions together determine the Drosha cleavage site. Proc Natl Acad Sci USA. 2013;110:20687–20692. doi: 10.1073/pnas.1311639110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Shea JP, Chou MF, Quader SA, Ryan JK, Church GM, Schwartz D. pLogo: a probabilistic approach to visualizing sequence motifs. Nat Methods. 2013;10:1211–1212. doi: 10.1038/nmeth.2646. [DOI] [PubMed] [Google Scholar]
- Ha M, Kim VN. Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol. 2014;15:509–524. doi: 10.1038/nrm3838. [DOI] [PubMed] [Google Scholar]
- Knott SR, Maceli AR, Erard N, Chang K, Marran K, Zhou X, Gordon A, El Demerdash O, Wagenblast E, Kim S, et al. A computational algorithm to predict shRNA potency. Mol Cell. 2014;56:796–807. doi: 10.1016/j.molcel.2014.10.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kampmann M, Horlbeck MA, Chen Y, Tsai JC, Bassik MC, Gilbert LA, Villalta JE, Kwon SC, Chang H, Kim VN, et al. Next-generation libraries for robust RNA interference-based genome-wide screens. Proc Natl Acad Sci U S A. 2015;112:E3384–3391. doi: 10.1073/pnas.1508821112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen TA, Jo MH, Choi YG, Park J, Kwon SC, Hohng S, Kim VN, Woo JS. Functional Anatomy of the Human Microprocessor. Cell. 2015;161:1374–1387. doi: 10.1016/j.cell.2015.05.010. [DOI] [PubMed] [Google Scholar]