Proceedings of the National Academy of Sciences of the United States of America. 2010 Apr 5;107(16):7377–7382. doi: 10.1073/pnas.1003055107

Function-based gene identification using enzymatically generated normalized shRNA library and massive parallel sequencing

Michael Shtutman a, Anil Maliyekkel a,b, Yu Shao c, C Steven Carmack c, Mirza Baig a, Natalie Warholic a, Kelly Cole a, Eugenia V Broude a, Timothy T Harkins d, Ye Ding c, Igor B Roninson a,2
PMCID: PMC2867740  PMID: 20368428

Abstract

As a general strategy for function-based gene identification, an shRNA library containing ≈150 shRNAs per gene was enzymatically generated from normalized (reduced-redundance) human cDNA. The library was constructed in an inducible lentiviral vector, enabling propagation of growth-inhibiting shRNAs and controlled activity measurements. RNAi activities were measured for 101 shRNA clones representing 100 human genes and for 201 shRNAs derived from a firefly luciferase gene. Structure-activity analysis of these two datasets yielded a set of structural criteria for shRNA efficacy, increasing the frequencies of active shRNAs up to 5-fold relative to random sampling. The same library was used to select shRNAs that inhibit breast carcinoma cell growth by targeting potential oncogenes. Genes targeted by the selected shRNAs were enriched for 10 pathways, 9 of which have been previously associated with various cancers, cell cycle progression, or apoptosis. One hundred nineteen genes, enriched through this selection and represented by two to six shRNAs each, were identified as potential cancer drug targets. Short interfering RNAs against 19 of 22 tested genes in this group inhibited cell growth, validating the efficiency of this strategy for high-throughput target gene identification.

Keywords: cancer targets, functional genomics, RNA interference, siRNA design, lentiviral vectors


RNAi, a general tool for targeted gene knockdown in mammalian cells, is carried out using synthetic siRNA duplexes or vectors expressing shRNA. shRNA is processed by enzyme Dicer to yield siRNA, which is incorporated into the RNA-induced silencing complex (RISC). RISC containing the guide (antisense) strand of siRNA causes endonucleolytic cleavage of target mRNA (1). Synthetic siRNA duplexes and shRNA templates are usually developed through rational design based on sequence-based rules that undergo continuous modification and optimization (2). The efficacy of the published rules for siRNA design remains controversial, and their applicability to shRNA is still uncertain (3).

To identify genes whose inhibition confers a selectable phenotype, several groups have generated expression libraries comprising vectors that express synthetic shRNA sequences targeting thousands of genes, typically with three to six different shRNAs per gene (4–10). Such libraries have been used in several types of selection. Designed shRNA libraries, however, are very expensive and time-consuming to generate for any new organism, and are limited to known genes and splice variants. Additionally, such libraries are not necessarily able to inhibit every gene they target, because the current state of shRNA design provides no assurance that the small number of shRNAs per gene will include active inhibitors.

An alternative strategy for shRNA library construction is enzymatic conversion of randomly fragmented cDNA into templates for shRNA (11–17). A significant drawback of large-scale shRNA libraries derived from cellular cDNA is that the relative abundance of shRNA sequences is proportional to the abundance of the mRNA they target. This variation among individual shRNAs makes it difficult to inhibit lower-abundance mRNAs. A second problem with enzymatically generated libraries is that their large size and undefined composition preclude the use of an efficient “molecular barcoding” approach (6–10) for rapid identification of enriched or depleted sequences.

In the present article, we describe a general strategy for function-based gene identification, applicable to essentially any organism and independent of the status of shRNA design rules. This strategy involves (i) enzymatic generation of transcriptome-scale shRNA libraries with relatively even representation of different genes, (ii) expression selection of functional shRNAs using a regulated lentiviral expression vector, (iii) identification of sequences enriched by selection through massive parallel sequencing, and (iv) validation of shRNA targets identified by selection using synthetic siRNAs. Structure-activity analysis of multiple clones from such a library enabled us to identify the significant parameters associated with determining shRNA activity. We also demonstrate the utility and effectiveness of this strategy for identifying genes required for breast carcinoma cell growth, for which this strategy yielded genes and pathways implicated in cell growth and cancer, potential targets for anticancer drugs.

Results and Discussion

shRNA Library Construction.

Our strategy for enzymatic generation of shRNA templates is schematized in Fig. 1A and described in SI Methods. DNase I digestion of target DNA is used to generate random double-stranded fragments (step 1), followed by ligation of the ends of these fragments to a single-stranded adaptor that forms a hairpin (step 2). The hairpin adaptor (HA; Fig. S1A) contains the loop from mir-23 miRNA and a recognition site for restriction enzyme MmeI, which cuts within the cDNA sequence 18–20 nt away from its recognition site, producing a targeting sequence of a size suitable for shRNA (step 3). The MmeI-generated fragments with 3′ NN overhangs are then ligated to a second adaptor (the termination adaptor, TA; Fig. S1A) (step 4), which provides an internal primer for subsequent extension (step 5). Parts of TA sequences are then removed by restriction enzyme digestion (step 6) to generate shRNA templates containing an inverted repeat followed by Pol III termination signal; the templates are then ligated into an expression vector to produce a library (step 7).
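As a rough illustration of the end product of steps 1–7, the sketch below assembles an shRNA template as an inverted repeat (targeting sequence, loop, reverse complement) followed by a run of T residues standing in for the Pol III termination signal. The `LOOP` and `TERMINATOR` strings are placeholders and not the actual HA or TA sequences, the adaptor-derived stem extension is omitted, and the function names are hypothetical.

```python
# Illustrative only: LOOP and TERMINATOR are placeholder sequences,
# not the actual hairpin-adaptor (HA) or termination-adaptor (TA) sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
LOOP = "CTTCCTGTCA"   # placeholder loop sequence
TERMINATOR = "TTTTT"  # placeholder run of T residues


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shrna_template(target: str, loop: str = LOOP,
                   terminator: str = TERMINATOR) -> str:
    """Return target + loop + reverse complement + terminator (5'->3')."""
    if not 19 <= len(target) <= 21:
        raise ValueError("expected a 19-21 nt targeting sequence")
    target = target.upper()
    return target + loop + reverse_complement(target) + terminator


# Hypothetical 20-nt targeting sequence, used only to exercise the function.
template = shrna_template("GATTACAGATTACAGATTAC")
```

Whether the first arm of the repeat corresponds to the sense or the antisense strand depends on the orientation of the cloned fragment, which this sketch does not model.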

Fig. 1.


shRNA library construction. (A) Scheme of shRNA library construction from randomly fragmented DNA. (B) Diagram of LLCEP TU6LX vector. CMV + LTR, LTR promoter with CMV enhancer; SIN LTR, self-inactivating LTR promoter; CPPT, central polypurine tract sequence; WPRE, woodchuck hepatitis posttranscriptional regulatory element; TetOx7, module of 7 tet operator repeats; CAT, chloramphenicol acetyltransferase; ccdB, cytotoxic protein (negative selection marker). (C) RNA expression levels in MCF7 cells, plotted in the order of increasing expression for all human UniGene entries and for genes identified in the shRNA library.

The principal differences between this strategy and earlier protocols (11–16) are as follows. (i) We use a much shorter HA, which is retained in the final shRNA library without truncation. The adaptor-derived stem sequence, joined to 19–21 bp of cDNA sequence, produces a total hairpin stem length of 27–29 bp, the size that produces more efficient knockdown (18). (ii) The design of the TA (Fig. S1) provides a Pol III termination signal and a 3′ (G/A)N overhang that places a purine at the +1 position relative to the promoter, improving Pol III transcription (19), and includes a single-stranded nick that primes the extension with Klenow fragment (Fig. 1A, step 5) without the need for an external primer. (iii) The library is constructed in a tightly regulated lentiviral vector LLCEP TU6LX (Fig. 1B), an optimized version of our previously described vector LLCEP TU6X, which is regulated by tetracycline/doxycycline via tTR-KRAB repressor (20). The use of this vector for shRNA expression prevents the loss of growth-inhibitory sequences and allows for precisely controlled shRNA activity measurements.

The library construction strategy was first tested on the GL2 firefly luciferase gene, randomly fragmented with DNase I. Approximately 90% of 700 clones sequenced from the luciferase-derived shRNA library contained proper shRNA structures comprising unique 19–21-bp sequences of the luciferase gene. To generate a human transcriptome library and to overcome the problem of uneven representation of different genes in conventional cDNA, we took advantage of the process of cDNA normalization that equalizes the abundance of different mRNA sequences (21). As the starting material for shRNA library construction, we used a library originally developed for the selection of genetic suppressor elements (GSE, short cDNA fragments encoding either antisense RNA or protein fragments acting as transdominant inhibitors). The GSE library was derived from randomly fragmented cDNA of MCF7 human breast carcinoma cells, normalized by Cot fractionation (21). The GSE library construction strategy (22) favored directional cloning of cDNA fragments flanked by different adaptors at the 5′ and 3′ ends relative to the original mRNA; sequence analysis of representative clones showed that approximately two thirds of the clones contained the adaptors in the preferred orientation. Partial directionality of the GSE library offered a possibility of creating an shRNA library where most of the shRNA sequences would have 5′-sense-loop-antisense-3′ (SA) strand orientation, which seemed desirable owing to the reports that shRNA sequences with 5′-antisense-loop-sense-3′ (AS) orientation were less effective (11, 12). We replaced the 3′ adaptor of normalized cDNA fragments from the GSE library with the HA, with the subsequent steps of library construction the same as in Fig. 1A.

The shRNA library from normalized cDNA contained a total of 2.8 × 10⁶ clones. Sequence analysis of 676 randomly picked clones showed that 632 of them (93.5%) contained proper stem-and-loop inserts. Of these, 500 clones (79.1%) matched with the UniGene transcript database and targeted 461 UniGene entries. Another 76 clones (12.0%) matched human genome sequences but not sequences within the UniGene database, and are likely to represent as-yet-unidentified transcripts. Of the sequenced clones, 68.2% had SA orientation and 31.8% had AS orientation, as in the starting GSE library. The length distribution of the targeted cDNA sequences was 52.4% 20 bp, 46.8% 21 bp, and 0.8% 19 bp, reflecting the heterogeneity of MmeI digestion.
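The percentages in this paragraph follow directly from the reported clone counts. The short check below recomputes them; the count of clones matching neither database is implied by the reported numbers rather than stated in the text, and the helper name `pct` is ours.

```python
# Clone counts as reported for the normalized cDNA-derived library.
sequenced = 676     # randomly picked clones
proper = 632        # clones with proper stem-and-loop inserts
unigene = 500       # proper clones matching UniGene transcripts
genome_only = 76    # proper clones matching the genome but not UniGene


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


proper_pct = pct(proper, sequenced)          # 93.5
unigene_pct = pct(unigene, proper)           # 79.1
genome_only_pct = pct(genome_only, proper)   # 12.0

# Implied remainder: proper clones matching neither database.
unmatched = proper - unigene - genome_only   # 56
```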

To estimate the normalization and representativity of the shRNA library, we analyzed the distribution of shRNA sequences identified by massive parallel sequencing of inserts recovered from genomic DNA of cells transduced with the library (see below). A total of 53,201 shRNAs corresponding to 14,699 UniGene entries were identified, with 72% of the entries differing in representation no more than 3-fold. Fig. 1C displays RNA expression [signal intensity in microarray hybridization (23)] in MCF7 cells for ≈24,000 probe sets representing all of the UniGene entries in the microarray (one probe set per entry), plotted in the order of increasing expression, as well as for probe sets representing UniGene entries identified in our library. Comparison of the two curves shows that genes expressed at the highest and the lowest levels in MCF7 cDNA have similar representation in the library, whereas the genes with intermediate levels of expression are moderately overrepresented (Fig. 1C). Hence, our library provides relatively uniform representation of at least ≈15,000 genes, at an average of >150 shRNAs per gene.
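One simple way to summarize evenness of representation is the fraction of entries whose shRNA count lies within a given fold range. The sketch below measures this relative to the median count; the text does not specify the reference point used for the 3-fold figure, so the median is an assumption, and the toy counts are invented for illustration.

```python
from statistics import median


def fraction_within_fold(counts: dict, fold: float = 3.0) -> float:
    """Fraction of entries whose count lies within `fold` of the median count.

    Using the median as the reference is an assumption of this sketch.
    """
    values = list(counts.values())
    mid = median(values)
    low, high = mid / fold, mid * fold
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Invented shRNA counts per UniGene entry, for illustration only.
toy_counts = {"geneA": 4, "geneB": 3, "geneC": 1, "geneD": 12, "geneE": 40}
evenness = fraction_within_fold(toy_counts)  # 3 of 5 entries fall in range
```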

Activity Assays of Luciferase-Derived and cDNA-Derived shRNAs.

The use of an inducible promoter for shRNA expression allows for precisely controlled measurement of RNAi activity, by comparing target expression levels in the presence and in the absence of the inducer. To generate data for shRNA structure-activity analysis, we carried out high-throughput assays of target knockdown by randomly picked clones from both luciferase- and cDNA-derived libraries, as schematized in Fig. S2A. The luciferase-derived clones were assayed for luciferase activity knockdown in HT1080 human fibrosarcoma cells expressing GL2 luciferase, as determined by comparing normalized luciferase activity in the presence and in the absence of doxycycline. Fig. 2A shows luciferase knockdown in 230 cell populations corresponding to 201 different shRNA sequences. Thirty-five percent and 11% of the clones produced >50% and >75% knockdown, respectively. Although none of the clones showed >90% knockdown in this assay, the knockdown efficiency was lowered in this assay by the relatively low lentiviral transduction rate in 96-well plates [low transduction leads to lower transgene expression in drug-selected populations (24)]. In the experiment shown in Fig. 2B, 96 clones representing a range of RNAi activities were retested by more efficient transduction in 24-well plates, along with the vector expressing a commercially available highly efficient reference shRNA for GL2 luciferase. The knockdown rates in Fig. 2B were calculated for each shRNA in the presence of doxycycline relative to cells transduced with insert-free vector. This analysis showed higher knockdown rates and an overall concordance with the data in Fig. 2A (Pearson correlation coefficient 0.81, P = 2 × 10⁻¹⁶) (Fig. S2B). The reference shRNA produced 90% knockdown, and this efficiency was exceeded by 15 of 96 clones generated from random fragments (Fig. 2B). Hence, random fragment conversion can produce shRNA inhibitors that are at least as efficient as rationally designed shRNA.
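The knockdown measure used in both assays can be written as a percent reduction of normalized signal relative to a reference condition: untreated cells in the 96-well assay (Fig. 2A), or doxycycline-treated cells carrying insert-free vector in the 24-well assay (Fig. 2B). The sketch below, with hypothetical function names and standard-library code only, shows that calculation together with a plain Pearson correlation of the kind used to compare the two assays.

```python
from math import sqrt


def percent_knockdown(signal_with_shrna: float, signal_reference: float) -> float:
    """Percent reduction of normalized signal relative to a reference condition."""
    if signal_reference <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * (1.0 - signal_with_shrna / signal_reference)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# A clone whose normalized luciferase signal drops from 100 to 25 units
# shows 75% knockdown.
example = percent_knockdown(25.0, 100.0)
```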

Fig. 2.


shRNA activity testing. (A) RNAi activity of 230 luciferase-derived shRNAs in luciferase-expressing HT1080 cells, transduced and analyzed in 96-well plates. Target knockdown is measured by reduction in normalized luciferase activity in doxycycline-treated relative to untreated cells. shRNA sequences arranged in the order of increasing activity. Mean and standard deviation shown for triplicate experiments. (B) RNAi activity of 96 luciferase-derived shRNAs from the set in A, retransduced in 24-well plates. Target knockdown is measured by reduction in normalized luciferase activity in doxycycline-treated cells relative to control vector-transduced cells. Knockdown by reference luciferase shRNA (from Clontech) is marked with a dashed line. Mean and standard deviation are shown for duplicate experiments. (C) RNAi activity of 101 cDNA-derived shRNAs in MCF7 cells, transduced and analyzed in 96-well plates. Target knockdown is measured by reduction in RNA levels for the corresponding genes, as measured by QPCR, in doxycycline-treated relative to untreated cells. Mean and standard deviation shown for triplicate QPCR assays. (D) Immunoblotting analysis of the indicated proteins in MCF7 cells transduced with shRNAs targeting the corresponding genes and grown in the absence (-) or in the presence of doxycycline (+). Beta-actin was used as a loading control. RNA knockdown was determined by QPCR; protein knockdown was determined by image quantitation.

To evaluate RNAi activity of shRNA clones derived from normalized cDNA, we analyzed 101 cell populations transduced with clones corresponding to 100 different UniGene entries for a decrease in the RNA level of genes targeted by each shRNA. This analysis was performed by quantitative RT-PCR (QPCR) with gene-specific primers using RNA from cell populations grown with or without doxycycline. The distribution of RNAi activities (Fig. 2C) shows that 10% of the tested clones produced ≥50% knockdown by this criterion. Immunoblotting analysis for four of the protein targets of shRNAs that produced ≈50% knockdown in mRNA levels showed a decrease in the corresponding protein levels, which was in some cases much stronger than the mRNA decrease measured by QPCR (Fig. 2D). The lower apparent frequency of effective inhibitors among cDNA-derived shRNAs than among luciferase-derived shRNAs could reflect higher sensitivity of the artificial GL2 luciferase transcript to RNAi relative to human mRNAs. Alternatively, the difference could be due to lower sensitivity of the QPCR assay relative to the protein activity assay or to different RNAi activities between the cell lines (HT1080 and MCF7) in which these clone sets were tested. Indeed, we were able to achieve only ≈75% luciferase knockdown with the reference shRNA in MCF7 cells, as opposed to >90% knockdown in HT1080 cells. Assuming 10% frequency of active shRNAs and ≈150 shRNAs per gene, an average gene in the library should be represented by ≈15 active shRNAs.
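The closing estimate is a simple expectation (150 × 0.10 ≈ 15). The sketch below reproduces it and adds, as our own illustration, the probability that a gene is covered by at least one active shRNA when each randomly derived shRNA is active independently with the same frequency. The independence assumption and the function names are ours, not the authors'.

```python
def expected_active(n_shrnas: int, active_fraction: float) -> float:
    """Expected number of active shRNAs among n_shrnas."""
    return n_shrnas * active_fraction


def prob_at_least_one_active(n_shrnas: int, active_fraction: float) -> float:
    """P(at least one active shRNA), assuming independent, equal activity odds."""
    return 1.0 - (1.0 - active_fraction) ** n_shrnas


per_gene = expected_active(150, 0.10)                  # about 15
coverage_150 = prob_at_least_one_active(150, 0.10)     # very close to 1
coverage_6 = prob_at_least_one_active(6, 0.10)         # about 0.47
```

Under these assumptions, six random shRNAs per gene would leave roughly half of the genes without an active inhibitor, whereas ≈150 per gene makes coverage nearly certain.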

Structure/Activity Analysis of shRNA Sequences.

We investigated whether commonly used criteria for shRNA design, namely the choice of SA over AS orientation of shRNA and the Dharmacon score [a cumulative score for several criteria based on the sequence of the 19-nt siRNA guide strand (25)], show activity correlations with our datasets of 201 luciferase-derived and 101 cDNA-derived shRNA activities. The distributions of shRNA activities of both datasets according to the shRNA orientation and the Dharmacon score are shown in Fig. 3A. SA orientation was associated with higher activity in both datasets, but this difference was significant only for the luciferase dataset (P = 0.0002 for luciferase and P = 0.1137 for cDNA dataset, Welch's t test). Dharmacon score showed significant correlation with activity in both datasets [Spearman correlation 0.26 (P = 0.0002) for luciferase and 0.27 (P = 0.0062) for cDNA dataset]. Nevertheless, the majority of active shRNAs in both datasets showed relatively low Dharmacon scores (≤7), and surprisingly, many AS-oriented shRNAs were highly active (Fig. 3A). Therefore, we have conducted a structure-activity analysis (SI Methods) by including the following parameters (Table S1): the effects of the shRNA length and orientation, the levels of gene expression, the role of individual bases at each position in shRNA (Fig. S2B), the presence of runs of identical nucleotides, and five energy parameters based on the structures of both the processed shRNA and its mRNA target calculated using the Sfold algorithm (26). The goal of the analysis was to identify the parameters and cutoff values (filtering criteria) associated with RNAi activity.

Fig. 3.


shRNA structure-activity analysis. (A) Distribution of shRNA activities according to the Dharmacon score and shRNA orientation. Closed circles, SA shRNAs. Open circles, AS shRNAs. (B) Activity of shRNAs that passed (closed circles) or failed (open circles) the combination of five filtering criteria but not target disruption energy (Top) or all six filters combined (Bottom). P values were determined by Welch's t test (two sided, unequal variance). Left: luciferase-derived shRNA. Right: cDNA-derived shRNA.

Most of the tested parameters, including secondary structure formation by siRNA guide strand, siRNA target binding energy, average internal stability at the cleavage site, and levels of gene expression (for the cDNA-derived dataset), showed no correlations with shRNA activity. The filtering criteria that produced significant activity discrimination in both datasets are presented in Figs. S3, S4, S5, and S6. The preferences identified in both cDNA and luciferase sets include (i) the SA orientation in those shRNAs that contain G as the first base; (ii) absence of runs of three G, three C, or four A nucleotides anywhere within the shRNA transcript, or three U nucleotides preceding the first U of the termination signal; (iii) GC content between 35% and 60%; (iv) the free-energy difference between the 5′ and the 3′ ends of the processed siRNA guide strand, characterized by DSSE (differential stability in siRNA duplex ends) values of ≤−1 kcal/mol; and (v) the absence of C in the second position of the guide strand (Fig. S2B). As presented elsewhere (27), we have also found that the activity in the cDNA dataset was significantly correlated with target disruption energy, a measure of accessibility of the target mRNA. cDNA-derived shRNAs showed lower activity when their disruption energy was <−15 kcal/mol, which we have used as the sixth filtering criterion. Target disruption energy was also found to be significant in independent siRNA datasets from three human genes (27) but did not correlate with RNAi activity in the luciferase dataset, suggesting that luciferase inhibition by RNAi does not depend on the accessibility of individual sequences within this artificial target.
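The six criteria can be read as a pass/fail filter, sketched below under several stated assumptions. DSSE and target disruption energy are taken as precomputed inputs (the sketch does not call Sfold); GC content is computed on the guide strand because the text does not say which sequence it refers to; criterion (i) is interpreted as rejecting AS-oriented shRNAs whose transcript starts with G; and the rule about U runs before the termination signal is omitted because it depends on the terminator position. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ShRNA:
    transcript: str           # full shRNA transcript, 5'->3'
    guide: str                # processed siRNA guide (antisense) strand, 5'->3'
    orientation: str          # "SA" or "AS"
    dsse: float               # kcal/mol, precomputed elsewhere
    disruption_energy: float  # kcal/mol, precomputed elsewhere


def gc_percent(seq: str) -> float:
    """GC content of a sequence, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(s: ShRNA, use_disruption_energy: bool = True) -> bool:
    """Apply the five sequence/energy filters, optionally plus the sixth."""
    transcript = s.transcript.upper().replace("T", "U")
    guide = s.guide.upper().replace("T", "U")
    checks = [
        # (i) G-initiated transcripts should be in SA orientation (our reading)
        not (transcript.startswith("G") and s.orientation != "SA"),
        # (ii) no runs of GGG, CCC, or AAAA (the U-run rule is omitted here)
        re.search(r"GGG|CCC|AAAA", transcript) is None,
        # (iii) GC content between 35% and 60% (computed on the guide; assumption)
        35.0 <= gc_percent(guide) <= 60.0,
        # (iv) DSSE of at most -1 kcal/mol
        s.dsse <= -1.0,
        # (v) no C at the second position of the guide strand
        guide[1] != "C",
    ]
    if use_disruption_energy:
        # (vi) disruption energy not below -15 kcal/mol
        checks.append(s.disruption_energy >= -15.0)
    return all(checks)


# Invented example sequences and energies, for illustration only.
candidate = ShRNA(
    transcript="UAUCGAUCGAUGCUAGCUA" + "CUUCCUGUCA" + "UAGCUAGCAUCGAUCGAUA",
    guide="UAGCUAGCAUCGAUCGAUA",
    orientation="SA",
    dsse=-2.0,
    disruption_energy=-10.0,
)
```

Passing `use_disruption_energy=False` gives the five-filter combination, which is the relevant setting for the luciferase dataset, where disruption energy did not correlate with activity.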

Fig. 3B compares the activities of shRNAs that either passed or failed the combination of the first five filtering criteria (excluding target disruption energy) or all six filters (the comparisons for individual filters are shown in Fig. S3). After applying the first five filters, the fraction of luciferase-derived shRNAs that inhibit luciferase activity >2-fold increased from 34% in the unfiltered set to 71%. The active cDNA-derived shRNAs, defined as those that decrease target mRNA >2-fold by QPCR, increased from 10% to 40%. The addition of the target disruption energy filter produced an additional improvement in the cDNA dataset, increasing the active fraction to 50% (6 of 12) and allowing us to identify five of the six most active shRNAs. Future analysis of additional clones from the cDNA-derived library should yield other significant correlations that will allow for even more rigorous selection of active shRNAs.
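The enrichment factors implied by these percentages can be checked with a one-line ratio; the 5-fold figure for the cDNA dataset with all six filters matches the "up to 5-fold" stated in the Abstract. The function name is ours.

```python
def fold_increase(before_pct: float, after_pct: float) -> float:
    """Ratio of the active fraction after filtering to that before filtering."""
    return after_pct / before_pct


luc_five_filters = fold_increase(34, 71)    # about 2.1-fold
cdna_five_filters = fold_increase(10, 40)   # 4-fold
cdna_six_filters = fold_increase(10, 50)    # 5-fold

# Active fraction after all six filters in the cDNA dataset: 6 of 12 clones.
six_filter_fraction = 100 * 6 / 12          # 50.0
```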

Identification of Genes Required for Breast Carcinoma Cell Growth Through Growth-Inhibitory shRNA Selection and Massive Parallel Sequencing.

Genes required for the cell growth are expected to give rise to shRNAs that would inhibit cell proliferation. Such inhibitors can be isolated through negative selection techniques, such as BrdU suicide selection, previously used to identify growth-inhibitory GSEs (22, 28). We have now used our normalized cDNA library in the same selection (Fig. 4A), taking advantage of the massive parallel sequencing technology for identifying sequences enriched in selection.

Fig. 4.


Selection and validation of growth-inhibitory shRNAs. (A) Scheme of BrdU suicide selection for doxycycline-dependent growth inhibition. (B) Scheme of selection and analysis of growth-inhibitory shRNAs. (C) Testing 22 gene targets enriched by shRNA selection. Four synthetic siRNAs per target (sets A–D) from Qiagen were tested for the ability to inhibit MDA-MB-231 cell growth. A cytotoxic siRNA mixture (tox) was used as a positive control; siRNAs targeting GFP (siGFP) or targeting no human gene (siControl) were used as negative controls. (D) The same analysis conducted on 12 gene targets with unaltered shRNA representation after BrdU suicide selection.

The scheme of this analysis strategy is presented in Fig. 4B. The shRNA library was transduced into 2.5 × 10⁷ MDA-MB-231 breast carcinoma cells expressing tTR-KRAB. Twenty-five percent of the cells were used for DNA extraction, and the rest were selected for doxycycline-dependent resistance to BrdU suicide. Cells surviving the selection were used for DNA extraction. The integrated shRNA templates were amplified by PCR from genomic DNA extracted from the unselected and the BrdU-selected library-transduced cells using vector-specific primers, and the PCR product was subjected to 454 pyrosequencing. BLAST analysis of the sequence data yielded 53,201 sequences with homology to UniGene database entries before selection and 53,803 sequences after selection. These sequences matched 14,699 and 3,316 UniGene clusters, respectively, indicating that selection has occurred. Sequences of 741 genes in the selected subset were enriched at least 4-fold after selection, and one of the most enriched genes was KRAS, an oncogene that has undergone an activating mutation in MDA-MB-231 cells (29). Table 1 shows the top 10 Kyoto Encyclopedia of Genes and Genomes pathways that were significantly enriched in this group of genes, as determined using the Pathway-Express program (30). Strikingly, 9 of the 10 pathways were associated with various cancers, cell cycle progression, or apoptosis. This analysis validates our selection system as capable of identifying oncogenes.
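A fold-enrichment calculation of the kind described here can be sketched as a ratio of read frequencies before and after selection, normalized by the total numbers of matched sequences (53,201 and 53,803). The text does not state how counts were normalized or how genes absent before selection were handled, so the pseudocount and the per-gene summation below are assumptions, and the toy counts are invented.

```python
TOTAL_BEFORE = 53_201  # matched sequences before selection (from the text)
TOTAL_AFTER = 53_803   # matched sequences after selection (from the text)


def fold_enrichment(before: int, after: int, pseudocount: float = 1.0) -> float:
    """Ratio of read frequency after selection to read frequency before.

    The pseudocount avoids division by zero; it is an assumption of this sketch.
    """
    freq_before = (before + pseudocount) / TOTAL_BEFORE
    freq_after = (after + pseudocount) / TOTAL_AFTER
    return freq_after / freq_before


def enriched_genes(counts: dict, threshold: float = 4.0) -> list:
    """Names of genes whose reads are enriched at least `threshold`-fold."""
    return sorted(
        gene for gene, (before, after) in counts.items()
        if fold_enrichment(before, after) >= threshold
    )


# Invented (before, after) read counts per gene, for illustration only.
toy_counts = {"geneX": (3, 40), "geneY": (5, 6), "geneZ": (0, 2)}
hits = enriched_genes(toy_counts)
```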

Table 1.

Pathways enriched among genes targeted by shRNAs enriched by BrdU suicide selection (ABC, ATP-binding cassette)

Rank | Pathway name | Impact factor | Genes in pathway (n) | Input genes in pathway (n) | Input genes in pathway (%) | Pathway genes in input (%) | P value
1 | Prostate cancer | 8.207 | 90 | 11 | 1.488 | 12.222 | 2.73E-04
2 | Pathways in cancer | 7.663 | 330 | 24 | 3.248 | 7.273 | 4.70E-04
3 | Cell cycle | 7.598 | 118 | 12 | 1.624 | 10.169 | 5.01E-04
4 | Small cell lung cancer | 5.928 | 86 | 9 | 1.218 | 10.465 | 0.002664
5 | Pancreatic cancer | 5.698 | 72 | 8 | 1.083 | 11.111 | 0.003354
6 | Apoptosis | 5.691 | 89 | 9 | 1.218 | 10.112 | 0.003375
7 | ABC transporters | 5.557 | 44 | 6 | 0.812 | 13.636 | 0.003862
8 | Chronic myeloid leukemia | 5.527 | 75 | 8 | 1.083 | 10.667 | 0.003977
9 | Colorectal cancer | 4.835 | 84 | 8 | 1.083 | 9.524 | 0.007948
10 | Endometrial cancer | 4.823 | 52 | 6 | 0.812 | 11.538 | 0.008042
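The two percentage columns of Table 1 can be recomputed from the count columns. "Pathway genes in input (%)" is the number of input genes in the pathway divided by the pathway size. "Input genes in pathway (%)" is consistent with division by a total of 739 input genes; that total is inferred here from the table's own percentages and is not stated in the text. The impact factors and P values come from Pathway-Express and are not recomputed.

```python
N_INPUT = 739  # inferred from the table's percentages; not stated in the text

# (pathway, genes in pathway, input genes in pathway,
#  reported input-genes-in-pathway %, reported pathway-genes-in-input %)
ROWS = [
    ("Prostate cancer", 90, 11, 1.488, 12.222),
    ("Pathways in cancer", 330, 24, 3.248, 7.273),
    ("Cell cycle", 118, 12, 1.624, 10.169),
    ("Small cell lung cancer", 86, 9, 1.218, 10.465),
    ("Pancreatic cancer", 72, 8, 1.083, 11.111),
    ("Apoptosis", 89, 9, 1.218, 10.112),
    ("ABC transporters", 44, 6, 0.812, 13.636),
    ("Chronic myeloid leukemia", 75, 8, 1.083, 10.667),
    ("Colorectal cancer", 84, 8, 1.083, 9.524),
    ("Endometrial cancer", 52, 6, 0.812, 11.538),
]


def recompute(row):
    """Return (input genes in pathway %, pathway genes in input %) for a row."""
    _, pathway_size, hits, _, _ = row
    return 100 * hits / N_INPUT, 100 * hits / pathway_size


def consistent(row, tolerance: float = 0.001) -> bool:
    """True if recomputed percentages match the reported ones within tolerance."""
    input_pct, pathway_pct = recompute(row)
    return (abs(input_pct - row[3]) < tolerance
            and abs(pathway_pct - row[4]) < tolerance)


all_consistent = all(consistent(row) for row in ROWS)
```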

Of the identified genes, 119 were targeted by two to six different enriched shRNA sequences and therefore represent the most likely targets. These genes, among which KRAS showed the strongest enrichment, are listed in Table S2. To verify the role of such genes in cell growth, we have picked 22 of the most enriched genes represented by at least two selected shRNA sequences and 12 genes that showed no change in shRNA representation after selection. Instead of the laborious process of individually assaying specific shRNAs enriched by selection, we tested the role of candidate genes in cell growth by an independent procedure, based on transfection with synthetic siRNAs (designed by Qiagen), to determine whether such siRNAs will inhibit cell growth. Individual siRNAs were transfected into MDA-MB-231 cells, at four siRNAs per gene. A cytotoxic mixture of siRNAs derived from several essential genes was used as a positive control, and siRNAs targeting either no known genes or GFP were used as negative controls. Relative cell number was determined 6 days after siRNA transfection. As shown in Fig. 4C, one to four siRNAs per gene, targeting 19 of 22 tested genes (86%), inhibited cell growth relative to the negative controls (P values <0.05 for 17 genes and 0.053 and 0.07 for two other genes); KRAS targeting siRNAs showed the strongest effect. In contrast, none of the siRNAs in the control group of 12 genes with unchanged shRNA representation inhibited cell growth (Fig. 4D). Hence, the use of an enzymatically generated shRNA library coupled with BrdU suicide selection, massive parallel sequencing, and target verification by synthetic siRNAs, provides for efficient identification of genes required for cell growth. The analysis strategy of the present study should be generally applicable to high-throughput identification of genes involved in many different phenotypes.

Methods

All of the procedures are described in detail in SI Methods. The coding sequence of GL2 firefly luciferase from pGEM-luc vector (Promega) was used to generate the luciferase-based library. The starting material for the cDNA-based library was a GSE library of normalized cDNA fragments from MCF7 breast carcinoma cells, prepared as described previously (22). The lentiviral vector LLCEP TU6LX is an optimized version of LLCEP TU6X (20).

The scheme of the high-throughput activity assays of individual shRNA clones is shown in Fig. S2. Luciferase activity assays were carried out in human HT1080 fibrosarcoma cells expressing GL2 luciferase and tTR-KRAB repressor with DsRed fluorescent protein. Human gene knockdown assays were carried out in MCF7 breast carcinoma cells expressing tTR-KRAB, by QPCR. To select growth-inhibitory shRNAs, the cDNA-based library was transduced into MDA-MB-231 breast carcinoma cells expressing tTR-KRAB repressor and subjected to selection for doxycycline-dependent resistance to BrdU suicide. The shRNA sequences, amplified by PCR from the DNA of the unselected and BrdU-selected cells using vector-specific primers, were subjected to ultra-high-throughput sequencing using the 454 Sequencing System. For siRNA verification assays, four siRNAs per gene (Qiagen), along with control siRNAs, were transfected into MDA-MB-231 cells in 96-well plates, in triplicate, at 5 nM of siRNA per well. Relative cell number was determined 6 days after siRNA transfection by staining cellular DNA with Hoechst 33342. The Student t test (one-tailed, unequal variance) was used for assessing statistical difference between the inhibitory effects of tested and control siRNA.
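A t test with unequal variance is commonly computed in Welch's form. The standard-library sketch below returns the t statistic and the Welch–Satterthwaite degrees of freedom for two small samples such as triplicate relative cell numbers; the example values are invented and the function name is ours. Converting the statistic to a one-tailed P value requires the t distribution, which is available, for example, through `scipy.stats.ttest_ind` with `equal_var=False` and an `alternative` argument.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Invented relative cell numbers (triplicates), for illustration only.
test_sirna = [0.50, 0.55, 0.45]
control_sirna = [1.00, 0.95, 1.05]
t_stat, dof = welch_t(test_sirna, control_sirna)
```

A negative `t_stat` here indicates fewer cells with the tested siRNA than with the control, which is the direction a one-tailed test for growth inhibition would examine.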

Sequencing of plasmid DNA from randomly picked colonies was carried out on ABI 3730. Attribution of shRNA sequences was performed using National Center for Biotechnology Information BLAST and Java and Perl programs written for this analysis. Clone representation was correlated with the data from Affymetrix U133 Plus 2.0 microarray analysis of gene expression in exponentially growing MCF7 cells (23), using GeneSpring (Agilent). Different parameters used for shRNA structure-activity analysis were computed using the Sfold program (26). Plotting of the calculated parameters and nucleotides at each position in the guide strand, as well as statistical analysis, was carried out using R 2.3.1 software.


Acknowledgments

We thank Dr. Yongzhi Xuan, who constructed the GSE library; George Kampo and Gregory Hurteau of Ordway Research Institute's Functional Genomics Facility for assistance with high-throughput activity assays; Drs. Pascal Bouffard and Michael Egholm for sequencing support; Dr. Inder Verma for lentiviral packaging constructs; Dr. Didier Trono for tTR-KRAB-expressing vector; and Drs. Brian Davis and Clarence Chan for helpful discussions. The Computational Molecular Biology and Statistics Core at the Wadsworth Center provided computing resources for this work. This work was supported by National Institutes of Health (NIH) Grants R33 CA95996, R01 AG028687, R01 CA62099, and R01 AG17921 (to I.B.R.), Department of Defense Grant W81XWH-08-1-0070 (to M.S.), and National Science Foundation Grant DBI-0650991 and NIH Grant R01 GM068726 (to Y.D.).

Footnotes

The authors declare no conflict of interest.

1M.S. and A.M. contributed equally to this work.

This article contains supporting information online at www.pnas.org/cgi/content/full/1003055107/DCSupplemental.

References

  • 1. Valencia-Sanchez MA, Liu J, Hannon GJ, Parker R. Control of translation and mRNA degradation by miRNAs and siRNAs. Genes Dev. 2006;20:515–524. doi: 10.1101/gad.1399806.
  • 2. Pei Y, Tuschl T. On the art of identifying effective and specific siRNAs. Nat Methods. 2006;3:670–676. doi: 10.1038/nmeth911.
  • 3. Taxman DJ, et al. Criteria for effective design, construction, and gene knockdown by shRNA vectors. BMC Biotechnol. 2006;6:7. doi: 10.1186/1472-6750-6-7.
  • 4. Berns K, et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature. 2004;428:431–437. doi: 10.1038/nature02371.
  • 5. Paddison PJ, et al. A resource for large-scale RNA-interference-based screens in mammals. Nature. 2004;428:427–431. doi: 10.1038/nature02370.
  • 6. Silva JM, et al. Second-generation shRNA libraries covering the mouse and human genomes. Nat Genet. 2005;37:1281–1288. doi: 10.1038/ng1650.
  • 7. Ngo VN, et al. A loss-of-function RNA interference screen for molecular targets in cancer. Nature. 2006;441:106–110. doi: 10.1038/nature04687.
  • 8. Moffat J, et al. A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell. 2006;124:1283–1298. doi: 10.1016/j.cell.2006.01.040.
  • 9. Mullenders J, Fabius AW, Madiredjo M, Bernards R, Beijersbergen RL. A large scale shRNA barcode screen identifies the circadian clock component ARNTL as putative regulator of the p53 tumor suppressor pathway. PLoS One. 2009;4:e4798. doi: 10.1371/journal.pone.0004798.
  • 10. Schlabach MR, et al. Cancer proliferation gene discovery through functional genomics. Science. 2008;319:620–624. doi: 10.1126/science.1149200.
  • 11. Sen G, Wehrman TS, Myers JW, Blau HM. Restriction enzyme-generated siRNA (REGS) vectors and libraries. Nat Genet. 2004;36:183–189. doi: 10.1038/ng1288.
  • 12. Shirane D, et al. Enzymatic production of RNAi libraries from cDNAs. Nat Genet. 2004;36:190–196. doi: 10.1038/ng1290.
  • 13. Luo B, Heard AD, Lodish HF. Small interfering RNA production by enzymatic engineering of DNA (SPEED). Proc Natl Acad Sci USA. 2004;101:5494–5499. doi: 10.1073/pnas.0400551101.
  • 14. Dinh A, Mo YY. Alternative approach to generate shRNA from cDNA. Biotechniques. 2005;38:629–632. doi: 10.2144/05384RN02.
  • 15. Du C, et al. PCR-based generation of shRNA libraries from cDNAs. BMC Biotechnol. 2006;6:28. doi: 10.1186/1472-6750-6-28.
  • 16. Fukano H, Hayatsu N, Goto R, Suzuki Y. A technique to enzymatically construct libraries which express short hairpin RNA of arbitrary stem length. Biochem Biophys Res Commun. 2006;347:543–550. doi: 10.1016/j.bbrc.2006.05.124.
  • 17. Xu L, et al. Construction of equalized short hairpin RNA library from human brain cDNA. J Biotechnol. 2007;128:477–485. doi: 10.1016/j.jbiotec.2006.11.013.
  • 18. Siolas D, et al. Synthetic shRNAs as potent RNAi triggers. Nat Biotechnol. 2005;23:227–231. doi: 10.1038/nbt1052.
  • 19. Goomer RS, Kunkel GR. The transcriptional start site for a human U6 small nuclear RNA gene is dictated by a compound promoter element consisting of the PSE and the TATA box. Nucleic Acids Res. 1992;20:4903–4912. doi: 10.1093/nar/20.18.4903.
  • 20. Maliyekkel A, Davis BM, Roninson IB. Cell cycle arrest drastically extends the duration of gene silencing after transient expression of short hairpin RNA. Cell Cycle. 2006;5:2390–2395. doi: 10.4161/cc.5.20.3363.
  • 21. Patanjali SR, Parimoo S, Weissman SM. Construction of a uniform-abundance (normalized) cDNA library. Proc Natl Acad Sci USA. 1991;88:1943–1947. doi: 10.1073/pnas.88.5.1943.
  • 22. Primiano T, et al. Identification of potential anticancer drug targets through the selection of growth-inhibitory genetic suppressor elements. Cancer Cell. 2003;4:41–53. doi: 10.1016/s1535-6108(03)00169-7.
  • 23. Chen Y, Dokmanovic M, Stein WD, Ardecky RJ, Roninson IB. Agonist and antagonist of retinoic acid receptors cause similar changes in gene expression and induce senescence-like growth arrest in MCF-7 breast carcinoma cells. Cancer Res. 2006;66:8749–8761. doi: 10.1158/0008-5472.CAN-06-0581.
  • 24. Schott B, Iraj ES, Roninson IB. Effects of infection rate and selection pressure on gene expression from an internal promoter of a double gene retroviral vector. Somat Cell Mol Genet. 1996;22:291–309. doi: 10.1007/BF02369568.
  • 25. Reynolds A, et al. Rational siRNA design for RNA interference. Nat Biotechnol. 2004;22:326–330. doi: 10.1038/nbt936.
  • 26. Ding Y, Chan CY, Lawrence CE. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res. 2004;32(Web Server issue):W135–41. doi: 10.1093/nar/gkh449.
  • 27. Shao Y, et al. Effect of target secondary structure on RNAi efficiency. RNA. 2007;13:1631–1640. doi: 10.1261/rna.546207.
  • 28. Pestov DG, Lau LF. Genetic selection of growth-inhibitory sequences in mammalian cells. Proc Natl Acad Sci USA. 1994;91:12549–12553. doi: 10.1073/pnas.91.26.12549.
  • 29. Kozma SC, et al. The human c-Kirsten ras gene is activated by a novel mutation in codon 13 in the breast carcinoma cell line MDA-MB231. Nucleic Acids Res. 1987;15:5963–5971. doi: 10.1093/nar/15.15.5963.
  • 30. Draghici S, et al. A systems biology approach for pathway level analysis. Genome Res. 2007;17:1537–1545. doi: 10.1101/gr.6202607.

