Abstract
RNA interference (RNAi) is a widely adopted tool for loss-of-function studies, but RNAi results are biologically meaningful only if the reagents are appropriately mapped to genes. Several groups have designed and generated RNAi reagent libraries for studies in cells or in vivo for Drosophila and other species. At first glance, matching RNAi reagents to genes appears to be a simple problem, as each reagent is typically designed to target a single gene. In practice, however, the reagent–gene relationship is complex. Although the sequences of oligonucleotides used to generate most types of RNAi reagents are static, the reference genome and gene annotations are regularly updated. Thus, at the time a researcher chooses an RNAi reagent or analyzes RNAi data, the most current interpretation of the RNAi reagent–gene relationship, as well as related information regarding specificity (e.g., predicted off-target effects), can differ from the original interpretation. Here, we describe a set of strategies and an accompanying online tool, UP-TORR (for Updated Targets of RNAi Reagents; www.flyrnai.org/up-torr), useful for accurate and up-to-date annotation of cell-based and in vivo RNAi reagents. Importantly, UP-TORR automatically synchronizes with gene annotations daily, retrieving the most current information available, and for Drosophila, also synchronizes with the major reagent collections. Thus, UP-TORR allows users to choose the most appropriate RNAi reagents at the onset of a study, as well as to perform the most appropriate analyses of results of RNAi-based studies.
Keywords: Drosophila, RNAi, annotation, genome
RNA interference (RNAi) is an effective tool to study gene function. In particular, genome-scale RNAi screens in mammalian and Drosophila cultured cells, as well as in vivo in Drosophila and Caenorhabditis elegans, have made contributions to a number of areas of study (Kamath et al. 2003; Dietzl et al. 2007; Boutros and Ahringer 2008; Mohr et al. 2010; Perrimon et al. 2010; Qu et al. 2011; Mohr and Perrimon 2012). RNAi screening is dependent not only on the availability of RNAi reagents but also on accurate information regarding the predicted gene targets of the reagents. Large-scale RNAi libraries are available for a number of model systems. Although different types of RNAi reagents are used in different systems, there is a common and significant need to keep RNAi reagent annotations up to date with new genome assemblies and gene annotations.
A large number of cell-based RNAi screens have been performed using various genome-scale RNAi reagent libraries (Mohr et al. 2010). RNAi reagents for Drosophila cells are usually long (∼100–500 bp) double-stranded RNAs (dsRNAs) made by PCR using a genomic or cDNA template, followed by in vitro transcription. In the cell, dsRNAs are processed by the endogenous RNAi machinery, generating active RNAi reagents, i.e., small dsRNA segments typically 20–22 bp in length with a 2-nt 3′ overhang (Clemens et al. 2000; Hammond et al. 2000). In Drosophila, dsRNAs can be easily introduced into cultured cells (Clemens et al. 2000; Hammond et al. 2000). Several large-scale facilities, including the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, the Boutros lab at the German Cancer Research Center (DKFZ), the RNAi Core at New York University, and the Sheffield RNAi Screening Facility (SRSF), support Drosophila cell-based RNAi screening and offer genome-wide libraries with multiple dsRNAs per gene. For mammalian cells, RNAi screens are done using synthesized short interfering RNAs (siRNAs), endoribonuclease-prepared short interfering RNAs (esiRNAs), or plasmid- or viral-encoded short hairpin RNAs (shRNAs) (Root et al. 2006; Kittler et al. 2007; Micklem and Lorens 2007). As with Drosophila cell screens, mammalian screens are typically performed in individual labs or in conjunction with one of several academic screening facilities that provide automation and database support for screens.
RNAi reagents have also been developed for in vivo screens in various systems. In C. elegans, RNAi is systemic, and gene expression can be knocked down efficiently by feeding worms with bacteria expressing a long dsRNA (Fraser et al. 2000). A genome-scale RNAi feeding library is available (Kamath et al. 2003) and widely used for functional studies. For Drosophila, in vivo RNAi relies on transgenic flies carrying RNAi transgenes that can be combined with the Gal4/UAS system for developmental, stage- and/or tissue-specific knockdown (Dietzl et al. 2007). Drosophila in vivo RNAi reagents are either long dsRNA hairpins, for which gene fragments are cloned as an inverted repeat, or short hairpins synthesized as oligonucleotides and then cloned into an expression vector (Perrimon et al. 2010). Altogether, ∼90% of annotated Drosophila genes are targeted by fly RNAi collections from the Vienna Drosophila RNAi Center (VDRC), National Institute of Genetics (NIG) RNAi Resources in Japan, and the Transgenic RNAi Project (TRiP) at Harvard Medical School (Dietzl et al. 2007; Ni et al. 2008, 2009, 2011; Yamamoto 2010). Several large-scale transgenic RNAi screens have been successfully performed (reviewed in Perrimon et al. 2010) and numerous in vivo Drosophila RNAi projects are ongoing.
Obtaining meaningful results from RNAi-based studies relies entirely on appropriate identification of the sequence-specific gene target(s) of the reagent. Target identification might appear to be a simple problem, but it is not necessarily so. Even though sequences associated with RNAi reagents are static (e.g., the sequences of oligonucleotides used to make a library do not change), the reference sequences and gene annotations, including gene boundaries, exon–intron boundaries, and nomenclature, are constantly being updated. Reevaluations of existing RNAi libraries have shown that by the time of reanalysis, a fraction of reagents no longer target any gene or are no longer predicted to be specific (Horn et al. 2010; Qu et al. 2011). For example, reanalysis in 2011 of a genome-wide C. elegans RNAi feeding library made available in 2003 revealed that 18% of reagents needed to be reannotated (Qu et al. 2011).
For Drosophila, FlyBase is the primary resource of integrated genetic and genomic information, and FlyBase makes regular corrections and additions to gene models (FlyBase Consortium 2003). Since January 2008, FlyBase has released updated gene annotations ∼10 times per year. Because several years can pass between the design of RNAi reagents and their use or data analysis, many new FlyBase annotations are released between reagent design and experimental design, and even more between reagent design and data analysis. Off-target effects (OTEs) are also relevant to the annotation of RNAi reagents. OTEs are induced by unintended cross-hybridization between RNAi reagents and endogenous sequences other than the intended target (Kulkarni et al. 2006; Moffat et al. 2007). Because the sequences of genes and transcripts can change with each gene annotation release, the annotation of potential OTEs can also change over time. Correcting for changes is not simply a matter of keeping up with new gene names and synonyms: updates can change predictions of the target gene, the number of predicted off-targets, isoform specificity, and so on. As a result, it is critically important to regularly update the annotation of RNAi reagents and make this information readily accessible to the researchers who plan, execute, and analyze RNAi-based experiments.
Several tools are available for the design of RNAi reagents, including SnapDragon for long dsRNAs (Flockhart et al. 2006, 2012), DSIR for siRNAs (Vert et al. 2006; Filhol et al. 2012), and E-RNAi and NEXT-RNAi (Arziman et al. 2005; Horn and Boutros 2010; Horn et al. 2010) for long dsRNAs and siRNAs. However, a web-based tool that addresses the dynamic nature of gene annotation has not previously been available. Although E-RNAi can be used to evaluate long dsRNAs and siRNAs, its reference gene information for Drosophila is currently out of date (FlyBase release 5.19 from July 2009). NEXT-RNAi was designed to be integrated into a back-end design/annotation pipeline and does not currently have an openly accessible web-based user interface. In addition, NEXT-RNAi does not distinguish between RNAi reagents generated from genomic DNA vs. cDNA templates, a distinction that is relevant to accurate annotation.
To best support community needs, the ideal tool would be based on regular, automated retrieval of new genome assemblies and gene annotation releases. The ideal tool would also handle the dynamic nature of reagent collections via regular, automatic retrieval of new reagent information from major public resources. To meet these needs, we developed a tool that allows users to query existing RNAi reagents from various sources based on the current gene annotation. The tool also allows researchers to retrieve up-to-date gene target information for user-provided RNAi reagent sequences.
Materials and Methods
Data sources
Reference gene information is downloaded from the following sources: FlyBase for Drosophila melanogaster gene annotation (ftp://ftp.flybase.net/releases/current/); WormBase for C. elegans gene annotation (ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/); RefSeq for human and mouse gene annotation (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/).
RNAi reagent information is queried and downloaded from the following sources: FlyRNAi database for information regarding DRSC and TRiP reagents (http://www.flyrnai.org/); GenomeRNAi ftp site for DKFZ library (http://b110-wiki.dkfz.de/signaling/wiki/download/attachments/917513/Annotation_1stPCR_fulllibrary_HD2.xls); NIG catalog for NIG RNAi transgenic lines (http://www.shigen.nig.ac.jp/fly/nigfly/); VDRC catalog for VDRC RNAi transgenic lines (http://stockcenter.vdrc.at/control/fullCatalogueExcel).
Data annotation pipeline
The data annotation pipeline (Figure 1) includes the following modules: (1) A module for automatic retrieval of reference gene and reagent information. This module downloads information from the corresponding locations daily. The annotation pipeline is triggered whenever there is a new release from FlyBase or National Center for Biotechnology Information (NCBI) RefSeq and/or when new reagents become available. (2) A module that processes reference gene information for each species to assemble a gene lookup table; a BLASTable database of genomic sequence for virtual PCR; and a BLASTable database of transcript sequences, used both for virtual PCR of reagents made from cDNA templates and for on-target/off-target gene searches. (3) A module that processes RNAi reagent information from each source to assemble the RNAi reagent lookup table and a BLASTable database of RNAi reagent sequences, used when the end user queries UP-TORR by gene sequence. (4) A module that assembles the sequences of dsRNA reagents by in silico PCR. With the exception of most NIG reagents, whose sequences were assembled from sequence validation data, long dsRNA reagents have not been sequenced; thus, the most accurate information available is the sequences of the PCR primers. With each new FlyBase release, virtual PCR is performed for all relevant reagents by BLASTing the primer sequences against either the genomic or the transcript BLASTable database, depending on how the reagents were initially generated. (5) A module that matches the sequences of RNAi reagents to transcript sequences by BLAST. (6) A module that summarizes the on-target/off-target search results based on user-defined parameters and presents the summary table to the end user. (7) A module that matches the gene sequences submitted by the end user to reagent sequences by BLAST. (8) A module that aligns reagents to genomic sequences of reference genes and reformats the information about the reference genes and RNAi reagents into Generic Feature Format 3 (GFF3) for upload to JBrowse, facilitating visual display of reagent alignments to reference genes for the end user.
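To make the virtual PCR step (module 4) concrete, here is a minimal sketch, assuming exact string matching as a stand-in for the BLAST-based, 100%-identity primer alignment described above. The function names and the `max_len` product-length cap are hypothetical and are not part of UP-TORR. A product runs from a forward-primer match to a downstream match of the reverse primer's reverse complement, on either strand of the template (a genomic or a transcript sequence).

```python
# Minimal in silico (virtual) PCR sketch. Assumption: exact substring search
# stands in for the BLAST-based, 100%-identity primer alignment; all names
# here are hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def virtual_pcr(template: str, fwd: str, rev: str, max_len: int = 3000) -> list[str]:
    """Return every product a primer pair would amplify from either template strand.

    A product runs from an exact match of the forward primer to a downstream
    exact match of the reverse primer's reverse complement, inclusive, and is
    kept only if it is no longer than max_len.
    """
    fwd, rev_rc = fwd.upper(), reverse_complement(rev.upper())
    products = []
    for strand in (template.upper(), reverse_complement(template.upper())):
        start = strand.find(fwd)
        while start != -1:
            # Search for the reverse-primer site downstream of the forward primer.
            end = strand.find(rev_rc, start + len(fwd))
            while end != -1 and end + len(rev_rc) - start <= max_len:
                products.append(strand[start:end + len(rev_rc)])
                end = strand.find(rev_rc, end + 1)
            start = strand.find(fwd, start + 1)
    return products
```

For example, `virtual_pcr("GGATGCCGAAAATTTTCCGTACGG", "ATGCCG", "GTACGG")` yields the single 20-bp product `"ATGCCGAAAATTTTCCGTAC"`. Scanning both strands means the result does not depend on the template's orientation. A production pipeline would also need to handle near-matches and very large genomic templates, which is where BLAST is the better fit.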
Software
The BLAST program from NCBI (Altschul et al. 1990) is among the research applications already installed on the Orchestra platform at Harvard Medical School. The BLAST parameters for virtual PCR: -W 10 -e 1 -G 5 -E 2; cutoff for virtual PCR: 100% identity; BLAST parameters for on-target/off-target searches: -W 14 -e 10 -G 5 -E 2 -F F; cutoff for on-target search: 27 bp or longer with ≥98% identity; cutoff for off-target search: 15-bp alignment or longer. JBrowse was downloaded from jbrowse.org/install (Skinner et al. 2009). More detailed information can be found at jbrowse.org/developer. Programs for reagent annotation were written in Perl and the user interface was developed using HTML, JavaScript, Java servlets, and Lucene. A Perl program provided as part of the JBrowse download converts annotations from the GFF3 format to the JBrowse format.
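The cutoffs above can be applied to BLAST hits with a small filter. This is a sketch under stated assumptions: the hits are taken to be in the standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The text does not say which output format UP-TORR parses, and the function names are hypothetical. Defaults mirror the stated cutoffs: on-target is 27 bp or longer at ≥98% identity, and off-target is any alignment of 15 bp or longer.

```python
# Sketch: apply on-target/off-target cutoffs to BLAST tabular hits.
# Assumption: 12-column tabular output; names are hypothetical.
from dataclasses import dataclass


@dataclass
class Hit:
    query: str      # reagent ID
    subject: str    # transcript ID
    pident: float   # percent identity
    length: int     # alignment length in bp


def parse_tabular(lines):
    """Yield Hit records from tab-separated BLAST output, skipping comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def classify(hits, on_len=27, on_pident=98.0, off_len=15):
    """Group subject transcripts into on-target and off-target sets per reagent."""
    result = {}
    for hit in hits:
        entry = result.setdefault(hit.query, {"on": set(), "off": set()})
        if hit.length >= on_len and hit.pident >= on_pident:
            entry["on"].add(hit.subject)
        elif hit.length >= off_len:
            entry["off"].add(hit.subject)
    for entry in result.values():
        # A transcript already called on-target is not also reported as off-target.
        entry["off"] -= entry["on"]
    return result
```

Keeping the thresholds as keyword arguments reflects the point that users can change the cutoffs in the interface. Mapping transcripts to genes, and counting OTEs per gene, would be a separate lookup step.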
Results
Reference genes are “moving targets” that change over time
For D. melanogaster, FlyBase is the primary resource of integrated genetic and genomic information, including up-to-date genome assemblies and gene annotations (FlyBase Consortium 2003). Since the first D. melanogaster genome assembly was published in 2000, four subsequent assemblies have been released, the most recent in February 2007 (Myers et al. 2000; Celniker et al. 2002; Hoskins et al. 2007). In addition to updates to the genome assembly, there have been numerous updates to gene annotations since 2000. Particularly with the advent of next-generation sequencing approaches, gene annotations continue to change, for example through the addition of newly identified genes and of newly identified isoforms of known genes. Thus, although Drosophila arguably has the best-annotated genome among multicellular species, our knowledge of the fly genome and proteome continues to improve. Indeed, since the release of the fifth genome assembly (i.e., over the last 6 years or so), the FlyBase consortium has released 49 updates to Drosophila gene annotations.
To illustrate the extent of these changes: in the gene annotation release issued on September 7, 2012, 123 genes and 578 protein-coding transcripts were changed relative to the previous release. Moreover, the number and type of changes vary with each release. To obtain a more comprehensive picture, we examined changes to the gene annotation over a 1-year period (FlyBase release r5.34 vs. r5.44). At the gene level, 412 new genes were added, 12 genes were retired, and the genomic location of 2287 genes was changed. At the transcript level, 3407 new transcripts were added, 833 transcripts were retired, and the sequences of 2902 transcripts were changed. Thus, for a Drosophila RNAi reagent designed at the beginning of this period, there is an ∼30% chance that the sequence of the gene target had changed a year later. Given that the time from RNAi reagent design to reagent availability can be months, and that many RNAi reagents are put to use several years after they were designed, these changes have a significant impact on RNAi reagent annotation. Notably, gene annotation changes can affect not only the on-target predictions for a given RNAi reagent but also the number of predicted OTEs associated with it and/or whether it is predicted to target all isoforms of the target gene. For a summary of annotation changes in FlyBase and WormBase over the past 5 years, see Supporting Information, Table S1.
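A release-to-release comparison like the r5.34 vs. r5.44 one can be sketched as a set difference over transcript IDs. This sketch assumes that each release has already been loaded as a dict mapping a stable transcript ID to its sequence (e.g., parsed from a transcript FASTA file). The function names are hypothetical. It only flags reagents whose known targets were retired or changed. Whether a reagent misses a newly added isoform can be decided only by rerunning the alignment against the new release.

```python
# Sketch: diff two annotation releases and flag reagents whose targets moved.
# Assumption: each release is a dict of transcript ID -> sequence; names are
# hypothetical.


def diff_releases(old: dict[str, str], new: dict[str, str]) -> dict[str, set[str]]:
    """Return the transcript IDs added, retired, and changed between releases."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": new_ids - old_ids,
        "retired": old_ids - new_ids,
        "changed": {tid for tid in old_ids & new_ids if old[tid] != new[tid]},
    }


def affected_reagents(reagent_targets: dict[str, list[str]], diff: dict[str, set[str]]) -> set[str]:
    """Return reagents with at least one previously targeted transcript retired or changed."""
    touched = diff["retired"] | diff["changed"]
    return {rid for rid, txs in reagent_targets.items() if set(txs) & touched}
```

With toy inputs `old = {"T1": "AAAA", "T2": "CCCC", "T3": "GGGG"}` and `new = {"T1": "AAAA", "T2": "CCCT", "T4": "TTTT"}`, the diff reports `T4` as added, `T3` as retired, and `T2` as changed.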
Dynamic annotation of RNAi reagents
When a large amount of information is involved (in this case, information about the sequences and targets of RNAi reagents), the typical approach is to store it in a back-end database. At the DRSC, back-end storage is a relational MySQL database (Flockhart et al. 2006, 2012) in which a couple of dozen tables store information regarding gene annotations associated with DRSC and TRiP RNAi reagents. Updating gene annotations as frequently as FlyBase releases them is not trivial; as a result, such databases are usually out of sync with the most current release. This situation is acceptable for most RNAi reagents but potentially misleading for the subset of reagents whose corresponding gene annotations have changed significantly. Moreover, permanently associating an RNAi reagent with its originally intended target might bias interpretation of RNAi results, even when information about alternative targets is also presented.
To address this issue, we developed a new strategy and a dynamic annotation tool that is “blind” to the original target gene annotations, basing the final reports presented online solely on updated information. The tool, which we named UP-TORR (for Updated Targets of RNAi Reagents), automatically checks the FTP sites of FlyBase and WormBase, as well as the NCBI RefSeq database, every day and, whenever a new release is available, retrieves all of the new sequence and gene annotation information. Thus, at any given time, a query of UP-TORR returns the most up-to-date results available. For cell-based RNAi reagents from the DRSC and DKFZ, as well as in vivo long dsRNA reagents generated by the VDRC and the Ahringer lab, PCR primer sequences are aligned to the up-to-date genome assembly sequence, generating virtual PCR products. The sequences of these PCR products are then BLASTed against transcript sequences to identify the current on-target and off-target predictions. The process is similar for in vivo long hairpin reagents generated by the TRiP, except that transcript sequences are used to generate the virtual PCR products, because these reagents were generated from cDNA rather than genomic DNA templates. For the in vivo long hairpin reagents generated by NIG, most reagent sequences were assembled by end-to-end sequencing; for these reagents, we skip the virtual PCR step and BLAST the RNAi sequences directly against transcript sequences. When a user enters a pair of primers for analysis, the user can specify whether genomic DNA or transcript sequences should be used in the virtual PCR step. For shRNA reagents, both the 21-bp sense-strand and antisense-strand sequences, which originated as synthetic oligonucleotides, can be BLASTed directly against transcript sequences (Figure 1).
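The per-collection choices described above amount to a small routing table. The sketch below encodes them as data. The collection keys, the `Template` enum, and `plan()` are hypothetical names. The sketch also assumes that NIG reagents all have assembled sequences, although the text says only "most" do, and it does not model how the remainder are handled.

```python
# Sketch: how each reagent collection's sequence is obtained before the
# transcript BLAST. Keys and names are hypothetical.
from enum import Enum
from typing import Optional


class Template(Enum):
    GENOMIC = "virtual PCR on genomic sequence"
    TRANSCRIPT = "virtual PCR on transcript sequence"
    SEQUENCED = "use assembled reagent sequence directly"
    OLIGO = "use 21-bp sense and antisense strands directly"


SOURCE_TEMPLATE = {
    "DRSC": Template.GENOMIC,
    "DKFZ": Template.GENOMIC,
    "VDRC": Template.GENOMIC,
    "Ahringer": Template.GENOMIC,
    "TRiP-LongHairPin": Template.TRANSCRIPT,  # made from cDNA templates
    "NIG": Template.SEQUENCED,                # assumes an assembled sequence exists
    "TRiP-ShortHairPin": Template.OLIGO,
}


def plan(source: str, user_template: Optional[str] = None) -> Template:
    """Return the sequence-preparation route for a reagent source.

    For user-entered primer pairs (source "user"), the user's template choice
    ("genomic" or "cDNA") selects the virtual PCR mode.
    """
    if source == "user":
        if user_template == "genomic":
            return Template.GENOMIC
        if user_template == "cDNA":
            return Template.TRANSCRIPT
        raise ValueError("user primer pairs need user_template='genomic' or 'cDNA'")
    return SOURCE_TEMPLATE[source]
```

Representing the routing as data rather than scattered conditionals makes it easy to add a new collection. The genomic vs. cDNA distinction is the feature the text notes NEXT-RNAi lacks.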
During the reagent “live reannotation” process, UP-TORR is designed to answer the following questions: (1) What are all of the possible gene targets? (2) Does the reagent target all isoforms or only some isoforms of the gene? (3) What region of the transcript(s) does the reagent target, i.e., the 5′-UTR, CDS, or 3′-UTR? (4) Are there potential off-target genes that share a certain level of sequence similarity? By default, on-target matches, computed against the full reagent sequence, require at least 17 bp of matching sequence for shRNAs and a 27-bp perfect match for long dsRNAs, whereas off-target matches can be as short as 15 bp. Users can adjust these cutoffs in the user interface.
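Questions (1)–(3) can be answered from the on-target alignments once each transcript is mapped to its gene and its CDS boundaries are known. Here is a minimal sketch. It assumes alignment and CDS coordinates are 1-based, inclusive positions on the transcript, and that non-coding transcripts have no CDS entry. The input dictionaries and function names are hypothetical. Question (4) corresponds to the off-target set from the cutoff filter.

```python
# Sketch: gene targets, isoform coverage, and transcript region per reagent.
# Assumption: 1-based inclusive transcript coordinates; names are hypothetical.
from typing import Optional


def target_regions(aln_start: int, aln_end: int, cds: Optional[tuple[int, int]]) -> list[str]:
    """Label the transcript regions overlapped by one alignment."""
    if cds is None:
        return ["non-coding"]
    cds_start, cds_end = cds
    regions = []
    if aln_start < cds_start:
        regions.append("5'-UTR")
    if aln_start <= cds_end and aln_end >= cds_start:
        regions.append("CDS")
    if aln_end > cds_end:
        regions.append("3'-UTR")
    return regions


def summarize(hits, tx2gene, gene2tx, tx_cds):
    """Summarize on-target hits per gene.

    hits: transcript ID -> (aln_start, aln_end) for on-target alignments
    tx2gene: transcript ID -> gene ID; gene2tx: gene ID -> all its transcript IDs
    tx_cds: transcript ID -> (cds_start, cds_end), absent for non-coding transcripts
    """
    summary = {}
    for tx, (start, end) in hits.items():
        entry = summary.setdefault(tx2gene[tx], {"isoforms_hit": set(), "regions": set()})
        entry["isoforms_hit"].add(tx)
        entry["regions"].update(target_regions(start, end, tx_cds.get(tx)))
    for gene, entry in summary.items():
        # Question (2): does the reagent hit every annotated isoform of the gene?
        entry["all_isoforms"] = entry["isoforms_hit"] == set(gene2tx[gene])
    return summary
```

The keys of the returned dict answer question (1). Because `all_isoforms` is recomputed against the current `gene2tx` map, adding a new isoform in a later release can flip a reagent from "all isoforms" to a subset without any change to the reagent itself.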
Using this tool, we reannotated all of the RNAi reagents generated at the DRSC, DKFZ, VDRC, NIG, and TRiP based on FlyBase release 5.49 (Table 1). We found that some reagents no longer met the original design goal. For example, within the TRiP shRNA collection, 3% of reagents were predicted at the time of our reannotation to target multiple genes. Some of these cases reflect the high sequence similarity among paralogous genes, such as members of the His1, His2A, His2B, and His3 families, which makes it impossible to design gene-specific RNAi reagents. Additionally, the Drosophila genome is more compact than mammalian genomes, and some genes are located close to each other or fully overlap at both the genomic and transcript levels. For example, the genes cup and CG34310 are both located at 6663968–6674780 on the + strand of chromosome 2L. Their transcripts are also identical; the genes differ only in their protein-coding regions (Figure 2A). In cases like this, it is impossible to design an RNAi reagent that targets one gene but not the other. Another example is eIF-2gamma and Su(var)3-9, which partially overlap at both the genomic and transcript levels. TRiP reagent HMS00279 happens to target exons shared by the two genes; here, the library could be improved by targeting regions specific to each gene (Figure 2B). In addition, 0.8% of reagents do not target any gene in the release we tested. These reagents align to introns (Figure 2C), intergenic regions (Figure 2D), or pseudogenes (Figure 2E) owing to changes in intron–exon boundaries or gene boundaries, or to gene retirement.
Table 1. Summary of major public Drosophila and Caenorhabditis elegans RNAi reagent collections.
RNAi collection | Reagent type | All reagents | Target 1 gene, all isoforms | Target 1 gene, not all isoforms | Target multiple genes | Reagents with >5 OTEs (19 bp) | Target pseudogene | No gene target |
---|---|---|---|---|---|---|---|---|
DKFZ | dsRNA, cell based | 20,016 | 15,948 (80%) | 816 (4%) | 1466 (7%) | 394 (2%) | 78 (0.4%) | 1708 (9%) |
DRSC-Genome Library | dsRNA, cell based | 24,037 | 19,615 (82%) | 1664 (7%) | 979 (4%) | 552 (2%) | 84 (0.4%) | 1695 (7%) |
DRSC-Followup Library | dsRNA, cell based | 9448 | 8519 (90%) | 470 (5%) | 296 (3%) | 15 (0.2%) | 14 (0.2%) | 149 (2%) |
NIG | dsRNA, transgenic fly | 11,725 | 10,328 (88%) | 532 (5%) | 436 (4%) | 416 (4%) | 2 (0.02%) | 427 (4%) |
TRiP-LongHairPin | dsRNA, transgenic fly | 2483 | 2255 (91%) | 114 (5%) | 72 (3%) | 8 (0.3%) | 2 (0.1%) | 40 (2%) |
TRiP-ShortHairPin | shRNA, transgenic fly | 4132 | 3738 (90%) | 242 (6%) | 120 (3%) | 0 (0%) | 2 (0.1%) | 30 (1%) |
VDRC-GD Library | dsRNA, transgenic fly | 21,808 | 18,607 (85%) | 962 (4%) | 1357 (6%) | 745 (3%) | 60 (0.3%) | 822 (4%) |
VDRC-KK Library | dsRNA, transgenic fly | 10,748 | 9135 (85%) | 431 (4%) | 378 (4%) | 67 (1%) | 27 (0.3%) | 777 (7%) |
Ahringer Library | dsRNA, worm feeding | 16,256 | 11,002 (68%) | 678 (4%) | 1733 (11%) | 1074 (7%) | 283 (2%) | 2843 (16%) |
DKFZ, German Cancer Research Center; DRSC, Drosophila RNAi Screening Center; NIG, National Institute of Genetics (Japan); OTE, off-target effect; TRiP, Transgenic RNAi Project; VDRC, Vienna Drosophila RNAi Center.
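The categories in Table 1 can be reproduced from per-reagent reannotation results with a simple decision order. The sketch below assumes that order; the paper does not state it. First comes no on-target gene, then targets that are all pseudogenes, then more than one non-pseudogene gene, and finally one gene split by isoform coverage. The OTE column is counted independently. All names are hypothetical, and the code does not reproduce the published numbers.

```python
# Sketch: assign a reagent to one Table 1 category and tally a collection.
# Assumption: the decision order below is ours, not the paper's; names are
# hypothetical.
from collections import Counter


def table1_category(target_genes, pseudogenes, all_isoforms):
    """Classify one reagent.

    target_genes: gene IDs passing the on-target cutoff
    pseudogenes: IDs of genes annotated as pseudogenes
    all_isoforms: gene ID -> True if every isoform of that gene is hit
    """
    if not target_genes:
        return "No gene target"
    genes = set(target_genes) - set(pseudogenes)
    if not genes:
        return "Target pseudogene"
    if len(genes) > 1:
        return "Target multiple genes"
    (gene,) = genes
    if all_isoforms[gene]:
        return "Target 1 gene, all isoforms"
    return "Target 1 gene, not all isoforms"


def tally(reagents):
    """Count categories and reagents with more than 5 predicted OTEs.

    reagents: iterable of (target_genes, pseudogenes, all_isoforms, ote_count)
    """
    categories = Counter()
    many_otes = 0
    for genes, pseudo, isoforms, ote_count in reagents:
        categories[table1_category(genes, pseudo, isoforms)] += 1
        if ote_count > 5:
            many_otes += 1
    return categories, many_otes
```

Because the five categories are mutually exclusive in this sketch, their counts add up to the collection size, as they do for most rows of Table 1. The OTE count overlaps with them.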
Our comparison of FlyBase releases r5.34 and r5.44 shows that 3407 new transcripts were added and 833 transcripts were removed. Thus, it is more likely that a new isoform will be added than that an existing isoform will be retired. As a result, an RNAi reagent may fail to target all isoforms even though it was initially designed to be isoform nonspecific. According to FlyBase release 5.49, 38% of fly genes have more than one isoform. We found that 90% of TRiP shRNA reagents still target all isoforms, whereas 6% target one or a subset of isoforms based on current isoform annotation. Some of these reagents are limited by the genes themselves, which lack exons common to all isoforms (Figure 2F), whereas others could be improved by targeting regions shared by all isoforms (Figure 2G). Because isoforms can be expressed specifically in certain tissues or under certain pathological conditions, and/or might have divergent functions, providing annotation at the isoform level is important for the appropriate identification of RNAi reagents and interpretation of RNAi results.
Online features of UP-TORR
To provide researchers with the most current and accurate annotation of RNAi reagents, we developed a freely accessible web-based application. To accommodate the full spectrum of community needs regarding reagent identification and live reannotation, we have provided users with five different ways to query UP-TORR. After selecting the species (Drosophila, C. elegans, mouse, or human) from the appropriate menu tab, users can (1) enter the gene-specific region of an RNAi reagent sequence (i.e., a 19–21 bp sense/antisense strand corresponding to a siRNA or short hairpin, or a DNA sequence corresponding to a dsRNA); (2) enter PCR primers for dsRNA, then choose the proper PCR template (genomic DNA or cDNA); (3) enter a list of RNAi reagent IDs (e.g., DRSC amplicon ID, GenomeRNAi amplicon ID, TRiP stock ID, NIG stock ID, VDRC transformant ID or Ahringer primer pair ID); (4) enter a list of gene identifiers for which all relevant reagents will be retrieved (e.g., FlyBase FBgn IDs, CG numbers, and/or gene symbols); or (5) enter the sequence to be targeted (e.g., a full-length transcript or exon sequence). For query types 1–3, in which an RNAi reagent is the input, UP-TORR returns a summary of all of the potentially targeted genes, including gene identifiers such as FlyBase FBgn number for fly and NCBI Entrez GeneID for other species, gene symbol, and gene isoform information, as well as the region and location of each isoform that is targeted. UP-TORR also reports the number of possible off-target genes, which is hyperlinked to detailed information about the genes (Figure 3). For query types 4 and 5, in which a target gene is the input, all of the RNAi reagents deemed relevant by the live reannotation are reported, along with a similar summary of information about isoform specificity and predicted OTEs. These search options allow users to retrieve all the available RNAi reagents quickly without searching individual resources. 
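Query type 4 accepts a mix of identifier kinds (FBgn IDs, CG numbers, gene symbols). One way to resolve such a list is an alias index that maps every identifier to its primary gene ID, followed by a lookup into the reagent table produced by the latest reannotation. The sketch below is a hypothetical illustration and not UP-TORR's implementation. Matching is exact and case-sensitive, and an alias shared by several genes returns all of them instead of silently picking one. The identifiers in the example are placeholders.

```python
# Sketch: resolve mixed gene identifiers to reagents (query type 4).
# Names and example identifiers are hypothetical.
from collections import defaultdict


def build_alias_index(genes):
    """genes: iterable of (primary_id, [aliases]) -> alias -> set of primary IDs."""
    index = defaultdict(set)
    for primary, aliases in genes:
        for name in (primary, *aliases):
            index[name].add(primary)
    return index


def reagents_for_genes(queries, alias_index, gene2reagents):
    """Return (primary ID -> sorted reagent IDs, unmatched queries).

    gene2reagents should come from the current live reannotation, so a
    reagent is listed under whichever gene it targets today.
    """
    found, unmatched = {}, []
    for query in queries:
        primaries = alias_index.get(query.strip(), set())
        if not primaries:
            unmatched.append(query)
        for primary in sorted(primaries):
            found[primary] = sorted(gene2reagents.get(primary, ()))
    return found, unmatched
```

For example, with placeholder genes `("FBgn_A", ["CG_A", "geneA"])` and `("FBgn_B", ["CG_B"])`, the query `["geneA", "CG_B", "unknown"]` resolves the first two identifiers and reports `"unknown"` as unmatched.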
In addition, users can easily compare all RNAi reagents available for a given gene and select the best one(s). There is an ongoing effort to evaluate the efficiency of TRiP RNAi transgenic lines by phenotyping and/or qPCR analysis. To help UP-TORR users select the most efficient reagent(s), TRiP stock IDs are hyperlinked to a page that includes validation results. With query type 5, in addition to full gene or transcript sequences, users can also enter specific exon or domain sequences to identify reagents that specifically target the transcript region of interest. For all query types, results are hyperlinked to an instance of JBrowse, where the alignment of RNAi reagents with genes and transcripts is displayed visually. Users also have the option to download a summary table of results and supporting information.
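The JBrowse display depends on reagent alignments written as GFF3 (pipeline module 8). Here is a minimal GFF3 writer, assuming 1-based inclusive genomic coordinates. The `source` label and the feature type `RNAi_reagent` are our assumptions; the text does not say which values UP-TORR writes. Attribute values are percent-encoded for the characters that GFF3 reserves in column 9.

```python
# Sketch: write reagent alignments as GFF3 lines for upload to JBrowse.
# Assumption: the source and feature-type values are placeholders; names are
# hypothetical.
from dataclasses import dataclass, field

# Characters that must be percent-encoded inside GFF3 attribute values.
_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def escape_attr(value: str) -> str:
    """Percent-encode reserved characters one at a time (no double escaping)."""
    return "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)


@dataclass
class Feature:
    seqid: str          # chromosome arm, e.g. "2L"
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+", "-", or "."
    attributes: dict = field(default_factory=dict)


def gff3_lines(features, source="UP-TORR-sketch", feature_type="RNAi_reagent"):
    """Yield a GFF3 header followed by one 9-column line per feature."""
    yield "##gff-version 3"
    for f in features:
        if f.start > f.end:
            raise ValueError(f"start > end for feature on {f.seqid}")
        attrs = ";".join(f"{k}={escape_attr(str(v))}" for k, v in f.attributes.items())
        yield "\t".join([f.seqid, source, feature_type, str(f.start), str(f.end),
                         ".", f.strand, ".", attrs or "."])
```

The resulting file would then go through the GFF3-to-JBrowse conversion step described under Software.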
Finally, we note that when the output species is Drosophila or C. elegans, the output page from a DRSC Integrative Ortholog Prediction Tool (DIOPT) (flyrnai.org/diopt) or DIOPT diseases and traits (DIOPT-DIST) search (flyrnai.org/diopt-dist) (Hu et al. 2011) has been modified to include a button that carries the gene list forward from DIOPT or DIOPT-DIST to UP-TORR. We expect this should help facilitate identification of RNAi reagents relevant to conserved and disease-related genes.
Discussion
There is a necessary passage of time between the design of RNAi reagents and their use, as well as between design and analysis of results (and later reinterpretation of RNAi data, such as in meta-analyses) (Horn et al. 2010; Qu et al. 2011). As we have shown, gene annotations change over time (Table S1), altering what the latest evidence suggests is the appropriate interpretation of RNAi on-target and off-target potential. The UP-TORR approach and accompanying freely accessible user interface make it possible for researchers to identify RNAi reagents and/or interpret the results of RNAi studies based on the most current annotation available from FlyBase. Our analysis of RNAi reagents from all of the major public RNAi collections shows that a small percentage of RNAi reagents did not meet initial design goals upon reannotation (i.e., they were no longer predicted to be gene specific and isoform nonspecific with regard to the intended target gene). By comparing different FlyBase releases, we further found that coding sequences (CDSs) are less likely to change than untranslated regions (5′- or 3′-UTRs). This likely reflects the fact that it has historically been easier, both computationally and experimentally, to identify coding sequences than full-length transcripts.
Because UP-TORR checks for updates at FlyBase, WormBase, and the RefSeq database daily and incorporates the new data, enabling what we refer to as live reannotation of RNAi reagent information, the tool will be valuable to anyone interested in designing, analyzing, or reanalyzing RNAi results, including results from high-throughput screens. We recognize, however, that results from UP-TORR, or from any other up-to-date comparison with the current annotation of genomes and/or transcriptomes, do not necessarily provide the “final word” on RNAi on-target and off-target effects. For example, RNAi treatments can have generalized, gene-nonspecific effects (Muller et al. 2008). In addition, SNPs (Chen et al. 2009), RNA editing (Rodriguez et al. 2012), and chimeric transcripts (Frenkel-Morgenstern et al. 2013) can complicate the prediction of on-target as well as off-target genes of RNAi reagents. Nevertheless, UP-TORR is the first tool available to address the effect of changing genome annotation on the interpretation of RNAi reagent sequences. Importantly, the tool provides up-to-date annotation for RNAi reagents targeting human (Figure S1) and mouse genes, as well as Drosophila and C. elegans genes, and could easily be expanded to include more species. In the future, this approach might be applied to other methods for which gene annotation affects interpretation of the reagents, such as Transcription Activator-Like Effectors (TALEs) (Christian et al. 2010) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) (Cong et al. 2013).
Supplementary Material
Acknowledgements
We thank the members of the Drosophila RNAi Screening Center (DRSC), Transgenic RNAi Project (TRiP), and Perrimon lab, in particular Laura Holderbaum and Dong Yan, for helpful comments and support. We thank Thomas Micheler from the Vienna Drosophila RNAi Center for help with data access. We also thank all of the past and current users of the DRSC and TRiP for providing valuable feedback. This work was supported in large part by the National Institute of General Medical Sciences R01 GM067761 and GM084947, with additional support from the National Center for Research Resources and Office of Research Infrastructure Programs (NCRR/ORIP) R24 RR032668. S.E.M. is also supported in part by the Dana Farber/Harvard Cancer Center and N.P. is an investigator of the Howard Hughes Medical Institute.
Note added in proof: See Kulathinal, 2013 (pp. 7–8) in this issue, for related work.
Footnotes
Communicating editor: L. M. McIntyre
Literature Cited
- Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J., 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410. [DOI] [PubMed] [Google Scholar]
- Arziman Z., Horn T., Boutros M., 2005. E-RNAi: a web application to design optimized RNAi constructs. Nucleic Acids Res. 33: W582–W 588. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Boutros M., Ahringer J., 2008. The art and design of genetic screens: RNA interference. Nat. Rev. Genet. 9: 554–566. [DOI] [PubMed] [Google Scholar]
- Celniker, S. E., D. A. Wheeler, B. Kronmiller, J. W. Carlson, A. Halpern et al, 2002. Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 3: RESEARCH0079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen D., Berger J., Fellner M., Suzuki T., 2009. FLYSNPdb: a high-density SNP database of Drosophila melanogaster. Nucleic Acids Res. 37: D567–D570.
- Christian M., Cermak T., Doyle E. L., Schmidt C., Zhang F., et al., 2010. Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186: 757–761.
- Clemens J. C., Worby C. A., Simonson-Leff N., Muda M., Maehama T., et al., 2000. Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc. Natl. Acad. Sci. USA 97: 6499–6503.
- Cong L., Ran F. A., Cox D., Lin S., Barretto R., et al., 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339: 819–823.
- Dietzl G., Chen D., Schnorrer F., Su K. C., Barinova Y., et al., 2007. A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448: 151–156.
- Filhol O., Ciais D., Lajaunie C., Charbonnier P., Foveau N., et al., 2012. DSIR: assessing the design of highly potent siRNA by testing a set of cancer-relevant target genes. PLoS One 7: e48057.
- Flockhart I., Booker M., Kiger A., Boutros M., Armknecht S., et al., 2006. FlyRNAi: the Drosophila RNAi screening center database. Nucleic Acids Res. 34: D489–D494.
- Flockhart I. T., Booker M., Hu Y., McElvany B., Gilly Q., et al., 2012. FlyRNAi.org–the database of the Drosophila RNAi screening center: 2012 update. Nucleic Acids Res. 40: D715–D719.
- FlyBase Consortium, 2003. The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res. 31: 172–175.
- Fraser A. G., Kamath R. S., Zipperlen P., Martinez-Campos M., Sohrmann M., et al., 2000. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408: 325–330.
- Frenkel-Morgenstern M., Gorohovski A., Lacroix V., Rogers M., Ibanez K., et al., 2013. ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data. Nucleic Acids Res. 41: D142–D151.
- Hammond S. M., Bernstein E., Beach D., Hannon G. J., 2000. An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404: 293–296.
- Horn T., Boutros M., 2010. E-RNAi: a web application for the multi-species design of RNAi reagents–2010 update. Nucleic Acids Res. 38: W332–W339.
- Horn T., Sandmann T., Boutros M., 2010. Design and evaluation of genome-wide libraries for RNA interference screens. Genome Biol. 11: R61.
- Hoskins R. A., Carlson J. W., Kennedy C., Acevedo D., Evans-Holm M., et al., 2007. Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 316: 1625–1628.
- Hu Y., Flockhart I., Vinayagam A., Bergwitz C., Berger B., et al., 2011. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics 12: 357.
- Kamath R. S., Fraser A. G., Dong Y., Poulin G., Durbin R., et al., 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421: 231–237.
- Kittler R., Surendranath V., Heninger A. K., Slabicki M., Theis M., et al., 2007. Genome-wide resources of endoribonuclease-prepared short interfering RNAs for specific loss-of-function studies. Nat. Methods 4: 337–344.
- Kulkarni M. M., Booker M., Silver S. J., Friedman A., Hong P., et al., 2006. Evidence of off-target effects associated with long dsRNAs in Drosophila melanogaster cell-based assays. Nat. Methods 3: 833–838.
- Micklem D. R., Lorens J. B., 2007. RNAi screening for therapeutic targets in human malignancies. Curr. Pharm. Biotechnol. 8: 337–343.
- Moffat J., Reiling J. H., Sabatini D. M., 2007. Off-target effects associated with long dsRNAs in Drosophila RNAi screens. Trends Pharmacol. Sci. 28: 149–151.
- Mohr S., Bakal C., Perrimon N., 2010. Genomic screening with RNAi: results and challenges. Annu. Rev. Biochem. 79: 37–64.
- Mohr S. E., Perrimon N., 2012. RNAi screening: new approaches, understandings, and organisms. Wiley Interdiscip. Rev. RNA 3: 145–158.
- Muller P., Boutros M., Zeidler M. P., 2008. Identification of JAK/STAT pathway regulators–insights from RNAi screens. Semin. Cell Dev. Biol. 19: 360–369.
- Myers E. W., Sutton G. G., Delcher A. L., Dew I. M., Fasulo D. P., et al., 2000. A whole-genome assembly of Drosophila. Science 287: 2196–2204.
- Ni J. Q., Markstein M., Binari R., Pfeiffer B., Liu L. P., et al., 2008. Vector and parameters for targeted transgenic RNA interference in Drosophila melanogaster. Nat. Methods 5: 49–51.
- Ni J. Q., Liu L. P., Binari R., Hardy R., Shim H. S., et al., 2009. A Drosophila resource of transgenic RNAi lines for neurogenetics. Genetics 182: 1089–1100.
- Ni J. Q., Zhou R., Czech B., Liu L. P., Holderbaum L., et al., 2011. A genome-scale shRNA resource for transgenic RNAi in Drosophila. Nat. Methods 8: 405–407.
- Perrimon N., Ni J. Q., Perkins L., 2010. In vivo RNAi: today and tomorrow. Cold Spring Harb. Perspect. Biol. 2: a003640.
- Qu W., Ren C., Li Y., Shi J., Zhang J., et al., 2011. Reliability analysis of the Ahringer Caenorhabditis elegans RNAi feeding library: a guide for genome-wide screens. BMC Genomics 12: 170.
- Rodriguez J., Menet J. S., Rosbash M., 2012. Nascent-seq indicates widespread cotranscriptional RNA editing in Drosophila. Mol. Cell 47: 27–37.
- Root D. E., Hacohen N., Hahn W. C., Lander E. S., Sabatini D. M., 2006. Genome-scale loss-of-function screening with a lentiviral RNAi library. Nat. Methods 3: 715–719.
- Skinner M. E., Uzilov A. V., Stein L. D., Mungall C. J., Holmes I. H., 2009. JBrowse: a next-generation genome browser. Genome Res. 19: 1630–1638.
- Vert J. P., Foveau N., Lajaunie C., Vandenbrouck Y., 2006. An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinformatics 7: 520.
- Yamamoto M. T., 2010. Drosophila Genetic Resource and Stock Center; The National BioResource Project. Exp. Anim. 59: 125–138.