Abstract
CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements.
The CRISPR/Cas system from bacteria and archaea1 has been developed into a genome-editing tool with wide-ranging applications2–4. Functional screens of coding genes have been widely adopted, in which pooled libraries of single guide RNAs (sgRNAs) targeting the coding regions of genes are used, and genes associated with specific phenotypes are selected using cell growth or specific markers as a readout5–12. Although similar strategies have been used to tile across regulatory elements to investigate cis-element function9,13,14, such strategies may not work as well for non-coding elements since indels caused by one gRNA are unlikely to produce loss-of-function phenotypes. Although two gRNAs have been used to generate a large genomic deletion to investigate the function of individual lncRNAs15,16, high-throughput screening using such an approach has not been reported. We developed a CRISPR/Cas9 strategy utilizing paired-gRNAs (pgRNAs) to produce large-fragment deletions and enable the identification of functional long non-coding RNAs in cancer cells.
RESULTS
Lentivirally-delivered paired guide RNA system
We constructed a CRISPR pgRNA library such that the genomic sequences between two gRNA-targeting sites could be deleted. First, we tested two approaches to express the pgRNAs in one lentiviral backbone: two U6 promoters driving the two gRNAs separately (U62) and a single U6 promoter driving two gRNAs linked consecutively (U61) (Fig. 1a). We compared these two approaches using six pairs of gRNAs that were predicted to delete from 2 to 4.5 kb of the human CSPG4 locus (gene CSPG4 encodes an integral membrane chondroitin sulfate proteoglycan) (Fig. 1b,c and Supplementary Table 1). In the liver cancer cell line Huh7.5OC, which stably expresses the Cas9 and OCT1 genes8,17, all six pgRNAs in a U62 vector produced genomic deletions with the correct sizes, whereas only two pgRNAs in a U61 vector produced the correct deletion, and at a much lower efficiency (Fig. 1c). Five pgRNAs in U62 targeting the lncRNA MALAT1 also produced genomic deletions of the correct sizes with high efficiency (Supplementary Fig. 1a,b and Supplementary Table 2). This suggests that U62 has superior deletion efficiency, and it was adopted for subsequent experiments. We next investigated whether culturing time post-transduction of lentivirally-delivered pgRNAs affected the efficiency of genomic deletion, and observed continued genomic deletion over time that reached a plateau around 15 days post-transduction (Fig. 1d). Similar results were observed when genomic deletions were induced using different pgRNAs targeting CSPG4 (2+2′, Fig. 1b) or MALAT1 (2+2′, Supplementary Fig. 1a,c,d). Therefore, culturing library cells for at least 2 weeks post-transduction is desirable to allow sufficient time to produce genomic deletions in mammalian cells at a level that is optimal for screening.
Genomic sequencing of five pgRNA-targeted regions in total (3 pgRNAs targeting CSPG4 and 2 pgRNAs targeting MALAT1) revealed that almost 80% of the deletions at each site were the precise joining of two Cas9 cleavage sites 3-nt upstream of the protospacer adjacent motifs (PAMs) (Fig. 1e and Supplementary Fig. 1e), consistent with previous findings18. Taken together, lentivirally-delivered pgRNAs are capable of creating large genomic deletions with high efficiency in mammalian cells.
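The precise-joining outcome described above can be illustrated with a short sketch. It assumes 20-nt protospacers, 0-based forward-strand coordinates, and a cut exactly 3 nt upstream of the PAM; the function names and the toy coordinates are hypothetical and are not taken from the study.

```python
def cut_site(protospacer_start: int, strand: str, length: int = 20) -> int:
    """Return the 0-based index of the first base to the right of the Cas9 cut.

    protospacer_start is the leftmost forward-strand coordinate of the protospacer.
    The cut is placed 3 nt upstream of the PAM.
    """
    if strand == "+":
        # PAM lies to the right of the protospacer, so the cut is 3 nt from its right end.
        return protospacer_start + length - 3
    if strand == "-":
        # PAM lies to the left of the protospacer (as CCN on the forward strand).
        return protospacer_start + 3
    raise ValueError("strand must be '+' or '-'")


def precise_deletion(seq: str, site_a: tuple, site_b: tuple) -> tuple:
    """Join the two cut sites directly and return (edited sequence, deleted length)."""
    left, right = sorted((cut_site(*site_a), cut_site(*site_b)))
    return seq[:left] + seq[right:], right - left


# Toy example: a 100-bp sequence with one gRNA on each strand (hypothetical positions).
toy = "ACGT" * 25
edited, deleted = precise_deletion(toy, (10, "+"), (60, "-"))
print(deleted, len(edited))
```

Junction reads that match the sequence returned by `precise_deletion` would be counted as precise joins; anything else (extra indels at the junction, inversions) falls outside this simple model.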
pgRNA library construction and genome-wide lncRNA deletion screen
A pgRNA library targeting around 700 human lncRNA genes (Fig. 2a, Supplementary Table 3 and Online Methods) with known or putative roles in cancers or other diseases19 was designed. For each lncRNA target, we first identified all possible 20-nt gRNAs adjacent to the canonical PAM, then filtered out gRNAs that were predicted to have low cutting specificity20 or efficiency21 (Online Methods and Supplementary Code). We selected gRNA pairs with one unique gRNA as a barcode for each pair (Online Methods), and developed a rapid and accurate method to clone the pgRNAs into a lentiviral expression vector (Fig. 2b and Supplementary Fig. 2a,b). Since the two gRNAs in each pair are driven by the same type of U6 promoter and contain identical 3′ scaffold sequences, recombination might occur, which could result in erroneous pgRNA pairing. We tested the recombination rate in both pgRNA library plasmid constructs and chromosomal integrations in cells after transduction, and found that recombination occurred after viral transduction in cells at a rate of approximately 7.5%, which is comparable to oligo-synthesis error rates (Supplementary Table 4). This suggests that recombination should have a negligible effect on pgRNA library screening.
We delivered our pgRNA library in U62 at low multiplicity of infection (MOI) into Huh7.5OC cells, which have previously been used for functional screening of coding genes17. We used 30 days of culturing post-transduction to maximize the identification of lncRNAs that either positively or negatively affect cell growth or viability. PCR-amplified barcode-gRNA regions from the extracted genomic DNA of cells before and after CRISPR screening were subjected to deep sequencing analysis (Fig. 2c and Supplementary Fig. 2c). Overall, the read distribution of 3 independent experimental replicates within each condition showed a high level of correlation (Fig. 3a and Supplementary Fig. 3). After 30 days of culturing, pgRNAs targeting either positive control genes (mostly ribosomal genes) or lncRNAs were depleted compared with negative control pgRNAs (non-targeting pgRNAs or pgRNAs targeting the non-functional AAVS1 loci) (Fig. 3b), indicating their effect on cell survival or proliferation.
We used the MAGeCK algorithm to identify the top hits by comparing day-30 samples with day-0 controls22. MAGeCK evaluates the statistical significance of individual pgRNA abundance changes using a negative binomial (NB) model, and compares the ranks of pgRNAs targeting each lncRNA with a null model of uniform distribution (Online Methods). The output of MAGeCK is a set of negatively (or positively) selected lncRNAs, that is, lncRNAs whose knockout disrupts (or stimulates) cell proliferation. In total, MAGeCK identified 43 negatively selected and 8 positively selected lncRNAs with statistical significance (false discovery rate < 0.25, Supplementary Table 5). Gene Set Enrichment Analysis (GSEA) showed that positive control pgRNAs were significantly enriched in the ranked list of negatively selected pgRNAs (Fig. 3c), as expected given the essential roles of their targets23. The top negatively selected genes include two positive control genes: RPL18A, a ribosomal gene, and EZH2, a gene that encodes a member of the Polycomb-group family that has an essential role in the proliferation of liver cancer cells24. pgRNAs targeting the promoters and exons of RPL18A and EZH2 were consistently depleted (Fig. 3d,e). Similarly, 89% of the pgRNAs targeting top-ranked negatively selected lncRNAs were depleted while 76% of the pgRNAs targeting positively selected lncRNAs were enriched (Fig. 3f,g and Supplementary Fig. 4a,b). In contrast, the abundances of non-targeting control pgRNAs and of pgRNAs targeting the AAVS1 loci were similar between control and treatment conditions (Supplementary Fig. 4c). Intriguingly, 266 pgRNAs targeting 25 intronic regions of essential genes decreased cell viability (Fig. 3d), possibly due to the deletion of regulatory elements or modulation of alternative splicing of the target genes25,26.
Validation of selected lncRNA candidates
From the positively or negatively selected lncRNAs with statistical significance, we obtained top ranked hits whose corresponding pgRNAs were consistently depleted (for negative selection) or enriched (for positive selection) in 3 independent experimental replicates, respectively (Fig. 3f,g and Supplementary Fig. 5). To validate the functions of some of these lncRNAs, we chose 2 pairs of gRNAs that were present in the original screening library and designed up to 3 additional new pgRNAs for each gene. In addition, 3 pairs of gRNAs were designed to target the AAVS1 loci to serve as negative controls (Supplementary Table 6). All pgRNAs were transduced afresh into Huh7.5OC cells using a lentiviral backbone carrying CMV-EGFP, and proliferation of cells was quantified based on the percentage change of EGFP-positive cells. Deletion of the promoter of RPL18A, one ribosomal gene that ranked top of the negative selection list from the screen, strongly decreased cell proliferation, while deletions of the AAVS1 loci had negligible effect on cell growth (Fig. 4a).
Using the same method, we selected lncRNAs without any overlap with coding genes from the pgRNA library screening for validation. From the initial screen, we chose 5 negatively selected lncRNAs (AC004463.6, AC095067.1, HM13-AS1, RP11-128M1.1 and RP11-439K3.1) and 4 positively selected lncRNAs (LINC00176, LINC01087, LINC00882 and LINC00883). We designed pgRNAs to target the promoters or exons of these lncRNAs. For the divergently transcribed pair LINC00882 and LINC00883, which share the same promoter, 3 additional pgRNAs were designed to target their exons. All 5 negatively selected lncRNAs were essential for cell proliferation upon individual deletion, and all 4 positively selected lncRNAs were confirmed to negatively regulate cell proliferation (Fig. 4b,c and Supplementary Fig. 6a,b). We further introduced a cDNA clone of LINC00882 into two groups of LINC00882-deleted Huh7.5OC cells and demonstrated that the ectopic expression of LINC00882 could inhibit cell proliferation (Supplementary Fig. 6c,d). Some pgRNAs, such as RP11-439K3.1_p3 and RP11-439K3.1_p4, did not produce phenotypes (Fig. 4b), due to their failure to generate genomic deletions (Supplementary Fig. 6e). To further validate candidate genes, we used a CRISPR-inhibitor (CRISPRi) method27 that can reduce the transcription of the targeted gene. Out of the five negatively selected lncRNAs, we were able to successfully decrease the expression of three (AC004463.6, RP11-439K3.1 and AC095067.1) using CRISPRi, and all three knockdowns significantly decreased cell proliferation (Fig. 4d). We also carried out cell lethality assays on lines with deletion of the five negatively selected lncRNAs and on the CRISPRi lines (transcription of lncRNAs repressed) and found all five lncRNAs to be essential for cell viability (Supplementary Fig. 7 and Supplementary Table 7).
For the positively selected gene candidates LINC01087 and LINC00882, we used a CRISPR-activator (CRISPRa) method28 to up-regulate their transcription, and found both lncRNAs to be lethal when overexpressed (Supplementary Fig. 8 and Supplementary Table 7). Therefore, the CRISPR/Cas9 screening strategy based on genomic deletion works well for both negatively and positively selected lncRNAs with high efficiency and reliability.
For both CRISPR screening and candidate validation, we introduced paired gRNAs into cells. It is possible that the phenotypic changes we observed were due to the effect of a double-strand break (DSB) mediated by one gRNA instead of pgRNA-mediated genomic deletion. To exclude this possibility, we compared the effects of pgRNAs targeting AC004463.6 and AC095067.1 with the introduction of only one of their corresponding gRNAs. Only pgRNAs significantly affected cell proliferation in both cases, while none of the single gRNAs targeting introns or exons altered cell survival (Fig. 4e and Supplementary Fig. 6f). This suggests that, at least for these two lncRNAs, pgRNA-mediated genomic deletion is required to generate a functional knockout, an effect unlikely to be achieved through indels created by single gRNAs.
Functional analysis of validated lncRNAs
We next sought to investigate the potential functions of LINC01087, one of the top positively selected lncRNAs in the screen. We knocked out LINC01087 with three different pgRNAs and observed similar changes in gene expression patterns from RNA-seq (Fig. 5a and Supplementary Fig. 9a–c). Knocking out LINC01087 did not affect the expression of neighbouring protein-coding genes (Supplementary Fig. 9d), but instead up-regulated a set of genes associated with liver cancer. The up-regulated genes included FOS and FOSB (Fig. 5b), which encode members of the FOS gene family and AP-1 transcription factor complex29, liver cancer up-regulated genes, targets of the hepatocellular oncogenic transcription factor HNF4α30, and genes involved in retinol metabolism (Fig. 5c).
We also evaluated the predicted functions of our top 15 lncRNA hits using “guilt by association”, a computational approach to infer lncRNA function from the enriched functions of co-expressed coding genes31. We analysed the expressions of genes and lncRNAs in five different cancers (liver, prostate, ovarian, lung cancer and glioblastoma multiforme) using datasets from the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA), respectively19,32 (Online Methods). Many of the genes that were co-expressed with negatively selected lncRNAs were enriched in essential processes such as RNA metabolism and cell cycle, whereas genes correlated with positively selected lncRNAs were enriched in the negative regulation of these essential processes (Fig. 5d,e, Supplementary Fig. 10 and Supplementary Table 8). This is consistent with the finding that knockout of these negatively (or positively) selected lncRNAs disrupts (or enhances) cell proliferation and viability. One of the negatively selected lncRNAs (AC004463.6) is significantly over-expressed in liver cancer and metastatic prostate cancer (Fig. 5f), and five out of the seven negatively selected lncRNAs in Huh7.5 are significantly over-expressed in metastatic prostate cancers (Supplementary Fig. 11). In addition, two of the five validated negatively selected lncRNAs, AC004463.6 and RP11-439K3.1, were confirmed to be essential in 22RV1, a relapsed prostate cancer cell line33 (Supplementary Fig. 12). These results suggest that the lncRNAs selected in liver cancer cell lines may also function in other cancer types.
LncRNA screening in HeLa cells
To assess the functions of lncRNAs in a different cell type, we screened HeLa cells using our lncRNA library (Supplementary Table 9). Positive control pgRNAs and genes were negatively selected, an indication that our screen also works well in HeLa cells (Fig. 5g and Supplementary Fig. 13). A further comparison of the screens done in Huh7.5 and HeLa cells revealed different roles for distinct lncRNAs in these two cell types (Fig. 5g). For the top negatively selected and validated lncRNAs in Huh7.5, we tested 5 lncRNAs in HeLa, including two that seemed to be essential (AC095067.1 and RP11-128M1.1) and three that appeared to be non-essential (HM13-AS1, AC004463.6 and RP11-439K3.1, Fig. 5h) in the HeLa cell screen. Indeed, knocking out the two essential lncRNAs reduced cell proliferation, and knocking out two of the three non-essential lncRNAs had no effect on cell proliferation. Our screen missed AC004463.6, which was found to be essential in HeLa through individual validation, an indication that the current lncRNA pgRNA library still has room for improvement.
DISCUSSION
The vast majority of the mammalian genome is composed of non-coding regions, many of which have important regulatory roles. Functional analyses of non-coding regions have been challenging, and an effective screening strategy based on genomic deletion was until now lacking. We have established a genome deletion screening method using paired gRNAs in CRISPR/Cas9 screens in mammalian cells. Using this method we screened approximately 700 human lncRNAs, and identified lncRNAs that have oncogenic or tumour suppressor activities in cancer cells. Validations of top hits using complementary technologies, such as individual CRISPR/Cas9 knockout, CRISPR inhibition/activation, gene expression profiling and expression correlation analysis, confirmed the findings of our screens and showed that our method has a high level of fidelity and specificity.
There are potential limitations to our lncRNA screen. First, deleting lncRNAs may also affect other proximal functional elements, including enhancers, microRNAs, and others. It is desirable to avoid designing pgRNAs that overlap with other functional elements where possible, to examine hits for potential enhancer function and to validate screening results using orthogonal technologies. Our screening approach could not reveal mechanisms of lncRNA action31, so a detailed investigation is needed to further understand the functions of identified lncRNAs. More than 30% of the lncRNAs we identified are located in the introns of other coding genes with diverse biological functions34. Further characterization of these lncRNAs is challenging, as disrupting introns may perturb splicing or other regulatory elements and have deleterious effects on cell proliferation (e.g., the intron-targeting pgRNAs in Fig. 3d). Finally, pgRNA orientation seems to have negligible effects on the knockout phenotype, but the number of pgRNAs per lncRNA is crucial to reduce the false negative rate of the screens (Supplementary Figs. 14 and 15a). Not all of the positive controls were identified in our screen, an indication that the sensitivity of the screen needs improvement. As the deletion frequencies for pgRNAs vary (Fig. 1d and Supplementary Fig. 1), a sufficient number of pgRNAs (preferably > 20) targeting each lncRNA is desirable to reduce the false negative rate.
Although our CRISPR pgRNA library might cause incorrect pgRNA assembly due to paired gRNA recombination in the lentiviral packaging and integration step, owing to the sequence similarity of two U6 promoters and two repeats of gRNA scaffold sequences, our screen was unaffected because of a limited recombination rate. However, we could optimize our methodology by using different types of U6 promoters (of human and murine origins, respectively)35 and alternative sgRNA scaffold sequences to further reduce the potential lentiviral recombination rate. Our approach could be extended to study other phenotypic changes of interest beyond simple growth by incorporating a reporter system. Finally, our paired guide RNA screening strategy could be more broadly applied to study other non-coding sequences including microRNAs, cis-elements and other uncategorized elements.
ONLINE METHODS
Cells and reagents
Huh7.5 cells were from Stanley Cohen’s laboratory (Stanford University School of Medicine) and were maintained in Dulbecco’s modified Eagle’s medium (DMEM, Gibco) with MEM non-essential amino acids (NEAA, Gibco). 22RV1 cells were from Myles Brown’s laboratory and were maintained in RPMI1640 medium (Gibco). HeLa cells were from Zhengfan Jiang’s laboratory (Peking University) and were maintained in DMEM (Gibco). All media were supplemented with 10% fetal bovine serum (FBS, CellMax), and cells were cultured with 5% CO2 at 37°C. All cells were checked to ensure that they were free of mycoplasma contamination.
Plasmid construction
The lentiviral pgRNA-expressing vector was constructed by cloning the human U6 promoter, ccdB cassette and gRNA scaffold into pLL3.7 (Addgene, Inc.) by replacing its original U6 promoter8. The scaffold-linker-U6 fragment was cloned into pEASY-Blunt plasmid (TransGen Biotech).
lncRNA selection
lncRNA targets in cancer
lncRNA targets consist of known cancer-related lncRNAs and lncRNAs that are differentially expressed in tumours. We used lncRNA expression estimation from a recent study repurposing exon array probes to lncRNAs19 and used the Limma algorithm37 to identify overexpressed lncRNAs in cancer. In total, 671 lncRNAs were selected and up to 20 pgRNAs were designed for each target. Among the 20 pairs, 10 target the promoter regions, and the other 10 target promoters plus exons.
Positive controls
Positive control genes consist of 20 genes, including 17 ribosomal genes and 3 cancer-related genes, FOXA1, HOXB13 and EZH2. We designed 100 pairs for each positive control gene, including 20 targeting promoters (the distance between two gRNAs in each pair is between 200 bp and 5 kb), and 80 targeting promoters plus exons. Among the 80 pairs targeting promoters plus gene bodies, 60 were designed such that their gRNA orientations are consistent with gene orientations. This is because gRNAs with the same orientation as their target genes have a better knockout effect than gRNAs with a different orientation38. The remaining 20 pairs were designed so that at least one gRNA has a different orientation from the target gene.
Negative controls
We designed 500 negative control pgRNAs of three different types. The first type of negative control (100 pairs) consists of pgRNAs that do not target any loci in the human genome. These pgRNAs were constructed directly from existing non-targeting control gRNAs from the GeCKO v2 library39. The second type of control (100 pairs) consists of pgRNAs targeting the AAVS1 region, which is a non-essential region in the genome and is frequently used in CRISPR studies for efficiency tests. The third type of negative control (300 pairs) consists of pgRNAs targeting the introns of positive control genes.
sgRNA filtering and design
Target regions
For positive control genes and lncRNA, the target regions are their promoters and the whole gene bodies. For promoter regions, 5-kb upstream and 200-bp downstream loci of each TSS (Transcription Start Site) were selected as the target regions.
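As a rough illustration of the promoter definition above, the following sketch computes the window of 5 kb upstream and 200 bp downstream of a TSS. Handling the minus strand by mirroring the window is an assumption made here, as are the 0-based coordinates and the function name `promoter_window`.

```python
def promoter_window(tss: int, strand: str,
                    upstream: int = 5000, downstream: int = 200) -> tuple:
    """Return (start, end) of the promoter target region around a TSS.

    "Upstream" is interpreted relative to the direction of transcription,
    so the window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the chromosome start so coordinates never go negative.
    return max(start, 0), end


print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
```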
gRNA scanning and filtering
After the regions were selected, we identified all possible gRNAs by searching for the PAM motif in the genome sequence. We only kept gRNAs that (1) map uniquely to the intended loci, (2) have at least 2 mismatches to any other locus in the genome, and (3) have predicted efficiency scores above 0.3. The efficiency score prediction was calculated from our recently published machine-learning model21. For gRNA pairs targeting lncRNAs, we further required that (4) the GC content is between 0.2 and 0.9, and (5) the gRNAs do not include the UUUU/TTTT polymer. This is because gRNAs with extreme GC content or with the UUUU/TTTT sequence have been shown to have lower cleavage efficiency27,38.
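The scanning step and the two sequence-based filters (GC content and the TTTT polymer) can be sketched as follows. This is a minimal illustration that assumes the canonical NGG PAM and 20-nt protospacers; it does not implement the genome-wide uniqueness check or the efficiency model, and the function names are hypothetical rather than those of pgRNADesign.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_grnas(seq: str) -> list:
    """Return (start, strand, protospacer) for every 20-nt protospacer next to an NGG PAM.

    start is the 0-based leftmost forward-strand coordinate of the protospacer.
    Lookahead patterns are used so that overlapping candidates are all reported.
    """
    seq = seq.upper()
    hits = []
    # Plus strand: 20-nt protospacer immediately followed by NGG.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        hits.append((m.start(), "+", m.group(1)))
    # Minus strand: CCN immediately followed by the 20 nt (forward-strand view).
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{20}))", seq):
        hits.append((m.start() + 3, "-", revcomp(m.group(1))))
    return hits


def passes_sequence_filters(protospacer: str,
                            gc_min: float = 0.2, gc_max: float = 0.9) -> bool:
    """Apply the GC-content range and the TTTT exclusion described in the text."""
    gc = sum(base in "GC" for base in protospacer) / len(protospacer)
    return gc_min <= gc <= gc_max and "TTTT" not in protospacer


candidates = [hit for hit in scan_grnas("ACGTACGTACGTACGTACGTTGG")
              if passes_sequence_filters(hit[2])]
print(candidates)
```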
pgRNA design
For all sgRNAs targeting each lncRNA or positive control gene, we first enumerated all possible pgRNAs, and then kept pairs that satisfy all of the following conditions:
Include one sgRNA before TSS and one after TSS;
Do not overlap with any exons of coding regions (for lncRNA targets);
Have the same sgRNA orientation as target lncRNA or gene;
Are at least 5-kb away from the promoters of coding regions (for lncRNA targets);
Are at least 50-bp away from the exon-intron boundary of coding genes (for lncRNAs located inside the introns of another coding gene).
For each lncRNA or gene, if there were not enough pgRNAs, we also included pgRNAs that (1) do not cross over the TSS or (2) have a different orientation compared with the targeted lncRNA or gene. For all pgRNAs that pass the filters mentioned above, we next sought to identify the desired number of pgRNAs with barcodes (Fig. 2), and required that each barcode gRNA is used only once in the library. Note that randomly assigning one of the two gRNAs as the barcode may result in some pgRNAs with no available barcode (see Supplementary Text 1). Instead, we designed an iterative greedy algorithm to identify possible pgRNAs and their barcodes, and proved that this algorithm can identify the optimal number of pgRNAs with barcodes (see Supplementary Text 1).
The pgRNA design algorithm, “pgRNADesign”, is open-source and freely available at https://bitbucket.org/liulab/pgrnadesign. Besides the pgRNA design and barcode assignment, pgRNADesign further allows users to specify a list of “blackout” regions. Once specified, pgRNADesign will avoid designing pgRNAs that overlap with these blackout regions.
Construction of the CRISPR/Cas9 pgRNA library
We created a library targeting 671 lncRNAs with 12,472 pairs of gRNAs as mentioned above (Supplementary Table 3). The 137-nt oligonucleotides, each containing one pair of pgRNA-coding sequences, were designed (Supplementary Table 10) and synthesized (CustomArray, Inc.). Primers targeting the flanking sequences of the oligonucleotides were then used for amplification to create 60-bp homologies with the BsmBI-digested pgRNA-expressing backbone. The amplified DNA products were ligated into the lentiviral vector using the Gibson cloning method40 and were transformed into Trans1-T1 competent cells (TransGen Biotech) to obtain the plasmids. Plasmids were then digested by BsmBI and ligated with the BsmBI-digested scaffold-linker-U6 fragment (Supplementary Fig. 2b), and the ligation mixture was transformed into Trans1-T1 competent cells (TransGen Biotech) to obtain the final library plasmids (see Supplementary Text 2 for sequences). The lentivirus of the pgRNA library was produced by co-transfection of library plasmids with two viral packaging plasmids, pVSVG and pR8.74 (Addgene, Inc.), into HEK293T cells using the X-tremeGENE HP DNA transfection reagent (Roche). The Huh7.5OC cell library was constructed through transduction of virus at low MOI (~ 0.3), followed by FACS for EGFP+ cells 72 h after infection.
Recombination rate calculation
The recombination rates were calculated in both plasmid constructs and chromosomal integrations in cells after transduction. For plasmid, we amplified the entire pgRNA sequence from the library plasmid as the template. For chromosomal integrations in cells, the pgRNA sequence was amplified from the genome of library cells as the template. The PCR products were then cloned into vectors for sequencing analysis. 80 and 120 clones were randomly selected from the plasmids and the cell libraries for sequencing, respectively.
CRISPR/Cas9 pgRNA library screening
A total of 1.2×10⁷ pgRNA library cells were plated onto 150 mm Petri dishes and three replicates were arranged. The library cells of the control group were collected for genomic DNA extraction and those of the experimental group were incubated for one month. Then genomic DNA of the experimental group was also extracted, followed by PCR amplification of the barcode gRNA-coding regions and deep-sequencing analysis.
Identification of candidate pgRNA sequences and data analysis
The genomic DNA of every replicate was isolated from 4×10⁶ cells using the DNeasy Blood and Tissue kit (Qiagen). gRNA-coding regions integrated into the chromosomes were then PCR-amplified (TransTaq DNA Polymerase High Fidelity, TransGen) with 28 cycles of reaction using primers targeting the U6 promoter and the linker between the two gRNAs of each pair (Supplementary Fig. 2 and Supplementary Table 11). In every tube, 0.6 μg of genomic DNA was used as the template and 20 PCR reactions were performed for each replicate. The PCR products of each replicate were pooled together and purified with DNA Clean & Concentrator-25 (Zymo Research Corporation), followed by deep-sequencing analysis (Illumina HiSeq 2500).
The computational analysis of screens
We used the latest version of MAGeCK (0.5.3) we previously developed to analyse the screening data22. We used the MAGeCK “count” command to generate read counts of all samples. Briefly, the qualities of fastq files are evaluated using fastqc. If the fastq files are of high quality, then all reads are mapped to the screening library without tolerating any mismatches, and the raw read counts of all pgRNAs of all samples are merged into a count matrix. The distribution of read counts is reported in Supplementary Fig. 3c, and the correlations between samples are reported in Supplementary Fig. 3a,b.
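The exact-match counting step can be pictured with the small sketch below. It is not MAGeCK's implementation: it assumes each read has already been trimmed to the barcode gRNA sequence, and the function name, library dictionary and toy sequences are hypothetical.

```python
from collections import Counter


def count_barcodes(reads: list, library: dict) -> Counter:
    """Count reads per pgRNA, accepting only exact matches to a library barcode.

    library maps barcode gRNA sequence -> pgRNA identifier.
    Reads with any mismatch simply find no key in the dictionary and are skipped.
    """
    counts = Counter({pgrna_id: 0 for pgrna_id in library.values()})
    for read in reads:
        pgrna_id = library.get(read.upper())
        if pgrna_id is not None:
            counts[pgrna_id] += 1
    return counts


toy_library = {
    "ACGTACGTACGTACGTACGT": "lncA_p1",
    "TTGCATGCATGCATGCATGC": "lncA_p2",
}
toy_reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # one mismatch, so it is not counted
    "TTGCATGCATGCATGCATGC",
]
print(count_barcodes(toy_reads, toy_library))
```

Running this per sample and stacking the results by pgRNA identifier gives a count matrix of the kind described above.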
We next used the MAGeCK “test” command to identify the top negatively and positively selected lncRNAs. The MAGeCK algorithm consists of 4 steps: normalization, pgRNA mean-variance modelling, pgRNA ranking and lncRNA ranking. In the normalization step, MAGeCK adjusts for the effect of sequencing depth of all samples by calculating a size factor for each sample. The factor is estimated using the “median ratio normalization” approach described before41. Instead of calculating the size factor from all pgRNAs (the default normalization method for MAGeCK), we estimated the size factor from all AAVS1-targeting pgRNAs (“AAVS1 normalization”), since AAVS1 normalization provides a more realistic estimation of the log fold change distribution of the negative control pgRNAs (Supplementary Fig. 15b). In the mean-variance modelling step, MAGeCK estimates the mean and variance of every pgRNA across independent experimental replicates, and fits a linear regression model to better estimate variances based on the mean of pgRNA counts. In the pgRNA-ranking step, MAGeCK estimates the p value of every pgRNA based on the negative binomial (NB) model of read counts. The parameters of the NB distribution are estimated from the mean-variance model built in the previous step. In the final lncRNA-ranking step, MAGeCK estimates the level of negative (or positive) selection of each lncRNA by comparing the rankings of all pgRNAs targeting that lncRNA with a null model (where all pgRNAs are distributed uniformly in the ranked list). MAGeCK uses an α-Robust Rank Aggregation (α-RRA) algorithm to calculate the “RRA score” of each lncRNA, a score to describe the degree of negative (or positive) selection. The p value of the RRA score is calculated by permuting all pgRNAs, and the adjusted p values are obtained using the Benjamini-Hochberg method.
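The idea behind median ratio normalization restricted to control pgRNAs can be sketched as below. This is a simplified illustration rather than MAGeCK's code: controls with a zero count in any sample are skipped because their geometric mean is undefined, and the function name, identifiers and counts are hypothetical.

```python
import math
from statistics import median


def control_size_factors(counts: dict, control_ids: list) -> list:
    """Estimate one size factor per sample from control pgRNAs only.

    counts maps pgRNA id -> list of raw read counts, one per sample.
    For each control, the count in a sample is divided by that control's
    geometric mean across samples; the size factor is the median ratio.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for pgrna_id in control_ids:
        row = counts[pgrna_id]
        if min(row) <= 0:
            continue  # geometric mean is undefined when a count is zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no usable control pgRNAs")
    return [math.exp(median(r)) for r in log_ratios]


toy_counts = {
    "AAVS1_1": [100, 200],
    "AAVS1_2": [50, 100],
    "AAVS1_3": [10, 20],
    "lncA_p1": [500, 10],
}
factors = control_size_factors(toy_counts, ["AAVS1_1", "AAVS1_2", "AAVS1_3"])
normalized = {k: [c / f for c, f in zip(v, factors)] for k, v in toy_counts.items()}
print(factors)
```

In this toy case the second sample is sequenced twice as deeply as the first, so after dividing by the size factors the AAVS1 controls have equal normalized counts in both samples while the depleted pgRNA remains clearly different.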
To increase the statistical power, we filtered out lncRNAs that have fewer than 2 statistically significant pgRNAs, and only performed multiple-comparison p-value correction on the remaining lncRNAs. A detailed description of the algorithm can be found in the original study22.
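For readers unfamiliar with the correction step, a minimal sketch of the standard Benjamini-Hochberg adjustment is given below. The p values are made up for illustration; the 0.25 cut-off is the false discovery rate threshold quoted in the Results.

```python
def benjamini_hochberg(pvalues: list) -> list:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.5]  # hypothetical lncRNA-level p values
fdr = benjamini_hochberg(toy_p)
hits = [i for i, q in enumerate(fdr) if q < 0.25]
print(fdr, hits)
```

Because the adjusted value scales with the number of tests, removing lncRNAs before correction, as described above, lowers the multiple-testing burden on the remaining ones.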
Cell Proliferation assay
All the pgRNAs targeting the positive control gene and lncRNAs to be validated were cloned into a lentiviral expressing backbone carrying CMV promoter-driven EGFP, and were delivered into cells through transduction. The percentage of EGFP+ cells was quantified by FACS. The first quantification started from three days post viral infection, labelled as Day 0, serving as control for normalization. Cell viability was determined by normalizing EGFP+ percentages at indicated time points with Day 0 control.
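The normalization described above amounts to dividing each EGFP+ percentage by the Day 0 value. A small sketch follows, with hypothetical percentages and a hypothetical function name.

```python
def relative_egfp(percent_by_day: dict) -> dict:
    """Normalize EGFP+ percentages to the Day 0 measurement.

    percent_by_day maps day -> percentage of EGFP+ cells and must include day 0.
    Values below 1.0 indicate that pgRNA-carrying cells were outcompeted.
    """
    baseline = percent_by_day[0]
    if baseline <= 0:
        raise ValueError("Day 0 EGFP+ percentage must be positive")
    return {day: pct / baseline for day, pct in percent_by_day.items()}


# Hypothetical time course for one pgRNA.
print(relative_egfp({0: 40.0, 6: 30.0, 12: 20.0}))
```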
Cell Lethality Assay
All the pgRNAs targeting negatively selected lncRNAs were delivered into Huh7.5OC cells through lentiviral infection, and all the sgRNAs that were designed to repress or activate the transcription of lncRNAs were delivered into Huh7.5 cells through transient transfection. The cells were enriched by FACS 72 h after infection or transfection, and the LDH lethality assay was performed from one day to three days post FACS. LDH staining and detection were performed as described in the product instructions (CytoTox 96, Promega). The death signal, represented by the amount of LDH release, was normalized to the maximum LDH activity of wells containing totally lysed cells. Each data point and related error bar shown in the figures represent the average results from three replicates.
CRISPR-inhibitor and CRISPR-activator
For CRISPRi, the KRAB-dCas9-P2A-mCherry (Addgene # 60954) plasmid was delivered into Huh7.5 cells through lentivirus infection, and the mCherry-positive cells were enriched by FACS 3 days after infection. Then the sgRNAs targeting the negatively selected lncRNAs were delivered by lentivirus infection into cells stably expressing dCas9-KRAB, followed by the cell proliferation assay and cell lethality assay. For CRISPRa, the three plasmids dCAS-VP64_Blast (Addgene # 61425), MS2-P65-HSF1_Hygro (Addgene # 61426) and sgRNAs carrying EGFP for each positively selected lncRNA were delivered into cells through transient transfection. Then the EGFP-positive cells were enriched by FACS 3 days after transfection, followed by the cell lethality assay.
Real-time PCR
RNA was extracted from cultured cells using the RNAprep Pure Micro kit (TIANGEN, DP420), and cDNA was synthesized using the QuantScript RT kit (TIANGEN, KR103-03). Real-time PCR was performed with SYBR Premix Ex Taq II (TaKaRa, RR820A) on a LightCycler 96 qPCR system. GAPDH transcript levels were measured as the normalization control.
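The text states only that GAPDH served as the normalization control. One common way to apply such a control is the 2^-ΔΔCt calculation, sketched below as an assumption rather than the authors' documented procedure; the function name and Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_gapdh_sample,
                        ct_target_control, ct_gapdh_control):
    """Fold change of a target transcript by the 2^-ddCt method.

    Each Ct is first normalized to GAPDH within its own sample (dCt),
    then the treated sample is compared with the control sample (ddCt).
    """
    delta_sample = ct_target_sample - ct_gapdh_sample
    delta_control = ct_target_control - ct_gapdh_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target lncRNA crosses threshold two cycles
# later in the pgRNA-treated sample, with GAPDH unchanged.
fold_change = relative_expression(27.0, 18.0, 25.0, 18.0)
```

A fold change below 1 corresponds to reduced expression of the target relative to the control sample.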
RNA sequencing and data analysis
LINC01087-targeting pgRNAs (LINC01087_p1, LINC01087_p2 and LINC01087_p4) were delivered into Huh7.5 cells by lentiviral infection. EGFP-positive cells were enriched by FACS three days after infection and cultured for another nine days. RNA was extracted from all samples using the RNAprep Pure Micro kit (TIANGEN, DP420) and deep-sequenced on the Illumina HiSeq 4000 platform. RNA-seq reads were mapped to the human reference genome (hg19) using TopHat2 (ref. 42). Per-gene read counts were collected using HTSeq43, and differential expression analysis was performed using DESeq2 (ref. 44).
Functional analysis of lncRNAs
We collected expression data for genes and lncRNAs from five cancer types: liver, prostate, lung, ovarian and brain. The expression levels for prostate, lung, ovarian and brain cancers were downloaded from our previous study19. In that study, gene expression was measured on human exon arrays, and lncRNA expression was measured by repurposing some of the probes to lncRNAs. The expression profiles include 150 tumour samples from the MSKCC Prostate Oncogenome Project45, 451 samples from glioblastoma multiforme (GBM)46, 585 samples from ovarian cancer47, and 113 samples from lung squamous cell carcinoma from The Cancer Genome Atlas (TCGA) project48. We downloaded the RNA-seq expression profiles of liver cancer patients in TCGA from the TANRIC database, an integrative platform for exploring lncRNA function49. For liver cancer cell lines, we downloaded the RNA-seq data of 32 liver cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE)32,50. RNA-seq reads were obtained from the UCSC Cancer Genomics Hub (http://cghub.ucsc.edu) and mapped to the human reference genome (hg19) using TopHat2 (ref. 42), and the expression levels of genes and lncRNAs were calculated using Cufflinks51. For Gene Ontology (GO) analysis, we calculated, for each lncRNA, its expression correlation with every coding gene, selected the genes in the top 10% by positive correlation, and used the topGO R package to estimate the statistical significance of enriched GO terms52.
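The gene-selection step of the GO analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not name the correlation measure, so Pearson correlation is assumed, and the function name, gene names and expression values are hypothetical.

```python
import numpy as np


def top_positively_correlated(lncrna_expr, gene_expr, gene_names, fraction=0.10):
    """Return (gene, correlation) pairs for the most correlated coding genes.

    lncrna_expr: expression of one lncRNA across samples, shape (n_samples,)
    gene_expr:   expression of coding genes, shape (n_genes, n_samples)
    fraction:    share of genes to keep, ranked by Pearson correlation
    """
    x = np.asarray(lncrna_expr, dtype=float)
    y = np.asarray(gene_expr, dtype=float)
    x = x - x.mean()
    y = y - y.mean(axis=1, keepdims=True)
    denom = np.sqrt((x ** 2).sum()) * np.sqrt((y ** 2).sum(axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = (y @ x) / denom
    corr = np.nan_to_num(corr, nan=0.0)  # constant genes get correlation 0
    n_top = max(1, round(fraction * len(gene_names)))
    order = np.argsort(corr)[::-1][:n_top]
    # Keep only genes whose correlation is actually positive
    return [(gene_names[i], float(corr[i])) for i in order if corr[i] > 0]


# Hypothetical data: 10 coding genes measured in 5 samples
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = [f"GENE_{i}" for i in range(10)]
expr = np.array(
    [[2.0, 4.0, 6.0, 8.0, 10.0],      # GENE_0: perfectly correlated
     [1.0, 3.0, 2.0, 5.0, 4.0]]       # GENE_1: partially correlated
    + [[5.0, 4.0, 3.0, 2.0, 1.0]] * 8  # GENE_2..9: anti-correlated
)
selected = top_positively_correlated(lncrna, expr, genes)
```

The selected gene list would then be tested for enriched GO terms, which the authors did with the topGO R package; that step is not reproduced here.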
Accession codes
CRISPR screening results for Huh7.5 cells can be accessed in the NCBI Sequence Read Archive (SRA) under accession numbers SRX2148757 and SRX2148759. Screening results for HeLa cells can be accessed in SRA under accession number SRX2149095. RNA-seq reads can be accessed in SRA under accession number SRX2152480. Source code for designing pgRNAs is available in the Supplementary Code and in a Bitbucket repository (https://bitbucket.org/liulab/pgrnadesign).
Acknowledgments
We acknowledge the staff of the BIOPIC sequencing facility (Peking University) for their assistance, and the National Center for Protein Sciences Beijing (Peking University) for help with fluorescence-activated cell sorting. The project was supported by funds from the National Science Foundation of China (NSFC31430025, NSFC31170126, NSFC81471909), the Beijing Advanced Innovation Center for Genomics at Peking University, and the Peking-Tsinghua Center for Life Sciences (to W.W.), the NIH grants U01 CA180980 (to X.S.L.) and R01 HG008728 (to M.B. and X.S.L.), and the Claudia Adams Barr Award in Innovative Basic Cancer Research from the Dana-Farber Cancer Institute.
Footnotes
AUTHOR CONTRIBUTIONS
X.S.L. and W.W. conceived and supervised the project. W.W., S.Z., J.P. and P.Y. designed the experiments. S.Z., J.L., P.X. and Z.C. performed the experiments with help from W.L. and T.X. W.L., C.H. and H.X. designed the oligos used for pgRNA library construction. W.L. performed the data analysis, with the help of Q.L. on the functional expression analysis of candidate lncRNAs. S.Z., W.L., X.S.L. and W.W. wrote the manuscript with the help of all other authors.
The authors declare no competing financial interests.
Supplementary Information is available in the online version of the paper.
References
- 1. Barrangou R, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–1712. doi: 10.1126/science.1138140.
- 2. Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337:816–821. doi: 10.1126/science.1225829.
- 3. Cong L, et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science. 2013;339:819–823. doi: 10.1126/science.1231143.
- 4. Mali P, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339:823–826. doi: 10.1126/science.1232033.
- 5. Shalem O, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343:84–87. doi: 10.1126/science.1247005.
- 6. Wang T, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350:1096–1101. doi: 10.1126/science.aac7041.
- 7. Koike-Yusa H, Li Y, Tan EP, del Velasco-Herrera MC, Yusa K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat Biotechnol. 2014;32:267–273. doi: 10.1038/nbt.2800.
- 8. Zhou Y, et al. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature. 2014;509:487–491. doi: 10.1038/nature13166.
- 9. Rajagopal N, et al. High-throughput mapping of regulatory DNA. Nat Biotechnol. 2016;34:167–174. doi: 10.1038/nbt.3468.
- 10. Korkmaz G, et al. Functional genetic screens for enhancer elements in the human genome using CRISPR-Cas9. Nat Biotechnol. 2016;34:192–198. doi: 10.1038/nbt.3450.
- 11. Shalem O, Sanjana NE, Zhang F. High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet. 2015. doi: 10.1038/nrg3899.
- 12. Peng J, Zhou Y, Zhu S, Wei W. High-throughput screens in mammalian cells using the CRISPR-Cas9 system. FEBS J. 2015;282:2089–2096. doi: 10.1111/febs.13251.
- 13. Canver MC, et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature. 2015;527:192–197. doi: 10.1038/nature15521.
- 14. Diao Y, et al. A new class of temporarily phenotypic enhancers identified by CRISPR/Cas9-mediated genetic screening. Genome Res. 2016. doi: 10.1101/gr.197152.115.
- 15. Han J, et al. Efficient in vivo deletion of a large imprinted lncRNA by CRISPR/Cas9. RNA Biol. 2014;11. doi: 10.4161/rna.29624.
- 16. Yin Y, et al. Opposing Roles for the lncRNA Haunt and Its Genomic Locus in Regulating HOXA Gene Activation during Embryonic Stem Cell Differentiation. Cell Stem Cell. 2015;16:504–516. doi: 10.1016/j.stem.2015.03.007.
- 17. Ren Q, et al. A Dual-Reporter System for Real-Time Monitoring and High-throughput CRISPR/Cas9 Library Screening of the Hepatitis C Virus. Sci Rep. 2015;5:8865. doi: 10.1038/srep08865.
- 18. Zheng Q, et al. Precise gene deletion and replacement using the CRISPR/Cas9 system in human cells. Biotechniques. 2014;57:115–124. doi: 10.2144/000114196.
- 19. Du Z, et al. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat Struct Mol Biol. 2013;20:908–913. doi: 10.1038/nsmb.2591.
- 20. Hsu PD, et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol. 2013;31:827–832. doi: 10.1038/nbt.2647.
- 21. Xu H, et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 2015;25:1147–1157. doi: 10.1101/gr.191452.115.
- 22. Li W, et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15:554. doi: 10.1186/s13059-014-0554-4.
- 23. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 24. Cheng AS, et al. EZH2-mediated concordant repression of Wnt antagonists promotes beta-catenin-dependent hepatocarcinogenesis. Cancer Res. 2011;71:4028–4039. doi: 10.1158/0008-5472.CAN-10-3342.
- 25. Gillies SD, Morrison SL, Oi VT, Tonegawa S. A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene. Cell. 1983;33:717–728. doi: 10.1016/0092-8674(83)90014-4.
- 26. Xiao X, et al. Splice site strength-dependent activity and genetic buffering by poly-G runs. Nat Struct Mol Biol. 2009;16:1094–1100. doi: 10.1038/nsmb.1661.
- 27. Gilbert LA, et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell. 2014;159:647–661. doi: 10.1016/j.cell.2014.09.029.
- 28. Konermann S, et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015;517:583–588. doi: 10.1038/nature14136.
- 29. Eferl R, Wagner EF. AP-1: a double-edged sword in tumorigenesis. Nat Rev Cancer. 2003;3:859–868. doi: 10.1038/nrc1209.
- 30. Hatziapostolou M, et al. An HNF4alpha-miRNA inflammatory feedback circuit regulates hepatocellular oncogenesis. Cell. 2011;147:1233–1247. doi: 10.1016/j.cell.2011.10.043.
- 31. Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145–166. doi: 10.1146/annurev-biochem-051410-092902.
- 32. Barretina J, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003.
- 33. Sramkoski RM, et al. A new human prostate carcinoma cell line, 22Rv1. In Vitro Cell Dev Biol Anim. 1999;35:403–409. doi: 10.1007/s11626-999-0115-4.
- 34. Louro R, Smirnova AS, Verjovski-Almeida S. Long intronic noncoding RNA transcription: expression noise or expression choice? Genomics. 2009;93:291–298. doi: 10.1016/j.ygeno.2008.11.009.
- 35. Vidigal JA, Ventura A. Rapid and efficient one-step generation of paired gRNA CRISPR-Cas9 libraries. Nat Commun. 2015;6:8083. doi: 10.1038/ncomms9083.
- 36. Yates A, et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–716. doi: 10.1093/nar/gkv1157.
- 37. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. doi: 10.2202/1544-6115.1027.
- 38. Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80–84. doi: 10.1126/science.1246981.
- 39. Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods. 2014;11:783–784. doi: 10.1038/nmeth.3047.
- 40. Gibson DG. Enzymatic assembly of overlapping DNA fragments. Methods Enzymol. 2011;498:349–361. doi: 10.1016/B978-0-12-385120-8.00015-2.
- 41. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. doi: 10.1186/gb-2010-11-10-r106.
- 42. Kim D, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
- 43. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 44. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 45. Taylor BS, et al. Integrative genomic profiling of human prostate cancer. Cancer Cell. 2010;18:11–22. doi: 10.1016/j.ccr.2010.05.026.
- 46. Cancer Genome Atlas Research N. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
- 47. Cancer Genome Atlas Research N. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166.
- 48. Cancer Genome Atlas Research N. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525. doi: 10.1038/nature11404.
- 49. Li J, et al. TANRIC: An Interactive Open Platform to Explore the Function of lncRNAs in Cancer. Cancer Res. 2015;75:3728–3737. doi: 10.1158/0008-5472.CAN-15-0273.
- 50. Wilks C, et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database (Oxford). 2014;2014. doi: 10.1093/database/bau093.
- 51. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
- 52. Alexa A, Rahnenfuhrer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006;22:1600–1607. doi: 10.1093/bioinformatics/btl140.