eLife. 2016 Sep 23;5:e19760. doi: 10.7554/eLife.19760

Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation

Max A Horlbeck 1,2,3,4, Luke A Gilbert 1,2,3,4, Jacqueline E Villalta 1,2,3,4, Britt Adamson 1,2,3,4, Ryan A Pak 1,5, Yuwen Chen 1,2,3,4, Alexander P Fields 1,2,3,4, Chong Yon Park 1,5, Jacob E Corn 5,6, Martin Kampmann 1,2,3,4,7, Jonathan S Weissman 1,2,3,4,*
Editor: Karen Adelman 8
PMCID: PMC5094855  PMID: 27661255

Abstract

We recently found that nucleosomes directly block access of CRISPR/Cas9 to DNA (Horlbeck et al., 2016). Here, we build on this observation with a comprehensive algorithm that incorporates chromatin, position, and sequence features to accurately predict highly effective single guide RNAs (sgRNAs) for targeting nuclease-dead Cas9-mediated transcriptional repression (CRISPRi) and activation (CRISPRa). We use this algorithm to design next-generation genome-scale CRISPRi and CRISPRa libraries targeting human and mouse genomes. A CRISPRi screen for essential genes in K562 cells demonstrates that the large majority of sgRNAs are highly active. We also find CRISPRi does not exhibit any detectable non-specific toxicity recently observed with CRISPR nuclease approaches. Precision-recall analysis shows that we detect over 90% of essential genes with minimal false positives using a compact 5 sgRNA/gene library. Our results establish CRISPRi and CRISPRa as premier tools for loss- or gain-of-function studies and provide a general strategy for identifying Cas9 target sites.

DOI: http://dx.doi.org/10.7554/eLife.19760.001

Research Organism: Human

Introduction

Highly multiplexed pooled genetic screening methodologies have emerged as powerful and broadly accessible tools for systematically profiling gene function at the scale of mammalian genomes (Paddison et al., 2004). Recently, a number of pooled screening platforms have been developed that utilize the bacterial CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-associated nuclease Cas9 paired with libraries of single-guide RNAs (sgRNAs) to disrupt targeted genes (reviewed in Shalem et al., 2015). We and others have developed tools based on nuclease-dead Cas9 (dCas9) (Qi et al., 2013) to programmably interfere with (CRISPRi) or activate (CRISPRa) transcription (Gilbert et al., 2014, 2013; Konermann et al., 2015; Maeder et al., 2013; Perez-Pinera et al., 2013; Tanenbaum et al., 2014), and used these to systematically manipulate gene expression at genome scale (Gilbert et al., 2014; Konermann et al., 2015). Together, these screening platforms represent a powerful toolkit for unbiased forward gain-of-function and loss-of-function genetic screens in mammalian cells.

A key step in implementing CRISPR genetic screens is selecting sgRNAs that mediate high Cas9 activity. We and others recently found that nucleosomes provide a direct and profound impediment to Cas9 access to DNA (Hinz et al., 2015; Horlbeck et al., 2016; Isaac et al., 2016), an observation we expected to be particularly important for applications such as CRISPRi and CRISPRa, which require sustained binding of dCas9 to DNA. We found that nucleosome occupancy was predictive of Cas9 activity complementary to and independent of previously described sgRNA sequence features (Chari et al., 2015; Doench et al., 2014; Xu et al., 2015), adding an additional dimension to the set of parameters expected to influence Cas9 activity. These observations, along with the strong nucleosome-dependent phasing observed downstream of the FANTOM consortium-annotated transcription start site (TSS) (FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., 2014; Horlbeck et al., 2016), suggested that a quantitative model incorporating all of these features could greatly enhance our ability to predict highly active sgRNAs for CRISPRi and CRISPRa.

To test this, we developed a comprehensive machine learning pipeline trained on data collected from 30 CRISPRi and 9 CRISPRa screens. We found that the resulting models were highly predictive of sgRNA efficacy and strongly weighted nucleosome positioning and specific sequence features. We used these models to design and generate CRISPRi and CRISPRa version 2 (v2) libraries, targeting human and mouse genomes, which are greatly enriched for sgRNAs with high predicted activity. These libraries include several additional improvements, including the option to screen with either 10 sgRNAs per gene or a compact half-library containing the top 5 predicted sgRNAs for each gene. To benchmark this new algorithm, we validated the human CRISPRi v2 (hCRISPRi-v2) library with a screen designed to identify genes essential for robust cell growth. In this experiment, essential genes represent a large class of expected positive controls. We identified over 2100 essential genes with high statistical confidence, significantly improving upon our CRISPRi v1 library (Gilbert et al., 2014), and precision-recall analysis showed increased discrimination of gold standard essential and non-essential genes with both 10 sgRNA/gene and 5 sgRNA/gene hCRISPRi-v2 libraries (Hart et al., 2014). A large majority of the hCRISPRi-v2 sgRNAs targeting known essential genes produced robust growth phenotypes, a key advance over previous CRISPRi libraries (Evers et al., 2016), and our algorithm can accurately predict sgRNA activity for data from screens performed with an independently designed library and in different cell types. Furthermore, we observed that CRISPRi lacked any detectable non-specific toxicity associated with genomic DNA breaks and repair, enabling sensitive detection of genes with subtle growth phenotypes. We also conducted a screen for genes that modify growth rates upon overexpression with our hCRISPRa-v2 and found this identified 60% more genes, with greater enrichment for previously established classes of hit genes, than our version 1 CRISPRa screen. Our results suggest that the CRISPRi and CRISPRa v2 libraries have numerous favorable properties relative to alternate approaches as a resource for targeted or genome-scale loss-of-function and gain-of-function studies in mammalian cells.

Results

An integrated machine learning approach predicts highly active sgRNAs for CRISPRi and CRISPRa

We sought to improve upon our first generation CRISPRi and CRISPRa libraries by taking a comprehensive approach that incorporated nucleosome positioning, sequence features, refinement of our original sgRNA design rules, and other potentially informative factors. In order to quantitatively model the contribution of these features to CRISPRi activity, we turned to our recently described CRISPRi activity dataset (Horlbeck et al., 2016) in which we integrated data from 30 CRISPRi screens to select 1,539 high-confidence hit genes, and normalized the phenotypes for sgRNAs targeting each gene to the strongest sgRNAs for that gene, resulting in 'activity scores' for 18,380 sgRNAs. We used this set as training data for elastic net linear regression (Figure 1A) (Zou and Hastie, 2005). As many of the features included in the model were not linearly related to activity, we first adapted each feature set according to its relationship with activity. Categorical and non-linear parameters were binned prior to linear regression. Because the relationship between CRISPRi activity and target site distance from the TSS was highly periodic and asymmetric, as we had recently shown (Horlbeck et al., 2016), we fit sgRNA positioning features using support vector regression (SVR) to predict a continuous function for any target site (Figure 1—figure supplement 1). An important improvement was the use of FANTOM consortium annotations instead of Ensembl/GENCODE to define the TSS (Cunningham et al., 2015; FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., 2014; Harrow et al., 2012) (Supplementary file 2), a finding also recently reported by Radzisheuskaya and colleagues (Radzisheuskaya et al., 2016).
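To make the activity-score normalization concrete, the following is a minimal Python sketch rather than the published pipeline. It assumes that each sgRNA phenotype is divided by the single strongest phenotype (largest absolute value) observed for the same gene; the function name, input layout, and example values are hypothetical, and the actual dataset may define the per-gene reference differently.

```python
from collections import defaultdict


def activity_scores(phenotypes):
    """Normalize sgRNA phenotypes to the strongest sgRNA of each gene.

    phenotypes: iterable of (gene, sgrna_id, phenotype) tuples.
    Returns {sgrna_id: activity_score}.
    ASSUMPTION: the reference is the single phenotype with the largest
    absolute value per gene; the published dataset may use another reference.
    """
    by_gene = defaultdict(list)
    for gene, sgrna_id, value in phenotypes:
        by_gene[gene].append((sgrna_id, value))

    scores = {}
    for gene, entries in by_gene.items():
        reference = max((value for _, value in entries), key=abs)
        if reference == 0:
            continue  # no measurable phenotype, so the gene cannot be normalized
        for sgrna_id, value in entries:
            scores[sgrna_id] = value / reference
    return scores


# Hypothetical phenotypes for two hit genes
example = [
    ("GENE_A", "A_1", -0.40),
    ("GENE_A", "A_2", -0.10),
    ("GENE_A", "A_3", 0.02),
    ("GENE_B", "B_1", 0.30),
    ("GENE_B", "B_2", 0.15),
]
scores = activity_scores(example)
```

Under this assumption the strongest sgRNA of each gene scores 1, weaker sgRNAs score between 0 and 1, and sgRNAs whose phenotype has the opposite sign score slightly below 0.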

Figure 1. A machine learning approach for identifying highly active sgRNAs for CRISPRi.

(A) Schematic of machine learning strategy and datasets. 808 features were calculated for each sgRNA, linearized as indicated, and z-standardized. A linear regression model was then generated using these features to fit to the activity scores of the CRISPRi training set (Horlbeck et al., 2016). 20% of the genes in the training set were reserved to test the predictive value of the resulting model. For display in Figures 1B,C, and Figure 1—figure supplement 2, five non-overlapping 20% datasets were randomly selected and training was performed on the corresponding 80% sets. An orthogonal dataset, based on tiling of every possible sgRNA within 10 kb of the TSS of 49 genes known to modulate sensitivity to ricin (Bassik et al., 2013; Gilbert et al., 2014), was also used to assess the predictive value of this model. (B) ROC analysis of the ability of the machine learning approach in (A) to predict highly active sgRNAs. For test set 1, sgRNAs with an activity score greater than 0.75 were considered highly active. For test set 2, sgRNAs with a phenotype greater than 0.75 of the maximum phenotype for each gene were considered highly active. (C) Relative contribution of feature categories to the sgRNA predicted scores. The individual weighting of each feature assigned by the linear regression model (see Figure 1—figure supplement 2) was grouped by the indicated categories, and the summed weights for each sgRNA within the 20% test datasets were calculated. The scores of the 95th and 5th percentile sgRNAs were subtracted to compute the overall contribution of the feature category to the distribution of predicted activity scores. Bars indicate the mean of the contributions from five 20% datasets (green circles). The target site position includes both the distance to the TSS and the periodic relationship as fit by SVR (Figure 1—figure supplement 1). (D) Distribution of predicted activity scores in next-generation CRISPRi libraries. (Top) Predicted CRISPRi activity correlates with empirical activity scores. For the 80%/20% division used to predict sgRNAs for the hCRISPRi-v2.1 library, predicted scores for the 20% test set were plotted against the empirical activity score. Activity score percentiles are from all sgRNAs within 0.25 of the indicated activity score. Predicted activity was highly correlated with activity, with a Pearson R of 0.56 (p<10^−296). (Bottom) Distribution of predicted activity scores for CRISPRi v1, hCRISPRi-v2, and hCRISPRi-v2.1, as calculated by the hCRISPRi-v2.1 regression model. (E) Composition of hCRISPRi-v2 and mCRISPRi-v2 sublibraries.

DOI: http://dx.doi.org/10.7554/eLife.19760.002

Figure 1—figure supplement 1. Relationship between CRISPRi activity and sgRNA position relative to the TSS as predicted by SVR.

An SVR model with radial basis function kernel was trained on an 80% division of the CRISPRi activity score dataset, using the position of the sgRNA relative to the upstream end of the primary FANTOM TSS for each gene as the sole feature. Hyperparameter values for SVR were selected automatically using cross-validation within the training set. To display the relationship between sgRNA position and CRISPRi activity fit by this model, predicted scores were generated for each position within a 3 kb window around the TSS. The resulting curve recapitulated the previously observed periodic relationship shown to be out-of-phase with nucleosome positioning (Horlbeck et al., 2016), and use of this SVR model within the general machine learning approach enabled regression against this highly complex relationship. The shaded area indicates the region relative to the TSS where predicted activity scores for all sgRNAs were calculated for potential inclusion in the construction of the hCRISPRi-v2(.1) and mCRISPRi-v2 library designs.

Figure 1—figure supplement 2. Individual sgRNA feature contributions to predicted CRISPRi activity.

Linear regression coefficients for each model trained according to the 80%/20% divisions displayed in Figure 1 are displayed, with bars indicating the mean of the five divisions and error bars indicating minimum and maximum feature coefficients. As each feature was z-standardized before linear regression, coefficients are directly comparable. For binned feature categories, x-axis values represent the minimum value of the bin (inclusive) unless otherwise indicated.

We first evaluated the performance of this algorithm using five-fold cross-validation. By performing regression on a training set of only 80% of the genes in the CRISPRi activity dataset, we found that the model was highly predictive of activity in the test set comprising the remaining 20% of the genes, with a receiver operating characteristic area under the curve (ROC-AUC) of 0.80 (Figure 1B). Importantly, this high predictive value was consistent across randomly selected training and test sets. As the sgRNAs in this dataset were pre-selected using our CRISPRi v1 design rules, we also tested our ability to predict the performance of sgRNAs in our previously published data set in which we tiled every target site around the TSS of 49 genes known to modulate resistance or sensitivity to the toxin ricin (Gilbert et al., 2014) and obtained an ROC-AUC of 0.91 (Figure 1B), indicating that we could identify active sgRNAs in the genome with high accuracy.
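The ROC-AUC metric used above can be illustrated with a short, self-contained Python sketch. It relies on the standard rank-sum identity for the area under the ROC curve and labels an sgRNA as highly active when its empirical activity score exceeds 0.75, as in Figure 1B. The function name and the six example values are hypothetical and are not data from the screens.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: predicted activity scores; labels: True for highly active sgRNAs.
    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical test-set sgRNAs: predicted score and empirical activity score
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
empirical = [0.95, 0.5, 0.8, 0.1, 0.9, 0.3]
highly_active = [a > 0.75 for a in empirical]
auc = roc_auc(predicted, highly_active)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of highly active sgRNAs from the rest.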

We next analyzed which features contributed most to CRISPRi activity in the predictive model (Figure 1—figure supplement 2). Overall, the predicted scores were most influenced by the position relative to the TSS, including both distance from the TSS and avoidance of canonical nucleosome-occupied regions (Figure 1C and Figure 1—figure supplement 1). In particular, the nucleosome-depleted region immediately downstream of the TSS yielded the strongest predicted activity by SVR, likely due in part to the contribution of dCas9 directly interfering with early transcriptional initiation or elongation (Gilbert et al., 2013; Qi et al., 2013). Sequence features also represented a large contribution to the model, and salient relationships included the disfavoring of guanine directly downstream of the protospacer adjacent motif (PAM), which recapitulated previous findings (Doench et al., 2014; Xu et al., 2015). Additional parameters that contributed to the prediction included sgRNA secondary structure as predicted by ViennaRNA (Doench et al., 2016; Lorenz et al., 2011), sgRNA protospacer length, and chromatin accessibility features not accounted for by the nucleosome positioning relationship. The contribution of each individual parameter was also remarkably consistent across 80%/20% divisions of the training dataset, suggesting that the model was capturing underlying biological signal rather than overfitting the data.
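The category-level contributions summarized in Figure 1C follow a simple recipe: sum the weighted feature values within a category for each sgRNA, then take the difference between the 95th and 5th percentile of those sums. A minimal Python sketch of that recipe is given below. The data layout, function names, and the linear-interpolation percentile are assumptions made for illustration, and the published analysis may compute percentiles differently.

```python
def percentile(values, q):
    """Linearly interpolated percentile for q between 0 and 100 (an assumed convention)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def category_contributions(feature_rows, coefficients, categories):
    """Spread (95th minus 5th percentile) of summed weights per feature category.

    feature_rows: one dict per sgRNA mapping feature name -> z-standardized value
    coefficients: feature name -> regression coefficient
    categories:   feature name -> category label
    """
    per_category = {}
    for row in feature_rows:
        sums = {}
        for feature, value in row.items():
            category = categories[feature]
            sums[category] = sums.get(category, 0.0) + coefficients[feature] * value
        for category, total in sums.items():
            per_category.setdefault(category, []).append(total)
    return {
        category: percentile(totals, 95) - percentile(totals, 5)
        for category, totals in per_category.items()
    }


# Hypothetical example: 21 sgRNAs, two features in two categories
rows = [{"tss_distance": float(i), "gc_content": float(i)} for i in range(21)]
coefs = {"tss_distance": 1.0, "gc_content": 0.5}
cats = {"tss_distance": "position", "gc_content": "sequence"}
contributions = category_contributions(rows, coefs, cats)
```

A category whose summed weights barely vary between sgRNAs yields a small spread and therefore contributes little to the distribution of predicted scores.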

Having established the robustness and accuracy of this approach, we used a version of our sgRNA predictions to design a CRISPRi genome-scale library targeting the human protein-coding transcriptome (hCRISPRi-v2) (Supplementary file 3; an in silico library design based on the final version of our predictions, hCRISPRi-v2.1, is also available in Supplementary file 3). While the predicted scores for sgRNAs in our v1 library were broadly distributed, many sgRNAs of higher predicted activity were available in the genome, and by picking the top 10 predicted sgRNAs per gene we expected that we could greatly enrich the library for guides of high activity (Figure 1D). In constructing this library, we also incorporated empirical information for highly active sgRNAs where available, revised off-target filtering, and implemented changes to the sgRNA expression vector to facilitate the processing of screen samples for sequencing (see Materials and methods). In addition, we cloned the library as separate pools for the top 5 and next-best 5 predicted sgRNAs per gene to facilitate screens where a smaller library may be advantageous, and further divided the pools into 7 thematic sublibraries based on our previous divisions of shRNA libraries (Kampmann et al., 2015) (Figure 1E).
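The pooling scheme described here, top 5 and next-best 5 predicted sgRNAs per gene, can be sketched in a few lines of Python. This is an illustration only: the function and variable names are hypothetical, and the sketch omits the empirical sgRNA data and off-target filtering that the library design also incorporated.

```python
from collections import defaultdict


def split_library(predictions, per_pool=5):
    """Rank sgRNAs per gene by predicted score and split them into two pools.

    predictions: iterable of (gene, sgrna_id, predicted_score) tuples.
    Returns (top, supplemental): dicts mapping gene -> list of sgRNA ids,
    holding the best `per_pool` sgRNAs and the next-best `per_pool` sgRNAs.
    """
    by_gene = defaultdict(list)
    for gene, sgrna_id, score in predictions:
        by_gene[gene].append((score, sgrna_id))

    top, supplemental = {}, {}
    for gene, scored in by_gene.items():
        ranked = [sgrna for _, sgrna in sorted(scored, key=lambda t: t[0], reverse=True)]
        top[gene] = ranked[:per_pool]
        supplemental[gene] = ranked[per_pool:2 * per_pool]
    return top, supplemental


# Hypothetical candidates: 12 sgRNAs for one gene with increasing predicted score
candidates = [("GENE_A", f"A_{i}", i / 12) for i in range(12)]
top5, next5 = split_library(candidates)
```

Screening with the first pool alone gives the compact 5 sgRNA/gene library, and combining both pools gives the full 10 sgRNA/gene library.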

We then used the same approach to design a next-generation CRISPRa library. Due to the requirement for CRISPRa targeting to be upstream of the TSS for maximal activity (Gilbert et al., 2014), we collected an activity score dataset of 2,898 sgRNAs from 9 CRISPRa screens (Y.C. and M.K., personal communication) to train an independent predictive model (Figure 2A). While the input set was significantly smaller than for CRISPRi, the resulting linear regression still had good predictive value (ROC-AUC 0.70; Figure 2B) and generally shared features of the CRISPRi model (Figure 2C and Figure 2—figure supplement 2). We observed that the periodic relationship between distance to the TSS and sgRNA activity was less pronounced for CRISPRa than for CRISPRi (Figure 2—figure supplement 1; Figure 1—figure supplement 1), a difference we attributed to the reduced dynamic range in the nucleosome-depleted region around the TSS, and the smaller number and relatively lower expression of the genes targeted in the CRISPRa training dataset. We used the top 10 predicted sgRNAs for each gene to construct a next-generation library, which significantly increased the predicted activity of the library over our v1 designs (Figure 2D). The hCRISPRa-v2 library was partitioned into 14 sublibraries as described for the v2 CRISPRi library above (Figure 2E). Importantly, while several strategies have been described for CRISPR-mediated activation (Chavez et al., 2015; Gilbert et al., 2013; Hilton et al., 2015; Konermann et al., 2015; Maeder et al., 2013; Perez-Pinera et al., 2013; Tanenbaum et al., 2014; Zalatan et al., 2015), a recent comparison of these strategies observed that sgRNA activity generally correlated across all approaches (Chavez et al., 2016). Our CRISPRa-v2 libraries are thus likely to serve as a valuable resource for effectively targeting most activator systems.

Figure 2. A machine learning approach for identifying highly active sgRNAs for CRISPRa.

(A) Schematic of CRISPRa datasets. CRISPRa activity scores were generated from screen data and subjected to 3-fold cross-validation due to the smaller sample size. Ricin tiling data was limited to 21 genes that were previously shown to modulate sensitivity to ricin upon CRISPRa overexpression (Gilbert et al., 2014). (B) ROC analysis of machine learning approach using CRISPRa datasets, conducted as in Figure 1B. (C) Relative contribution of feature categories for CRISPRa, calculated as in Figure 1C. (D) Distribution of predicted activity scores in next-generation CRISPRa libraries. (Top) Predicted CRISPRa activity correlates with empirical activity scores. For the 67%/33% division used to predict sgRNAs for the hCRISPRa-v2 library, predicted scores for the 33% test set were plotted against the empirical activity score. Activity score percentiles are from all sgRNAs within 0.25 of the indicated activity score. Predicted activity was highly correlated with activity, with a Pearson R of 0.41 (p<10^−38). (Bottom) Distribution of predicted activity scores for CRISPRa v1 and hCRISPRa-v2 as calculated by the hCRISPRa-v2 regression model. (E) Composition of hCRISPRa-v2 and mCRISPRa-v2 sublibraries.

DOI: http://dx.doi.org/10.7554/eLife.19760.005

Figure 2—figure supplement 1. Relationship between CRISPRa activity and sgRNA position relative to the TSS as predicted by SVR.

An SVR model was trained on a 67% division of the CRISPRa dataset using sgRNA position relative to the downstream end of the primary FANTOM TSS for each gene. Analysis was conducted as described for Figure 1—figure supplement 1. The shaded area indicates the region relative to the TSS where predicted activity scores for all sgRNAs were calculated for potential inclusion in the construction of the hCRISPRa-v2 and mCRISPRa-v2 library designs.

Figure 2—figure supplement 2. Individual sgRNA feature contributions to predicted CRISPRa activity.

Linear regression coefficients for each model trained according to the 67%/33% divisions displayed in Figure 2 are displayed, with bars indicating the mean of the three divisions and error bars indicating minimum and maximum feature coefficients. As each feature was z-standardized before linear regression, coefficients are directly comparable. For binned feature categories, x-axis values represent the minimum value of the bin (inclusive) unless otherwise indicated.

Finally, we applied the CRISPRi and CRISPRa models to predict highly active sgRNAs targeting the mouse protein-coding transcriptome and generated corresponding genome-scale libraries (mCRISPRi-v2 and mCRISPRa-v2) (Figures 1E, 2E). All four library designs are included as Supplementary files 3–6, sgRNA prediction and library design scripts are available online (see Materials and methods), and the cloned lentiviral libraries are available on Addgene.

The large majority of sgRNAs in the hCRISPRi-v2 library are effective

A central goal in developing the hCRISPRi-v2 library was to enrich for highly active sgRNAs, which would improve statistical confidence in hits and enable high sensitivity even with a compact 5 sgRNA/gene library. Underscoring the importance of this goal, Evers et al. generated a small-scale CRISPRi library targeting predetermined sets of essential and non-essential genes and conducted screens for growth phenotypes in the bladder carcinoma cell line RT-112, and found that ~50% of the CRISPRi sgRNAs targeting essential genes were inactive (Evers et al., 2016), similar to rates observed using our CRISPRi v1 library. To test whether our next-generation sgRNA prediction algorithms were able to identify these inactive sgRNAs, we evaluated the predicted activity of the sgRNAs targeting known essential genes in this screen. Despite the screen being performed with an independently designed library, using an sgRNA constant region we have found to underperform our current design (Chen et al., 2013), and in a cell type not evaluated in any of our training or test datasets, predicted activity correlated well with the sgRNA growth phenotypes observed in the screen (Figure 3A; Pearson R = −0.58, p<10^−37). Over 40% of the sgRNAs in this library had a predicted activity score less than 0.4, a regime in which the vast majority (over 87%) of sgRNAs were inactive, as defined by z-score > −2 relative to non-essential gene-targeting sgRNAs, enabling a priori elimination of 60% of the inactive sgRNAs by simply applying this threshold. By contrast, of the sgRNAs in that library with predicted activity scores ≥0.6, 77% are active. Use of the improved sgRNA constant region would be expected to further increase this fraction (Chen et al., 2013; Dang et al., 2015).
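The threshold analysis in this paragraph can be sketched as follows: growth phenotypes are z-standardized against sgRNAs targeting non-essential genes, an sgRNA is called inactive when its z-score is greater than −2, and the active fraction is then computed within a window of predicted activity scores. This Python sketch uses only the standard library. The function names and example numbers are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def z_standardize(phenotypes, background):
    """Express phenotypes as z-scores relative to a background distribution."""
    mu = statistics.mean(background)
    sd = statistics.stdev(background)  # ASSUMPTION: sample standard deviation
    return [(p - mu) / sd for p in phenotypes]


def fraction_active(predicted, z, low=float("-inf"), high=float("inf"), cutoff=-2.0):
    """Fraction of sgRNAs with low <= predicted score < high that are active.

    An sgRNA counts as active when its z-score is at or below `cutoff`
    (inactive is defined as z > -2). Returns None if the window is empty.
    """
    selected = [zi for score, zi in zip(predicted, z) if low <= score < high]
    if not selected:
        return None
    return sum(1 for zi in selected if zi <= cutoff) / len(selected)


# Hypothetical data: four essential-gene sgRNAs and a small non-essential background
nonessential_phenotypes = [-0.02, 0.0, 0.02]
essential_phenotypes = [-0.30, -0.01, -0.20, 0.0]
predicted_scores = [0.8, 0.3, 0.7, 0.2]

z = z_standardize(essential_phenotypes, nonessential_phenotypes)
active_high = fraction_active(predicted_scores, z, low=0.6)
active_low = fraction_active(predicted_scores, z, high=0.4)
```

Comparing the active fraction below 0.4 with the fraction at or above 0.6 reproduces the kind of contrast reported above, here on toy numbers only.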

Figure 3. hCRISPRi-v2 outperforms CRISPRi v1 in screens for essential genes in K562.

(A) Distribution and predicted scores for sgRNAs targeting essential genes. (Top) Predicted activity scores for sgRNAs from Evers et al., 2016 or hCRISPRi-v2 targeting essential genes as defined by Evers et al. (Evers et al., 2016), binned in increments of 0.1. (Bottom) sgRNA growth phenotypes of the sgRNAs in the above bins, z-standardized to the distribution of sgRNAs targeting Evers et al. non-essential genes. (B) ROC analysis of sgRNAs from CRISPRi v1 or hCRISPRi-v2 targeting essential and non-essential genes. sgRNAs were ranked by γ, and considered true or false positives if they targeted essential or non-essential genes, respectively, as defined by Evers et al. (C) Volcano plots of gene phenotypes and p-values for growth screens performed with CRISPRi v1 (Gilbert et al., 2014) and hCRISPRi-v2. For each screen, gene phenotypes were calculated by averaging the growth phenotype (γ) of the 3 sgRNAs with the strongest γ by absolute value, and gene p-values were calculated by performing the Mann-Whitney test comparing all sgRNAs targeting the gene to the full set of negative control sgRNAs. For genes with multiple TSSs targeted, sgRNAs were grouped by TSS and the TSS with the lowest p-value was used for downstream analysis. A comparable number of negative control (NC) genes were generated by randomly sampling 10 non-targeting sgRNAs (with replacement) and analyzed as true genes. Empirically derived thresholds (dashed lines) were calculated as shown, using the NC gene distribution to derive the background standard deviation for z-score. (D) Precision-recall analysis of essential gene screens performed in K562. Statistical precision and recall of essential and non-essential gene sets (Hart et al., 2014) were calculated for genes ranked by growth phenotype in K562. For both CRISPRi and CRISPR nuclease screens (Wang et al., 2015), gene-level phenotypes were calculated as the average log2 fold-change of all sgRNAs targeting the gene (termed CRISPR Scores in Wang et al., 2015). (E) Boxplots of CRISPRi and CRISPR nuclease sgRNA phenotypes for several gene sets. sgRNA γ scores (CRISPRi) or log2 enrichments (nuclease) were z-standardized to the corresponding negative control set. Boxplots display the distribution of the negative control sgRNAs or sgRNAs targeting genes on Y chromosome (excluding pseudo-autosomal genes), within the BCR amplicon, or in the gold standard essential sets used in (D). Individual phenotypes for sgRNAs targeting BCR are overlaid with the corresponding boxplot.

DOI: http://dx.doi.org/10.7554/eLife.19760.008

Figure 3—figure supplement 1. sgRNA phenotypes from CRISPRi v1 and hCRISPRi-v2 growth screens.

(A) hCRISPRi-v2 sgRNA phenotypes correlate between screen replicates. sgRNA γ scores were calculated by computing log2 enrichments of read counts between screen start and endpoint samples and normalizing by estimated cell doublings over the course of the screen. Phenotypes for non-targeting sgRNAs and sgRNAs targeting the Y chromosome generally do not correlate between screen replicates; Spearman R = 0.08 (p<10^−7) and R = 0.05 (p=0.5), respectively. (B) Histograms of growth phenotypes (γ) for sgRNAs used in the analysis in Figure 3B. Percentages indicate number of essential-targeting sgRNAs with negative γ more than two standard deviations from the mean of the non-essential-targeting sgRNAs. (C) UCSC Genome Browser tracks depicting Ensembl and FANTOM annotations for example gene VCP. CRISPRi v1 sgRNAs were chosen to be −50 bp to 300 bp relative to the 5’ end of the VCP transcript model. hCRISPRi-v2 sgRNAs were the top predicted sgRNAs chosen from all sites −25 bp to 500 bp relative to the 'p1@VCP' and 'p2@VCP' FANTOM TSS annotation. FANTOM annotations were generated from CAGE sequencing of over 800 human cell types and tissues, summarized by the maximum CAGE sequencing counts track.

Figure 3—figure supplement 2. Precision-recall analysis of second-generation CRISPR nuclease essential gene screens.

Analysis was conducted as in Figure 3D. CRISPRi, Wang et al., and Doench et al. datasets were ranked according to the average log2 fold-change of all sgRNAs targeting a given gene. For Hart et al., genes were ranked according to their published Bayes Factor scores as sgRNA-level data was unavailable (Doench et al., 2016; Hart et al., 2015; Wang et al., 2015).

In order to validate our hCRISPRi-v2 library design and directly compare its performance to our published v1 library screens (Gilbert et al., 2014), we conducted a screen for genes essential for robust growth in the chronic myeloid leukemia cell line K562. We calculated the growth phenotype (γ) for each sgRNA (Bassik et al., 2013; Kampmann et al., 2013) and averaged these values across two screen replicates (Figure 3—figure supplement 1 and Supplementary file 7). We found that the hCRISPRi-v2 sgRNA growth phenotypes targeting the Evers et al. essential gene sets correlated with predicted activity as above (Figure 3A; Pearson R = −0.42, p<10^−21), and therefore designing the libraries based on the top predicted scores selected for highly active sgRNAs. To quantify the fraction of active sgRNAs in our genome-scale libraries, we performed sgRNA-level ROC analysis, ranking sgRNAs by growth phenotype γ and classifying them as true or false positives if they targeted essential or non-essential genes, respectively (Evers et al., 2016). This analysis showed that hCRISPRi-v2 was greatly enriched for active sgRNAs (Figure 3B), and in particular the top 5 sgRNA/gene library contained 80% active sgRNAs at 5% false positives, comparable to the pilot nuclease library tested by Evers et al. This improvement was due to the significant reduction in the number of inactive sgRNAs rather than any difference in the noise as assessed by the background distribution of non-essential gene-targeting sgRNAs (Figure 3—figure supplement 1B). In some instances, as with the known essential gene VCP, the difference in sgRNA phenotypes between v1 and v2 libraries was likely attributable to the transition from Ensembl to FANTOM as the TSS annotation source (Figure 3—figure supplement 1C) (Cunningham et al., 2015; FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., 2014; Harrow et al., 2012). Taken together, the above observations indicate the lower fraction of effective sgRNAs in previous libraries was a result of the algorithm and the TSS annotation used rather than any intrinsic limitation of CRISPRi.
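As a rough illustration of the growth phenotype γ used in this section (log2 enrichment of read counts between start and endpoint samples, divided by the estimated number of cell doublings, then averaged across replicates), a minimal Python sketch follows. The pseudocount, the normalization by total read depth, and all names and numbers are assumptions for illustration. Any further normalization applied in the actual analysis pipeline is not reproduced here.

```python
import math


def growth_phenotype(start_count, end_count, start_total, end_total,
                     doublings, pseudocount=1.0):
    """Growth phenotype (gamma) for one sgRNA in one replicate.

    Computes the log2 change in read fraction between the start and endpoint
    samples and divides it by the number of cell doublings.
    ASSUMPTIONS: a pseudocount of 1 and normalization to total read depth.
    """
    start_fraction = (start_count + pseudocount) / start_total
    end_fraction = (end_count + pseudocount) / end_total
    return math.log2(end_fraction / start_fraction) / doublings


def replicate_average(gammas):
    """Average gamma across screen replicates."""
    return sum(gammas) / len(gammas)


# Hypothetical read counts for one sgRNA in two replicates, 10 doublings each
gamma_rep1 = growth_phenotype(99, 24, 1000, 1000, doublings=10)
gamma_rep2 = growth_phenotype(99, 49, 1000, 1000, doublings=10)
gamma = replicate_average([gamma_rep1, gamma_rep2])
```

A negative γ indicates that cells carrying the sgRNA were depleted over the course of the screen, as expected for sgRNAs that repress an essential gene.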

The hCRISPRi-v2 library robustly identifies essential genes in precision-recall analysis

We next sought to evaluate whether the enrichment for active sgRNAs in hCRISPRi-v2 over CRISPRi v1 resulted in improved accuracy and confidence for calling hit genes. We analyzed the screens with a consistent pipeline, scoring genes both by assigning a phenotype based on the mean of the top 3 sgRNAs targeting the gene (by absolute value) and by calculating the Mann-Whitney p-value of all 10 sgRNAs compared to the negative control sgRNAs. We visualized these gene scores as a volcano plot, with the phenotype effect size on the x-axis and p-value on the y-axis (Figure 3C and Supplementary file 8). Many hit genes exhibited much stronger p-values, including a substantial fraction that reached the optimal p-value obtainable in the Mann-Whitney test, indicating that the phenotypes of all targeting sgRNAs for these genes were more pronounced than any of the 3,790 non-targeting control sgRNAs. We also modeled noise and off-target effects in the system by generating a large set of 'negative control genes' composed of randomly selected sets of 10 non-targeting sgRNAs and scoring these genes as we did for true genes. In order to classify genes as hits in the screen, we used a score that integrated effect size and statistical confidence and applied the same threshold to both v1 and v2 screens. While in both screens fewer than 0.21% of these negative control genes scored as hits by these criteria, representing a ~2% empirically estimated false discovery rate overall, we could confidently identify 2,150 essential gene hits in the v2 screen while our v1 screen identified 1,408 essential genes (Figure 3C).
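The gene-level scoring described above can be sketched in plain Python: a phenotype from the mean of the three strongest sgRNAs by absolute value, a Mann-Whitney comparison against negative control sgRNAs, and 'negative control genes' built by sampling non-targeting sgRNAs with replacement. This is a simplified illustration. The p-value uses a normal approximation without tie or continuity correction, which is an assumption rather than the test implementation used in the study, and all function names are hypothetical. The integrated hit score and its threshold are not reproduced because the text does not define them.

```python
import math
import random


def top3_mean(gammas):
    """Mean of the three sgRNA phenotypes with the largest absolute value."""
    strongest = sorted(gammas, key=abs, reverse=True)[:3]
    return sum(strongest) / len(strongest)


def mann_whitney_p(sample, controls):
    """Two-sided Mann-Whitney p-value by normal approximation.

    ASSUMPTION: no tie or continuity correction; a statistics library would
    normally be used in practice.
    """
    n1, n2 = len(sample), len(controls)
    # U counts sample/control pairs where the sample value is larger (ties count 0.5)
    u = sum((s > c) + 0.5 * (s == c) for s in sample for c in controls)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def negative_control_genes(controls, n_genes, size=10, seed=0):
    """Build pseudo-genes by sampling non-targeting sgRNAs with replacement."""
    rng = random.Random(seed)
    return [rng.choices(controls, k=size) for _ in range(n_genes)]


# Hypothetical data: 100 control phenotypes and one strongly depleted gene
control_gammas = [i * 0.01 - 0.5 for i in range(100)]
gene_gammas = [-5.0 - i for i in range(10)]
gene_phenotype = top3_mean(gene_gammas)
gene_p = mann_whitney_p(gene_gammas, control_gammas)
nc_genes = negative_control_genes(control_gammas, n_genes=50)
```

When every targeting sgRNA is more extreme than every control, as in the toy gene above, U reaches its boundary value and the p-value is the smallest that this sample size allows.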

In order to assess whether the stronger sgRNA- and gene-level growth phenotype γ scores produced by the hCRISPRi-v2 library resulted in improved discrimination of essential genes, we turned to precision-recall analysis of large 'gold standard' essential and non-essential gene sets introduced by Hart and colleagues (Hart et al., 2014). We ranked genes by their growth phenotype (γ) score and calculated at each phenotype threshold the trade-off between the recall of true essential genes and the avoidance of false positive non-essential genes, termed precision. In this analysis, the hCRISPRi-v2 library recalled over 91.2% of the gold standard essential genes at 95% precision compared to 81.5% in CRISPRi v1 (Figure 3D). We also found that the precision-recall of the top 5 sgRNA/gene half library was essentially identical to the full 10 sgRNA per gene library. Therefore, the 5 sgRNA/gene CRISPRi library represents a compelling tool for applications such as cell sorting-based screens (Liberali et al., 2015), or for in vivo screens where cell engraftment and library representation may represent a limiting factor (Braun et al., 2016; Chen et al., 2015).
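The precision-recall comparison can be made concrete with a short Python sketch. Genes are ranked from the most negative growth phenotype upward, and at each rank the sketch tracks precision (the fraction of gold standard calls that are essential) and recall (the fraction of gold standard essential genes recovered). The function name and the toy gene sets are hypothetical, and genes outside both gold standard sets are simply skipped, which is an assumption.

```python
def recall_at_precision(ranked_genes, essential, nonessential, min_precision=0.95):
    """Highest recall reached at any rank where precision >= min_precision.

    ranked_genes: gene names ordered from strongest to weakest growth defect.
    essential / nonessential: gold standard gene sets.
    ASSUMPTION: genes in neither set are ignored, and recall is measured
    against the full essential set.
    """
    true_pos = false_pos = 0
    best_recall = 0.0
    for gene in ranked_genes:
        if gene in essential:
            true_pos += 1
        elif gene in nonessential:
            false_pos += 1
        else:
            continue
        precision = true_pos / (true_pos + false_pos)
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / len(essential))
    return best_recall


# Hypothetical gold standard sets and gene-level phenotypes
essential = {f"E{i}" for i in range(20)}
nonessential = {f"N{i}" for i in range(5)}
gene_gamma = {f"E{i}": -1.0 + 0.01 * i for i in range(19)}
gene_gamma.update({"N0": -0.5, "E19": -0.4, "N1": 0.0, "N2": 0.01, "N3": 0.02, "N4": 0.03})

ranked = sorted(gene_gamma, key=gene_gamma.get)  # most negative phenotype first
recall_95 = recall_at_precision(ranked, essential, nonessential)
recall_100 = recall_at_precision(ranked, essential, nonessential, min_precision=1.0)
```

In this toy ranking one non-essential gene outranks the last essential gene, so recall is 95% at perfect precision and reaches 100% once a single false positive is tolerated.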

Finally, we wanted to benchmark our hCRISPRi-v2 library against other recent genome-scale growth screens performed with nuclease-active Cas9 and novel second-generation libraries (Doench et al., 2016; Hart et al., 2015; Wang et al., 2015). Although these screens were performed in different labs and generally in different cell lines, precision-recall analysis offers a useful metric to compare these screens in an unbiased fashion (Hart et al., 2014). We found that our CRISPRi v2 library showed comparable or in many cases much greater discrimination (Figure 3—figure supplement 2). One published CRISPR nuclease screen was conducted in K562 (Wang et al., 2015) with a ~10 sgRNA/gene library, allowing for more direct comparison albeit still tempered by differences between labs and screening protocols. This screen recalled somewhat fewer (78.7% vs 90.8%) essential genes at 95% precision than our v2 library screen (Figure 3D). Together, these results indicate that our hCRISPRi-v2 library has a low false negative rate with few false positives, and the enrichment for highly active sgRNAs enables robust detection of phenotypes even with a compact library.

CRISPRi does not induce non-specific toxicity at amplified genomic loci

We were also intrigued by the observation by Wang and colleagues of K562-specific essentiality of many genes neighboring the BCR-ABL translocation, which they demonstrated to be mediated by non-specific toxicity of CRISPR-induced double-stranded breaks in the amplified locus (Wang et al., 2015; Wu et al., 1995). This toxicity appears to be pervasive as similar effects have been observed across a range of cancer cell lines (Aguirre et al., 2016; Munoz et al., 2016). To test whether this toxicity could be caused by CRISPRi as well, we used our hCRISPRi-v2 screen as a representative dataset. When we compared the phenotypes of CRISPRi sgRNAs to CRISPR nuclease at the BCR amplicon, with all phenotypes standardized to the distribution of negative controls to facilitate comparison, we found that sgRNAs targeting BCR were strongly depleted in both screens, as expected based on the critical role of the BCR-ABL fusion in this cancer cell line (Naumann et al., 2001), but few other CRISPRi sgRNAs in the region elicited growth defects (Figure 3E). We also found that CRISPR nuclease sgRNAs targeting the non-essential gene set were generally depleted relative to negative controls or chromosome Y-targeting sgRNAs, which should have no targets in the female-derived K562 cell line (Klein et al., 1976), suggesting that in a CRISPR screen Cas9 nuclease activity can lead to measurable toxicity not related to the function of individual genes but instead due to the formation of on-target DNA double strand breaks, even with alleles present only at 2–3 copies (Naumann et al., 2001). CRISPRi did not exhibit this generic toxicity at non-essential genes, allowing for detection of genes with subtle phenotypes relative to negative controls. 
Importantly, however, the vast majority of sgRNAs targeting essential genes showed a clear separation from the non-essential gene distribution (Figure 3E), demonstrating the high degree of sensitivity for detecting loss-of-function phenotypes with both CRISPR nuclease and CRISPRi screens.

The hCRISPRa-v2 library identifies more genes that modify robust growth rates upon overexpression

Finally, we sought to validate our hCRISPRa-v2 library design by conducting a screen for growth phenotypes in K562 cells expressing SunTag-VP64 constructs (Gilbert et al., 2014; Tanenbaum et al., 2014). We conducted two replicate growth screens (Figure 4—figure supplement 1A and Supplementary file 9) and analyzed the screen as with the hCRISPRi-v2 screen above to directly compare the results to our published CRISPRa screens (Gilbert et al., 2014). Our hCRISPRa-v2 screen identified 540 genes that modify robust growth rates upon overexpression, 257 more than our previous CRISPRa screen (Figure 4A and Supplementary file 10). Beyond these additional hits, the v1 and v2 screens showed good agreement (Figure 4—figure supplement 1B), and the top categories in DAVID analysis of the v1 screen (Huang et al., 2009), namely transcription factor genes (in particular homeobox and forkhead box transcription factors), received ~3-fold greater enrichment scores in the hCRISPRa-v2 hits (Figure 4—figure supplement 1C), indicating the strong biological coherence of the additional genes. Analysis of the sgRNA growth phenotype (γ) scores for genes that were hits in both v1 and v2 screens showed that a greater fraction of sgRNAs were highly active (69.3% in hCRISPRa-v2 with 5 sgRNAs/gene versus 45.1% in CRISPRa v1; Figure 4B and Figure 4—figure supplement 1D), further validating improvements in the library design. In addition, as with our CRISPRi results, several genes identified in the hCRISPRa-v2 screen but not in v1, including hematopoietically-expressed homeobox HHEX and forkhead box C1 FOXC1, could be attributed to the use of the CAGE-based FANTOM5 TSS annotation (Figure 4C and Figure 4—figure supplement 1E).
Finally, we compared the growth phenotypes from hCRISPRi-v2 to hCRISPRa-v2 and found that the two methods identified non-overlapping sets of genes that modify robust growth (Figure 4—figure supplement 1F), consistent with our previous results and highlighting the complementary information provided by these two approaches.

Figure 4. hCRISPRa-v2 outperforms CRISPRa v1 in screens for genes that modify growth rates upon overexpression.

(A) Volcano plots of gene phenotypes and p-values for growth screens performed with CRISPRa v1 (Gilbert et al., 2014) and hCRISPRa-v2, presented as in Figure 3C. (B) Cumulative distributions of fraction of highly active sgRNAs targeting strong hit genes shared between CRISPRa v1 and hCRISPRa-v2 screens. Highly active sgRNAs for CRISPRa were defined as those with negative γ scores more than two standard deviations from the mean of non-targeting control sgRNAs (see Figure 4—figure supplement 1D). (C) UCSC Genome Browser tracks depicting TSS annotations and CRISPRa growth phenotypes for example gene HHEX.

DOI: http://dx.doi.org/10.7554/eLife.19760.011

Figure 4—figure supplement 1. sgRNA phenotypes and gene category enrichment scores from CRISPRa v1 and hCRISPRa-v2 growth screens.

(A) hCRISPRa-v2 sgRNA phenotypes correlate between screen replicates, presented as in Figure 3—figure supplement 1A. Phenotypes for non-targeting sgRNAs and sgRNAs targeting the Y chromosome correlate poorly between screen replicates; Spearman R = 0.10 (p < 10⁻⁸) and R = 0.15 (p = 0.05), respectively. (B) Comparison of CRISPRa v1 and hCRISPRa-v2 gene growth phenotypes (γ). (C) DAVID enrichment scores for hit gene categories from CRISPRa v1 and hCRISPRa-v2 screens. CRISPRa v1 categories represent the top 5 categories identified. hCRISPRa-v2 categories include the top 3 identified along with homeobox and forkhead box categories. (D) Histograms of growth phenotypes (γ) for sgRNAs used in the analysis in Figure 4B. Percentages indicate number of sgRNAs targeting v1 and v2 shared hit genes with negative γ more than two standard deviations from the mean of the non-targeting control sgRNAs. (E) UCSC Genome Browser tracks depicting TSS annotations and CRISPRa growth phenotypes for example gene FOXC1. (F) Comparison of hCRISPRi-v2 and hCRISPRa-v2 gene growth phenotypes (γ).

Discussion

Establishing design rules for effective reagents is critical to the implementation of genome-scale screening technologies. Our previous work established genome-scale CRISPRi and CRISPRa libraries as robust, specific, and complementary tools for dissecting biological pathways in human cells (Gilbert et al., 2014). Here, we significantly improve upon this technology by developing a comprehensive predictive model to accurately identify highly active sgRNAs. This model includes features specific for CRISPRi and CRISPRa, such as positioning relative to the TSS, as well as features like nucleosome occupancy that we expect to be generally important for most Cas9-mediated applications (Horlbeck et al., 2016), and was able to accurately predict sgRNA activity in a screen performed in a cell type it had not previously evaluated (Evers et al., 2016). We used this prediction algorithm to design four new genome-scale libraries targeting human and mouse genomes. These libraries are available on Addgene and in silico to facilitate design of focused libraries or targeted experiments (Supplementary files 3–6).

By performing CRISPRi and CRISPRa screens for genes that modify robust growth, we validate our sgRNA predictions and find that our hCRISPRi-v2 and hCRISPRa-v2 libraries represent a significant advance over our previous work in the fraction of highly active sgRNAs, the number of hits detected, and the statistical confidence of these hits. For CRISPRi, these improvements result in the accurate discrimination of essential genes by precision-recall analysis even with a compact 5 sgRNA/gene library. We believe that the greatly improved sgRNA prediction and lack of non-specific toxicity due to nuclease activity, combined with our previous findings that CRISPRi enables inducible, reversible, and homogenous manipulation of gene expression (Gilbert et al., 2014; Mandegar et al., 2016; Qi et al., 2013), make CRISPRi a state-of-the-art approach for loss-of-function studies.

While the libraries described here target genes annotated as protein-coding, CRISPRi and CRISPRa have been shown to be effective for repressing and activating transcription of non-coding genes as well (Gilbert et al., 2014; Luo et al., 2016; Zhao et al., 2014). Our v2 sgRNA prediction algorithms may enable the design of libraries to systematically manipulate expression of these transcripts as well. Furthermore, combining CRISPRi and CRISPRa with methods for robustly expressing multiple sgRNAs (Kabadi et al., 2014; Wong et al., 2016; Zalatan et al., 2015) will allow for simultaneous control of several genes, facilitating dissection of cellular pathways and systematic mapping of mammalian genetic interactions (Bassik et al., 2013; Costanzo et al., 2010; Schuldiner et al., 2005). Broadly, increased quantitative understanding of the factors dictating (d)Cas9 activity and specificity will greatly enhance the expanding set of CRISPR-mediated technologies for controlling gene expression (Hilton et al., 2015; Perez-Pinera et al., 2013; Vojta et al., 2016), imaging targeted loci (Chen et al., 2013; Shao et al., 2016), or precisely editing the genome (Komor et al., 2016; Tsai et al., 2014).

Materials and methods

Machine learning for CRISPRi and CRISPRa sgRNA activity

Training and test datasets

The CRISPRi activity score dataset was obtained from Horlbeck et al. (2016). CRISPRi and CRISPRa ricin tiling data was obtained from Gilbert et al. (2014). CRISPRa activity scores were generated as previously described for the CRISPRi activity dataset, using data from 9 published and unpublished screening datasets. Hit genes were selected using the formula |effect size Z-score x log10 p-value| ≥ 20 in any screen, and the phenotypes for sgRNAs targeting each gene were extracted from the screen in which the gene was a hit and normalized to the mean of the top 3 sgRNAs by absolute value. All datasets are included here as Supplementary file 1.
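The hit-selection formula and the normalization to the top 3 sgRNAs can be written out as follows. This is a sketch under stated assumptions: `discriminant` and `activity_scores` are hypothetical names, and the case where the normalizing mean is zero is not handled.

```python
import math


def discriminant(effect_z, p_value):
    """|effect size Z-score x log10 p-value|, compared against a threshold."""
    return abs(effect_z * math.log10(p_value))


def activity_scores(phenotypes):
    """Normalize sgRNA phenotypes for one hit gene.

    phenotypes: dict mapping sgRNA id -> phenotype in the screen where the
    gene was a hit. Each phenotype is divided by the mean of the three
    phenotypes with the largest absolute value.
    """
    top3 = sorted(phenotypes.values(), key=abs, reverse=True)[:3]
    norm = sum(top3) / len(top3)
    return {sgrna: value / norm for sgrna, value in phenotypes.items()}
```

Under this normalization, an sgRNA as strong as the average of the three best sgRNAs for its gene scores 1, and an inactive sgRNA scores close to 0.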

Generating TSS annotations

In order to leverage the high accuracy of the FANTOM TSS annotations but remain compatible with the comprehensive, systematic, and established Ensembl transcript models, a hybrid approach was taken. First, the full set of protein-coding genes and transcripts were selected as previously described (Gilbert et al., 2014), using Ensembl release 74 (corresponding to genome assemblies hg19 for human and mm10 for mouse; RRID:SCR_002344) and the APPRIS pipeline to identify the relevant functional transcripts and establish a preliminary set of TSS annotations (Cunningham et al., 2015). FANTOM CAGE peak BED files (RRID:SCR_002678; Human: http://fantom.gsc.riken.jp/5/datafiles/phase1.3/extra/TSS_classifier/TSS_human.bed.gz, accessed March 2, 2015; Mouse: http://fantom.gsc.riken.jp/5/datafiles/phase1.3/extra/TSS_classifier/TSS_mouse.bed.gz, accessed June 28, 2015 and lifted over to mm10 coordinates with UCSC liftOver tool) were then used to refine this annotation. Specifically, for each gene, the TSS was identified as the same-stranded peaks within 30 kb of any Ensembl TSS that matched the gene symbol and was labeled 'p1@gene' (here referred to as 'primary TSS') or 'p2@gene' ('secondary TSS'; included only where the peak passed the FANTOM robust threshold for TSS), and the annotation support was labeled as 'CAGE, matched peaks.' If multiple matches were found the closest TSS to a known Ensembl TSS was used. If no match was found, the gene could then be matched with any same-stranded CAGE peak within 500 bp labeled as 'p1' or 'p2,' and annotation support was considered 'CAGE, primary peaks.' In the above cases, primary and secondary TSSs were targeted separately (i.e. by 10 sgRNAs each) if they were farther than 1 kb apart, or together as one 'P1P2' TSS. 
Where no matched or un-matched primary peaks were found, TSS annotations could be refined by any robust peaks or permissive peaks within 200 bp of the annotation (labeled 'CAGE, robust peak' or 'CAGE, permissive peak') or simply use the Ensembl/APPRIS annotation where no CAGE support was available ('Annotation'). This combined annotation is included as Supplementary file 2.
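The rule for targeting primary and secondary TSSs separately or together can be illustrated with a short sketch. It simplifies each TSS to a single coordinate and ignores strand, which are assumptions for the example; the function name `plan_tss_targets` and the tuple layout are hypothetical.

```python
def plan_tss_targets(primary, secondary=None, min_separation=1000):
    """Return a list of (label, upstream_edge, downstream_edge) targets.

    primary, secondary: integer TSS coordinates (secondary may be None).
    TSSs farther apart than min_separation are targeted separately;
    otherwise they are merged into a single 'P1P2' range.
    """
    if secondary is None:
        return [("P1", primary, primary)]
    if abs(primary - secondary) > min_separation:
        return [("P1", primary, primary), ("P2", secondary, secondary)]
    return [("P1P2", min(primary, secondary), max(primary, secondary))]
```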

Calculating features

Position relative to the primary and secondary TSSs was calculated from the genomic coordinate of the 3’G of the PAM for each sgRNA to the upstream and downstream edge of each TSS range. Sequence features of the sgRNA and target sites were determined using custom scripts written in Python 2.7 (RRID:SCR_008394) with the Biopython module (RRID:SCR_007173) (Cock et al., 2009). The hCRISPRi-v2.1 algorithm included a change in these scripts to fix how certain sequence homopolymers were scored. All other libraries were generated with this improvement incorporated, and all analysis in this paper was performed with the v2.1 algorithm. RNA folding metrics were calculated using the ViennaRNA package (RRID:SCR_008550; version 2.2.5) with default parameters (Lorenz et al., 2011). Chromatin features at the target site were calculated as described previously (Horlbeck et al., 2016), averaging the signal at each base of the target site including the PAM. Custom Python scripts using the bx-python module (v0.5.0, https://github.com/bxlab/bx-python) were used to extract the processed continuous signal from the following BigWig files obtained from the ENCODE consortium: MNase-seq https://www.encodeproject.org/files/ENCFF000VNN/ (Michael Snyder lab, Stanford University), DNase-seq https://www.encodeproject.org/files/ENCFF000SVY/ (Gregory Crawford lab, Duke University), and FAIRE-seq https://www.encodeproject.org/files/ENCFF000TLU/ (Jason Lieb lab, University of North Carolina) (ENCODE Project Consortium, 2012). Beyond the nucleosome positioning information incorporated in the sgRNA positioning learning models, no chromatin data was used for predicting sgRNA activity for the mouse genome.

Machine learning

Training and test activity score sets were first divided into 80%/20% or 67%/33% sets of genes for CRISPRi or CRISPRa, respectively, as described in the results section. The training set parameters were then transformed according to their distribution as depicted in Figure 1A. For binning parameters, a fixed width was chosen for each feature and applied over the range of values, with the upper-most and lower-most bins collapsed with the neighboring bins if the number of data points at each edge were sparse. Each feature was then split into individual parameters for each bin and sgRNAs were assigned a 1 for the bin if the value fell within the bin or 0 if not. For linearizing sgRNA positioning parameters with continuous curves, sgRNA positions were fit to the activity score (individually for the distance to each TSS coordinate) using SVR with a radial basis function kernel and hyperparameters C and gamma determined using a grid search approach cross-validated within the training set. The fit score at each position was then used as the transformed linear parameter. Binary parameters were assigned a 1 if true or a 0 if false. All linearized parameters were z-standardized and fit with elastic net linear regression, with the l1/l2 ratio set by cross-validated grid search. All machine learning and downstream analysis was performed with custom Python scripts and the scikit-learn suite, version 0.15.0 (RRID:SCR_002577) (Pedregosa et al., 2011).
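Two of the transformations described above, fixed-width binning into 0/1 parameters and z-standardization, can be sketched with the standard library alone. The SVR and elastic net steps are not shown. The names `one_hot_bins` and `z_standardize`, the example bin edges, and the choice to fold out-of-range values into the outermost bins are assumptions made for illustration.

```python
from statistics import mean, pstdev


def one_hot_bins(value, edges):
    """Encode a value as one 0/1 parameter per bin.

    edges: ascending bin boundaries defining len(edges) - 1 bins.
    Values below the first edge or above the last edge fall into the
    outermost bins, mimicking collapsed sparse edge bins.
    """
    n_bins = len(edges) - 1
    index = 0
    for i in range(n_bins):
        if value >= edges[i]:
            index = i
    return [1 if i == index else 0 for i in range(n_bins)]


def z_standardize(column):
    """Center a parameter column on 0 and scale it to unit variance."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sd for x in column]
```

For example, with edges `[0, 10, 20, 30]` a value of 12 sets only the second of three bin parameters to 1.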

Design of next-generation CRISPRi and CRISPRa libraries

Prediction of sgRNA activity scores

All sequences within −25 and 500 bp (for CRISPRi) or −550 and −25 bp (for CRISPRa) of the upstream or the downstream edge of the primary or secondary TSS and containing 19 bp followed by an NGG PAM were extracted as potential sgRNAs for downstream scoring of predicted activity. All sequences were prepended with a 5’ G to enable robust transcription from the U6 promoter, whether or not this base was present in the genomic sequence. Parameters were calculated for all sgRNAs as above, and transformed and scored using the CRISPRi or CRISPRa regression model from an arbitrarily chosen training set (test set ROC-AUC corresponding to these sets reported in results section).
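A simplified version of the candidate extraction step might look like the sketch below. It scans only the forward strand of a sequence string, measures position from the 3’ G of the PAM to a single TSS index, and uses the CRISPRi window by default; these simplifications and the name `find_candidates` are assumptions for the example.

```python
import re

# Lookahead so that overlapping target sites are all reported:
# 19 protospacer bases followed by an NGG PAM.
PAM_SITE = re.compile(r"(?=([ACGT]{19})[ACGT]GG)")


def find_candidates(sequence, tss_index, window=(-25, 500)):
    """Return (sgRNA, offset) pairs for forward-strand sites in the window.

    sequence: genomic sequence as a string (forward strand only).
    tss_index: 0-based index of the TSS within `sequence`.
    offset: position of the 3' G of the PAM relative to the TSS.
    Each sgRNA is the 19-nt protospacer with a 5' G prepended.
    """
    candidates = []
    for match in PAM_SITE.finditer(sequence.upper()):
        pam_g_index = match.start() + 21  # 19 nt + N + G + G
        offset = pam_g_index - tss_index
        if window[0] <= offset <= window[1]:
            candidates.append(("G" + match.group(1), offset))
    return candidates
```

Passing `window=(-550, -25)` would correspond to the CRISPRa range given above.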

Off-target scoring

Prediction of sgRNA off-target effects was performed using weighted Bowtie (v1.0.0, RRID:SCR_005476 [Langmead et al., 2009]) alignment largely as previously described (Gilbert et al., 2014) with several adjustments. The '--tryhard' flag was added to the Bowtie command to increase sensitivity for mismatched sgRNA target sites. The hg19 and mm10 genomes used for alignment were masked at mitochondrial sequences and pseudoautosomal sequence to eliminate 'false positive' multiple alignments. Most importantly, as CRISPRi and CRISPRa have maximal effects proximal to the TSS, potential off-target alignments in these regions now were prioritized by creating a reference sequence corresponding to 1 kb windows around each TSS as defined above, along with the 5’ end of every Ensembl transcript annotation. Reference sequences were generated using bedtools (v2.17.0, RRID:SCR_006646 [Quinlan and Hall, 2010]). In order to pass at the strictest threshold, sgRNAs were required to have no more than 1 alignment (the sgRNA target site itself) with 'mismatch score' (Gilbert et al., 2014) less than 31 proximal to the TSS and under 21 in the genome. (For hCRISPRi-v2, 96.6% of sgRNAs incorporated passed at this threshold.) In cases of difficult-to-target genes or close gene families, sgRNAs were allowed at relaxed thresholds. In descending order, these were: 1 alignment under 31 proximal to the TSS (no genomic threshold), 1 alignment under 21 in the genome, 2 alignments under 31 proximal to the TSS, and 3 alignments under 31 proximal to the TSS.
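One way to read the ordered thresholds above is as a tiered classifier over two alignment counts per sgRNA. The sketch below encodes that reading. The function name `offtarget_tier`, the tier numbering, and the assumption that both counts include the on-target site are illustrative choices, and the weighted alignment itself is not reproduced.

```python
def offtarget_tier(tss_hits_under_31, genome_hits_under_21):
    """Return the strictest tier an sgRNA passes (0 = strictest), or None.

    tss_hits_under_31: alignments with mismatch score < 31 near a TSS.
    genome_hits_under_21: alignments with mismatch score < 21 genome-wide.
    Both counts are assumed to include the intended target site.
    """
    if tss_hits_under_31 <= 1 and genome_hits_under_21 <= 1:
        return 0  # strictest: unique near TSSs and in the genome
    if tss_hits_under_31 <= 1:
        return 1  # unique near TSSs, no genomic threshold
    if genome_hits_under_21 <= 1:
        return 2  # unique in the genome at the stricter score
    if tss_hits_under_31 <= 2:
        return 3
    if tss_hits_under_31 <= 3:
        return 4
    return None  # fails even the most relaxed threshold
```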

sgRNA selection

sgRNAs were chosen for inclusion into the genome-scale libraries based on predicted activity scores, empirical activity scores where available, off-target filtering, overlap with other sgRNAs already selected, and the absence of restriction sites for enzymes used in cloning or sequencing sample processing (BstXI, BlpI, and SbfI). Empirical activity scores for CRISPRi/a v1 sgRNAs were generated as for the training sets above at a lower discriminant threshold of 7, and the corresponding sgRNAs were standardized to 19 bp with a 5’ G prepended as above and subjected to the same revised off-target scoring procedure. For each TSS targeted by the library, up to 2 sgRNAs with the strongest empirical evidence were included first if the empirical activity score was at least 0.75, the sgRNA was less than 5 kb from the revised v2 TSS, the sgRNA passed the most stringent off-target filter, the sgRNA plus flanking cloning sequences did not contain extra restriction sites, and the sgRNA target site was at least 3 bp shifted from any previously selected target sequence. Once 2 empirically validated v1 sgRNAs were included, further sgRNAs fitting these criteria were not included but their predicted activity scores were increased by 0.2 to reflect the balance of information from the algorithm and empirical activity. Predicted sgRNAs were then sorted by best predicted activity score and included if the sgRNA passed the most stringent off-target filter, the sgRNA plus flanking cloning sequences did not contain extra restriction sites, the sgRNA target site was at least 3 bp shifted from any previously selected target sequence, and no more than 10 sgRNAs had been selected for the TSS. If fewer than 10 sgRNAs were selected by this algorithm, off-target stringency was iteratively relaxed as above and selection was continued to attain 10 sgRNAs. If 10 sgRNAs passing the most relaxed threshold could not be identified, the TSS was not targeted by the library.
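The core of the predicted-sgRNA selection pass is a greedy loop over candidates sorted by score. The sketch below shows only that pass at a single off-target stringency. The inclusion of empirically validated v1 sgRNAs and the iterative relaxation are left out, strand is ignored when measuring the 3 bp shift, and the dictionary keys and the name `select_sgrnas` are hypothetical.

```python
def select_sgrnas(candidates, max_sgrnas=10, min_shift=3):
    """Greedily pick the best-scoring sgRNAs for one TSS.

    candidates: list of dicts with keys
        'id', 'position', 'score', 'passes_offtarget', 'has_extra_site'.
    A candidate is skipped if it fails the off-target filter, carries an
    extra restriction site, or lies within min_shift bp of a selected site.
    """
    selected = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if len(selected) >= max_sgrnas:
            break
        if not cand["passes_offtarget"] or cand["has_extra_site"]:
            continue
        too_close = any(
            abs(cand["position"] - chosen["position"]) < min_shift
            for chosen in selected
        )
        if too_close:
            continue
        selected.append(cand)
    return selected
```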

sgRNA on-target and off-target prediction algorithms, library design scripts, and associated files are available at https://github.com/mhorlbeck/CRISPRiaDesign.

Negative controls

For each library, the frequency of each DNA base at each position along the sgRNA protospacer sequence was calculated. Random sgRNA protospacer sequences weighted by these base frequencies were then generated to mirror the composition of the targeting sgRNAs. These were then filtered for sgRNAs with 0 alignments with a mismatch score less than 31 proximal to the TSS and 0 alignments under 21 in the genome as above.
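Generating random protospacers that mirror the per-position base composition of a library can be sketched as follows. The subsequent off-target filtering is not shown, and the function names `base_frequencies` and `random_protospacer` are hypothetical. All input protospacers are assumed to have the same length.

```python
import random
from collections import Counter


def base_frequencies(protospacers):
    """Count each base at each position across equal-length protospacers."""
    length = len(protospacers[0])
    return [Counter(seq[i] for seq in protospacers) for i in range(length)]


def random_protospacer(frequencies, rng):
    """Draw one sequence, sampling each position by its observed base counts."""
    bases = []
    for counts in frequencies:
        choices = list(counts.keys())
        weights = list(counts.values())
        bases.append(rng.choices(choices, weights=weights)[0])
    return "".join(bases)
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make the set of control sequences reproducible.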

Library cloning

Protospacer sequences were appended with cloning sequences and then unique PCR adapters corresponding to the designated sublibrary. The half libraries were determined from the first 5 and second 5 sgRNAs selected into the library for each TSS according to the algorithm above. Genes were then partitioned into one of the 13 sublibraries defined by Kampmann et al. (2015), compressed into the indicated 7 groupings. Each sgRNA was ordered as two oligonucleotide sequences to produce a narrower distribution of sgRNA representation. Overall, oligo sequences were 84 bp and had the following format:

  5’ pcr adapter      BstXI        protospacer            BlpI      3’ pcr adapter

==================CCACCTTGTTGGNNNNNNNNNNNNNNNNNNNGTTTAAGAGCTAAGCTG==================

                             ?|||||||||||||||||||

Genomic sequence:  ..........NNNNNNNNNNNNNNNNNNNNNGG.....
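The oligo layout in the diagram can be checked with a short sketch that assembles an 84-nt oligo and counts restriction sites in it. The two 18-nt adapters are placeholders invented for the example (the real adapter sequences are not given here), and the split of the BstXI region so that its final G doubles as the sgRNA 5’ G is read from the diagram. The recognition sequences used are BstXI CCANNNNNNTGG, BlpI GCTNAGC, and SbfI CCTGCAGG.

```python
import re

ADAPTER_5 = "A" * 18   # hypothetical placeholder, not a real PCR adapter
ADAPTER_3 = "T" * 18   # hypothetical placeholder, not a real PCR adapter
BSTXI_FLANK = "CCACCTTGTTG"        # BstXI site is completed by the 5' G
BLPI_FLANK = "GTTTAAGAGCTAAGCTG"   # contains the BlpI site

# Lookaheads count overlapping occurrences; all three sites are palindromic,
# so scanning one strand is sufficient.
SITES = {
    "BstXI": re.compile(r"(?=CCA[ACGT]{6}TGG)"),
    "BlpI": re.compile(r"(?=GCT[ACGT]AGC)"),
    "SbfI": re.compile(r"(?=CCTGCAGG)"),
}


def build_oligo(protospacer):
    """Assemble adapter + BstXI flank + 20-nt protospacer + BlpI flank + adapter."""
    if len(protospacer) != 20 or not protospacer.startswith("G"):
        raise ValueError("expected a 20-nt protospacer starting with G")
    return ADAPTER_5 + BSTXI_FLANK + protospacer + BLPI_FLANK + ADAPTER_3


def count_sites(oligo):
    """Number of occurrences of each restriction site in the oligo."""
    return {name: len(pattern.findall(oligo)) for name, pattern in SITES.items()}
```

A protospacer with no extra sites should give exactly one BstXI site, one BlpI site, and no SbfI site.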

Oligonucleotides were synthesized by Agilent Technologies (RRID:SCR_013575; Santa Clara, CA) on 244K oligo arrays, and cloned into the sgRNA expression vector as previously described (Gilbert et al., 2014). The library sgRNA expression vector 'pCRISPRia-v2' was identical to the CRISPRi/a v1 plasmid (pU6-sgRNA EF1Alpha-puro-T2A-BFP, Addgene #60955) with the addition of two SbfI restriction sites used for sequencing sample processing.

Genome-scale CRISPRi and CRISPRa screen for essential genes

The screens for genes required for robust growth were conducted essentially as previously described (Gilbert et al., 2014). Briefly, plasmid libraries were packaged into lentivirus in HEK293T cells (RRID:CVCL_0063) and infected into a previously established polyclonal K562 cell line stably expressing dCas9-KRAB grown in 3L spinner flasks (Bellco, Vineland, NJ). After two days, infected cells were selected with 0.75 μg/mL puromycin (Tocris, Bristol, UK) for two days, allowed to recover for one day, and then cultured at a minimum of 750 × 10⁶ cells in 1.5L standard media (RPMI-1640 with 10% Fetal Bovine Serum and 1x supplemental glutamine, penicillin, and streptomycin) from 'T0' to 'endpoint,' determined by ~10 cell doublings after T0. CRISPRi screen cells were mock-treated with 0.1% DMSO (Sigma-Aldrich, St. Louis, MO) but otherwise left untreated. Screens were performed as independent replicates starting from the infection step. The K562 dCas9-KRAB and SunTag-VP64 cell lines were obtained from Gilbert et al. (2014) and had been constructed from K562 cells originally obtained from ATCC (RRID:CVCL_0004). Cytogenetic profiling by array comparative genomic hybridization (not shown) closely matched previous characterizations of the K562 cell line (Naumann et al., 2001). All cell lines tested negative for mycoplasma contamination (MycoAlert Kit, Lonza, Basel, Switzerland) in regular screenings.

Frozen samples of 250 × 10⁶ cells collected at T0 and endpoint were processed as previously described (Gilbert et al., 2014), with the substitution of an SbfI restriction digest (SbfI-HF, New England Biolabs, Ipswich, MA) in place of the MfeI digest in the genomic DNA fragmentation and enrichment step. The sgRNA-encoding regions were sequenced on an Illumina HiSeq-4000 using custom primers. Sequencing reads were aligned to the expected library sequences using Bowtie (v1.0.0, [Langmead et al., 2009]) and read counts were processed using custom Python scripts (available at https://github.com/mhorlbeck/ScreenProcessing) based on previously established shRNA screen analysis pipelines (Bassik et al., 2013; Kampmann et al., 2013). sgRNAs represented with fewer than 50 sequencing reads in both T0 and endpoint were excluded from analysis. sgRNA growth phenotypes (γ) were calculated by normalizing sgRNA log2 enrichment from T0 to endpoint samples and normalizing by the number of cell doublings in this time period. CRISPRi v1 screen data from Gilbert et al. (2014) was re-analyzed using this pipeline, and the hCRISPRi/a-v2 5 sgRNA/gene libraries were evaluated by analyzing the sgRNA read counts corresponding to only the 5 sgRNA/gene sublibraries. Gene ontology analysis was conducted using DAVID 6.7 (Huang et al., 2009) with default search categories and with background lists representing the genes targeted by the CRISPRa v1 or hCRISPRa-v2 libraries where appropriate. For Figure 4B and Figure 4—figure supplement 1D, 'shared hit' genes were 70 genes that scored as strong anti-growth hits (phenotype z-score × –log10 p-value ≤ −10) in both CRISPRa v1 and hCRISPRa-v2.
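A growth phenotype calculation along these lines can be sketched as below. Several details are assumptions made for the example and are not stated in the text: a pseudocount of 1 is added to read counts, enrichments are centered on the median of the non-targeting control sgRNAs, and no correction is made for sequencing depth. The function name `growth_phenotypes` is hypothetical; the published pipeline is in the ScreenProcessing repository cited above.

```python
import math
from statistics import median


def growth_phenotypes(t0, endpoint, control_ids, doublings,
                      min_reads=50, pseudocount=1):
    """Compute per-sgRNA growth phenotypes (gamma) from read counts.

    t0, endpoint: dicts mapping sgRNA id -> read count.
    control_ids: ids of non-targeting control sgRNAs.
    sgRNAs with fewer than min_reads in both samples are excluded.
    """
    enrichment = {}
    for sgrna, start_count in t0.items():
        end_count = endpoint.get(sgrna, 0)
        if start_count < min_reads and end_count < min_reads:
            continue
        enrichment[sgrna] = math.log2(
            (end_count + pseudocount) / (start_count + pseudocount)
        )
    # Assumption: center on the median non-targeting control enrichment.
    center = median(enrichment[c] for c in control_ids if c in enrichment)
    return {sgrna: (value - center) / doublings
            for sgrna, value in enrichment.items()}
```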

Acknowledgements

We would like to thank Dr. Manuel Leonetti, Ben Barsi-Rhyne, Dr. Jonathan Friedman, and Dr. Jodi Nunnari for generously sharing unpublished screening data for determination of sgRNA activity. We would also like to thank Dr. Xuebing Wu and members of the Weissman lab, particularly Dr. Joshua Dunn and Manny DeVera, for helpful discussions and assistance. We thank Dr. Laurakay Bruhn, Dr. Daniel Ryan, Dr. Luke Fairbairn, and Dr. Peter Tsang of Agilent Technologies for their assistance on the design and synthesis of oligonucleotide pools. MAH, LAG, JEV, BA, YC, APF, and JSW were supported by the Howard Hughes Medical Institute and the National Institutes of Health (P50 GM102706, U01 CA168370, R01 DA036858). LAG was supported by the Leukemia and Lymphoma Society. RAP, CYP, and JEC were supported by the Li Ka Shing Foundation. MK was supported by NIH/NIGMS DP2 GM119139.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

  • Howard Hughes Medical Institute to Max A Horlbeck, Luke A Gilbert, Jacqueline E Villalta, Britt Adamson, Yuwen Chen, Alexander P Fields.

  • National Institutes of Health P50 GM102706 to Max A Horlbeck, Luke A Gilbert, Jacqueline E Villalta, Britt Adamson, Yuwen Chen, Alexander P Fields.

  • National Institutes of Health U01 CA168370 to Max A Horlbeck, Luke A Gilbert, Jacqueline E Villalta, Britt Adamson, Yuwen Chen, Alexander P Fields.

  • National Institutes of Health R01 DA036858 to Max A Horlbeck, Luke A Gilbert, Jacqueline E Villalta, Britt Adamson, Yuwen Chen, Alexander P Fields.

  • Leukemia and Lymphoma Society to Luke A Gilbert.

  • National Cancer Institute Pathway to Independence Award, K99 CA204602 to Luke A Gilbert.

  • Li Ka Shing Foundation to Ryan A Pak, Chong Yon Park, Jacob E Corn.

  • National Institute of General Medical Sciences DP2 GM119139 to Martin Kampmann.

Additional information

Competing interests

MAH: patent application related to CRISPRi and CRISPRa screening (PCT/US15/40449). JSW is a founder of, and MAH and LAG are consultants for, KSQ Therapeutics, a CRISPR functional genomics company.

LAG: filed a patent application related to CRISPRi and CRISPRa screening (PCT/US15/40449). JSW is a founder of, and MAH and LAG are consultants for, KSQ Therapeutics, a CRISPR functional genomics company.

MK: patent application related to CRISPRi and CRISPRa screening (PCT/US15/40449).

JSW: filed a patent application related to CRISPRi and CRISPRa screening (PCT/US15/40449). JSW is a founder of, and MAH and LAG are consultants for, KSQ Therapeutics, a CRISPR functional genomics company.

The other authors declare that no competing interests exist.

Author contributions

MAH, Conceived of and conducted algorithm development and data analysis, contributed to genome-scale screening experiments, and wrote this report.

LAG, Contributed to sgRNA library development and genome-scale screening experiments, and helped write this report.

JEV, Contributed to sgRNA library development, generated sgRNA libraries, and contributed technical assistance.

BA, Conducted genome-scale screening experiments.

RAP, Generated sgRNA libraries.

YC, Contributed technical assistance.

APF, Contributed to algorithm development.

CYP, Supervised sgRNA library generation and conducted genome-scale screening experiments.

JEC, Contributed to algorithm development and supervised sgRNA library generation.

MK, Contributed to and supervised algorithm development.

JSW, Conceived of and supervised the study, and helped write this report.

Additional files

Supplementary file 1. CRISPRi and CRISPRa activity score datasets.

DOI: http://dx.doi.org/10.7554/eLife.19760.013

Supplementary file 2. TSS annotations for hg19 and mm10 genomes.

DOI: http://dx.doi.org/10.7554/eLife.19760.014

Supplementary file 3. Library composition of hCRISPRi-v2 and hCRISPRi-v2.1.

DOI: http://dx.doi.org/10.7554/eLife.19760.015

Supplementary file 4. Library composition of mCRISPRi-v2.

DOI: http://dx.doi.org/10.7554/eLife.19760.016

Supplementary file 5. Library composition of hCRISPRa-v2.

DOI: http://dx.doi.org/10.7554/eLife.19760.017

Supplementary file 6. Library composition of mCRISPRa-v2.

DOI: http://dx.doi.org/10.7554/eLife.19760.018

Supplementary file 7. sgRNA read counts and growth phenotypes for hCRISPRi-v2 screens performed in K562.

DOI: http://dx.doi.org/10.7554/eLife.19760.019

Supplementary file 8. Gene growth phenotypes and p-values for hCRISPRi-v2 screens performed in K562.

DOI: http://dx.doi.org/10.7554/eLife.19760.020

Supplementary file 9. sgRNA read counts and growth phenotypes for hCRISPRa-v2 screens performed in K562.

DOI: http://dx.doi.org/10.7554/eLife.19760.021

Supplementary file 10. Gene growth phenotypes and p-values for hCRISPRa-v2 screens performed in K562.

DOI: http://dx.doi.org/10.7554/eLife.19760.022

Major datasets

The following previously published datasets were used:

FANTOM Consortium, 2014, A promoter-level mammalian expression atlas, http://fantom.gsc.riken.jp/5/datafiles/phase1.3/extra/TSS_classifier/, Publicly available at FANTOM (files TSS_human.bed.gz, TSS_mouse.bed.gz)

ENCODE Consortium/Snyder, 2011, Determinants of nucleosome organization in primary human cells, https://www.encodeproject.org/files/ENCFF000VNN, Publicly available at ENCODE (accession no. ENCFF000VNN)

ENCODE Consortium, 2012, An integrated encyclopedia of DNA elements in the human genome, https://www.encodeproject.org/files/ENCFF000TLU, Publicly available at ENCODE (accession no. ENCFF000TLU)

References

  1. Aguirre AJ, Meyers RM, Weir BA, Vazquez F, Zhang CZ, Ben-David U, Cook A, Ha G, Harrington WF, Doshi MB, Kost-Alimova M, Gill S, Xu H, Ali LD, Jiang G, Pantel S, Lee Y, Goodale A, Cherniack AD, Oh C, Kryukov G, Cowley GS, Garraway LA, Stegmaier K, Roberts CW, Golub TR, Meyerson M, Root DE, Tsherniak A, Hahn WC. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discovery. 2016;6:914–929. doi: 10.1158/2159-8290.CD-16-0154.
  2. Bassik MC, Kampmann M, Lebbink RJ, Wang S, Hein MY, Poser I, Weibezahn J, Horlbeck MA, Chen S, Mann M, Hyman AA, Leproust EM, McManus MT, Weissman JS. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell. 2013;152:909–922. doi: 10.1016/j.cell.2013.01.030.
  3. Braun CJ, Bruno PM, Horlbeck MA, Gilbert LA, Weissman JS, Hemann MT. Versatile in vivo regulation of tumor phenotypes by dCas9-mediated transcriptional perturbation. PNAS. 2016;113:E3892–E3900. doi: 10.1073/pnas.1600582113.
  4. Chari R, Mali P, Moosburner M, Church GM. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nature Methods. 2015;12:823–826. doi: 10.1038/nmeth.3473.
  5. Chavez A, Scheiman J, Vora S, Pruitt BW, Tuttle M, P R Iyer E, Lin S, Kiani S, Guzman CD, Wiegand DJ, Ter-Ovanesyan D, Braff JL, Davidsohn N, Housden BE, Perrimon N, Weiss R, Aach J, Collins JJ, Church GM, Iyer PR, Lin E, Guzman S. Highly efficient Cas9-mediated transcriptional programming. Nature Methods. 2015;12:326–328. doi: 10.1038/nmeth.3312.
  6. Chavez A, Tuttle M, Pruitt BW, Ewen-Campen B, Chari R, Ter-Ovanesyan D, Haque SJ, Cecchi RJ, Kowal EJ, Buchthal J, Housden BE, Perrimon N, Collins JJ, Church G, Haque D. Comparison of Cas9 activators in multiple species. Nature Methods. 2016;13:563–567. doi: 10.1038/nmeth.3871.
  7. Chen B, Gilbert LA, Cimini BA, Schnitzbauer J, Zhang W, Li GW, Park J, Blackburn EH, Weissman JS, Qi LS, Huang B. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell. 2013;155:1479–1491. doi: 10.1016/j.cell.2013.12.001.
  8. Chen S, Sanjana NE, Zheng K, Shalem O, Lee K, Shi X, Scott DA, Song J, Pan JQ, Weissleder R, Lee H, Zhang F, Sharp PA. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell. 2015;160:1246–1260. doi: 10.1016/j.cell.2015.02.038.
  9. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163.
  10. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, Prinz J, St Onge RP, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, San Luis BJ, Shuteriqi E, Tong AH, van Dyk N, Wallace IM, Whitney JA, Weirauch MT, Zhong G, Zhu H, Houry WA, Brudno M, Ragibizadeh S, Papp B, Pál C, Roth FP, Giaever G, Nislow C, Troyanskaya OG, Bussey H, Bader GD, Gingras AC, Morris QD, Kim PM, Kaiser CA, Myers CL, Andrews BJ, Boone C. The genetic landscape of a cell. Science. 2010;327:425–431. doi: 10.1126/science.1180823.
  11. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Kähäri AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P. Ensembl 2015. Nucleic Acids Research. 2015;43:D662–D669. doi: 10.1093/nar/gku1010.
  12. Dang Y, Jia G, Choi J, Ma H, Anaya E, Ye C, Shankar P, Wu H. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biology. 2015;16:280. doi: 10.1186/s13059-015-0846-3.
  13. Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, Virgin HW, Listgarten J, Root DE. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology. 2016;34:184–191. doi: 10.1038/nbt.3437.
  14. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ, Root DE. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature Biotechnology. 2014;32:1262–1267. doi: 10.1038/nbt.3026.
  15. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  16. Evers B, Jastrzebski K, Heijmans JP, Grernrum W, Beijersbergen RL, Bernards R. CRISPR knockout screening outperforms shRNA and CRISPRi in identifying essential genes. Nature Biotechnology. 2016;34:631–633. doi: 10.1038/nbt.3536.
  17. FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, Itoh M, Andersson R, Mungall CJ, Meehan TF, Schmeier S, Bertin N, Jørgensen M, Dimont E, Arner E, Schmidl C, Schaefer U, Medvedeva YA, Plessy C, Vitezic M, Severin J, Semple C, Ishizu Y, Young RS, Francescatto M, Alam I, Albanese D, Altschuler GM, Arakawa T, Archer JA, Arner P, Babina M, Rennie S, Balwierz PJ, Beckhouse AG, Pradhan-Bhatt S, Blake JA, Blumenthal A, Bodega B, Bonetti A, Briggs J, Brombacher F, Burroughs AM, Califano A, Cannistraci CV, Carbajo D, Chen Y, Chierici M, Ciani Y, Clevers HC, Dalla E, Davis CA, Detmar M, Diehl AD, Dohi T, Drabløs F, Edge AS, Edinger M, Ekwall K, Endoh M, Enomoto H, Fagiolini M, Fairbairn L, Fang H, Farach-Carson MC, Faulkner GJ, Favorov AV, Fisher ME, Frith MC, Fujita R, Fukuda S, Furlanello C, Furino M, Furusawa J, Geijtenbeek TB, Gibson AP, Gingeras T, Goldowitz D, Gough J, Guhl S, Guler R, Gustincich S, Ha TJ, Hamaguchi M, Hara M, Harbers M, Harshbarger J, Hasegawa A, Hasegawa Y, Hashimoto T, Herlyn M, Hitchens KJ, Ho Sui SJ, Hofmann OM, Hoof I, Hori F, Huminiecki L, Iida K, Ikawa T, Jankovic BR, Jia H, Joshi A, Jurman G, Kaczkowski B, Kai C, Kaida K, Kaiho A, Kajiyama K, Kanamori-Katayama M, Kasianov AS, Kasukawa T, Katayama S, Kato S, Kawaguchi S, Kawamoto H, Kawamura YI, Kawashima T, Kempfle JS, Kenna TJ, Kere J, Khachigian LM, Kitamura T, Klinken SP, Knox AJ, Kojima M, Kojima S, Kondo N, Koseki H, Koyasu S, Krampitz S, Kubosaki A, Kwon AT, Laros JF, Lee W, Lennartsson A, Li K, Lilje B, Lipovich L, Mackay-Sim A, Manabe R, Mar JC, Marchand B, Mathelier A, Mejhert N, Meynert A, Mizuno Y, de Lima Morais DA, Morikawa H, Morimoto M, Moro K, Motakis E, Motohashi H, Mummery CL, Murata M, Nagao-Sato S, Nakachi Y, Nakahara F, Nakamura T, Nakamura Y, Nakazato K, van Nimwegen E, Ninomiya N, Nishiyori H, Noma S, Noma S, Noazaki T, Ogishima S, Ohkura N, Ohimiya H, Ohno H, Ohshima M, Okada-Hatakeyama M, Okazaki Y, Orlando V, Ovchinnikov DA, Pain A, Passier R, Patrikakis M, Persson H, Piazza S, Prendergast JG, Rackham OJ, Ramilowski JA, Rashid M, Ravasi T, Rizzu P, Roncador M, Roy S, Rye MB, Saijyo E, Sajantila A, Saka A, Sakaguchi S, Sakai M, Sato H, Savvi S, Saxena A, Schneider C, Schultes EA, Schulze-Tanzil GG, Schwegmann A, Sengstag T, Sheng G, Shimoji H, Shimoni Y, Shin JW, Simon C, Sugiyama D, Sugiyama T, Suzuki M, Suzuki N, Swoboda RK, 't Hoen PA, Tagami M, Takahashi N, Takai J, Tanaka H, Tatsukawa H, Tatum Z, Thompson M, Toyodo H, Toyoda T, Valen E, van de Wetering M, van den Berg LM, Verado R, Vijayan D, Vorontsov IE, Wasserman WW, Watanabe S, Wells CA, Winteringham LN, Wolvetang E, Wood EJ, Yamaguchi Y, Yamamoto M, Yoneda M, Yonekura Y, Yoshida S, Zabierowski SE, Zhang PG, Zhao X, Zucchelli S, Summers KM, Suzuki H, Daub CO, Kawai J, Heutink P, Hide W, Freeman TC, Lenhard B, Bajic VB, Taylor MS, Makeev VJ, Sandelin A, Hume DA, Carninci P, Hayashizaki Y. A promoter-level mammalian expression atlas. Nature. 2014;507:462–470. doi: 10.1038/nature13182.
  18. Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, Guimaraes C, Panning B, Ploegh HL, Bassik MC, Qi LS, Kampmann M, Weissman JS. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell. 2014;159:647–661. doi: 10.1016/j.cell.2014.09.029.
  19. Gilbert LA, Larson MH, Morsut L, Liu Z, Brar GA, Torres SE, Stern-Ginossar N, Brandman O, Whitehead EH, Doudna JA, Lim WA, Weissman JS, Qi LS. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013;154:442–451. doi: 10.1016/j.cell.2013.06.044.
  20. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigó R, Hubbard TJ. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research. 2012;22:1760–1774. doi: 10.1101/gr.135350.111.
  21. Hart T, Brown KR, Sircoulomb F, Rottapel R, Moffat J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Molecular Systems Biology. 2014;10:733. doi: 10.15252/msb.20145216.
  22. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, Mis M, Zimmermann M, Fradet-Turcotte A, Sun S, Mero P, Dirks P, Sidhu S, Roth FP, Rissland OS, Durocher D, Angers S, Moffat J. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015;163:1515–1526. doi: 10.1016/j.cell.2015.11.015.
  23. Hilton IB, D'Ippolito AM, Vockley CM, Thakore PI, Crawford GE, Reddy TE, Gersbach CA. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nature Biotechnology. 2015;33:510–517. doi: 10.1038/nbt.3199.
  24. Hinz JM, Laughery MF, Wyrick JJ. Nucleosomes Inhibit Cas9 Endonuclease Activity in Vitro. Biochemistry. 2015;54:7063–7066. doi: 10.1021/acs.biochem.5b01108.
  25. Horlbeck MA, Witkowsky LB, Guglielmi B, Replogle JM, Gilbert LA, Villalta JE, Torigoe SE, Tjian R, Weissman JS. Nucleosomes impede Cas9 access to DNA in vivo and in vitro. eLife. 2016;5:e12677. doi: 10.7554/eLife.12677.
  26. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 2009;37:1–13. doi: 10.1093/nar/gkn923.
  27. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B. 2005;67:301–320. doi: 10.1111/j.1467-9868.2005.00503.x.
  28. Isaac RS, Jiang F, Doudna JA, Lim WA, Narlikar GJ, Almeida R. Nucleosome breathing and remodeling constrain CRISPR-Cas9 function. eLife. 2016;5:e13450. doi: 10.7554/eLife.13450.
  29. Kabadi AM, Ousterout DG, Hilton IB, Gersbach CA. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector. Nucleic Acids Research. 2014;42:e147. doi: 10.1093/nar/gku749.
  30. Kampmann M, Bassik MC, Weissman JS. Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. PNAS. 2013;110:E2317–E2326. doi: 10.1073/pnas.1307002110.
  31. Kampmann M, Horlbeck MA, Chen Y, Tsai JC, Bassik MC, Gilbert LA, Villalta JE, Kwon SC, Chang H, Kim VN, Weissman JS. Next-generation libraries for robust RNA interference-based genome-wide screens. PNAS. 2015;112:E3384–E3391. doi: 10.1073/pnas.1508821112.
  32. Klein E, Ben-Bassat H, Neumann H, Ralph P, Zeuthen J, Polliack A, Vánky F. Properties of the K562 cell line, derived from a patient with chronic myeloid leukemia. International Journal of Cancer. 1976;18:421–431. doi: 10.1002/ijc.2910180405.
  33. Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 2016;533:420–424. doi: 10.1038/nature17946.
  34. Konermann S, Brigham MD, Trevino AE, Joung J, Abudayyeh OO, Barcena C, Hsu PD, Habib N, Gootenberg JS, Nishimasu H, Nureki O, Zhang F. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015;517:583–588. doi: 10.1038/nature14136.
  35. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
  36. Liberali P, Snijder B, Pelkmans L. Single-cell and multivariate approaches in genetic perturbation screens. Nature Reviews Genetics. 2015;16:18–32. doi: 10.1038/nrg3768.
  37. Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. ViennaRNA Package 2.0. Algorithms for Molecular Biology. 2011;6:26. doi: 10.1186/1748-7188-6-26.
  38. Luo S, Lu JY, Liu L, Yin Y, Chen C, Han X, Wu B, Xu R, Liu W, Yan P, Shao W, Lu Z, Li H, Na J, Tang F, Wang J, Zhang YE, Shen X. Divergent lncRNAs regulate gene expression and lineage differentiation in pluripotent cells. Cell Stem Cell. 2016;18:637–652. doi: 10.1016/j.stem.2016.01.024.
  39. Maeder ML, Linder SJ, Cascio VM, Fu Y, Ho QH, Joung JK. CRISPR RNA-guided activation of endogenous human genes. Nature Methods. 2013;10:977–979. doi: 10.1038/nmeth.2598.
  40. Mandegar MA, Huebsch N, Frolov EB, Shin E, Truong A, Olvera MP, Chan AH, Miyaoka Y, Holmes K, Spencer CI, Judge LM, Gordon DE, Eskildsen TV, Villalta JE, Horlbeck MA, Gilbert LA, Krogan NJ, Sheikh SP, Weissman JS, Qi LS, So PL, Conklin BR. CRISPR Interference Efficiently Induces Specific and Reversible Gene Silencing in Human iPSCs. Cell Stem Cell. 2016;18:541–553. doi: 10.1016/j.stem.2016.01.022.
  41. Munoz DM, Cassiani PJ, Li L, Billy E, Korn JM, Jones MD, Golji J, Ruddy DA, Yu K, McAllister G, DeWeck A, Abramowski D, Wan J, Shirley MD, Neshat SY, Rakiec D, de Beaumont R, Weber O, Kauffmann A, McDonald ER, Keen N, Hofmann F, Sellers WR, Schmelzle T, Stegmeier F, Schlabach MR. CRISPR Screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discovery. 2016;6:900–913. doi: 10.1158/2159-8290.CD-16-0178.
  42. Naumann S, Reutzel D, Speicher M, Decker HJ. Complete karyotype characterization of the K562 cell line by combined application of G-banding, multiplex-fluorescence in situ hybridization, fluorescence in situ hybridization, and comparative genomic hybridization. Leukemia Research. 2001;25:313–322. doi: 10.1016/S0145-2126(00)00125-9.
  43. Paddison PJ, Silva JM, Conklin DS, Schlabach M, Li M, Aruleba S, Balija V, O'Shaughnessy A, Gnoj L, Scobie K, Chang K, Westbrook T, Cleary M, Sachidanandam R, McCombie WR, Elledge SJ, Hannon GJ. A resource for large-scale RNA-interference-based screens in mammals. Nature. 2004;428:427–431. doi: 10.1038/nature02370.
  44. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V. Scikit-learn: Machine learning in python. The Journal of Machine Learning Research. 2011;12:2825–2830.
  45. Perez-Pinera P, Kocak DD, Vockley CM, Adler AF, Kabadi AM, Polstein LR, Thakore PI, Glass KA, Ousterout DG, Leong KW, Guilak F, Crawford GE, Reddy TE, Gersbach CA, Thakore LR, Glass PI. RNA-guided gene activation by CRISPR-Cas9–based transcription factors. Nature Methods. 2013;10:973–976. doi: 10.1038/nmeth.2600.
  46. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152:1173–1183. doi: 10.1016/j.cell.2013.02.022.
  47. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  48. Radzisheuskaya A, Shlyueva D, Müller I, Helin K. Optimizing sgRNA position markedly improves the efficiency of CRISPR/dCas9-mediated transcriptional repression. Nucleic Acids Research. 2016;44:e141. doi: 10.1093/nar/gkw583.
  49. Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, Weissman JS, Krogan NJ. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005;123:507–519. doi: 10.1016/j.cell.2005.08.031.
  50. Shalem O, Sanjana NE, Zhang F. High-throughput functional genomics using CRISPR-Cas9. Nature Reviews Genetics. 2015;16:299–311. doi: 10.1038/nrg3899.
  51. Shao S, Zhang W, Hu H, Xue B, Qin J, Sun C, Sun Y, Wei W, Sun Y. Long-term dual-color tracking of genomic loci by modified sgRNAs of the CRISPR/Cas9 system. Nucleic Acids Research. 2016;44:e86. doi: 10.1093/nar/gkw066.
  52. Tanenbaum ME, Gilbert LA, Qi LS, Weissman JS, Vale RD. A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell. 2014;159:635–646. doi: 10.1016/j.cell.2014.09.039.
  53. Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, Goodwin MJ, Aryee MJ, Joung JK. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nature Biotechnology. 2014;32:569–576. doi: 10.1038/nbt.2908.
  54. Vojta A, Dobrinić P, Tadić V, Bočkor L, Korać P, Julg B, Klasić M, Zoldoš V. Repurposing the CRISPR-Cas9 system for targeted DNA methylation. Nucleic Acids Research. 2016;44:5615–5628. doi: 10.1093/nar/gkw159.
  55. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, Lander ES, Sabatini DM. Identification and characterization of essential genes in the human genome. Science. 2015;350:1096–1101. doi: 10.1126/science.aac7041.
  56. Wong ASL, Choi GCG, Cui CH, Pregernig G, Milani P, Adam M, Perli SD, Kazer SW, Gaillard A, Hermann M, Shalek AK, Fraenkel E, Lu TK. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. PNAS. 2016;113:2544–2549. doi: 10.1073/pnas.1517883113.
  57. Wu SQ, Voelkerding KV, Sabatini L, Chen XR, Huang J, Meisner LF. Extensive amplification of bcr/abl fusion genes clustered on three marker chromosomes in human leukemic cell line K-562. Leukemia. 1995;9:858–862.
  58. Xu H, Xiao T, Chen CH, Li W, Meyer CA, Wu Q, Wu D, Cong L, Zhang F, Liu JS, Brown M, Liu XS. Sequence determinants of improved CRISPR sgRNA design. Genome Research. 2015;25:1147–1157. doi: 10.1101/gr.191452.115.
  59. Zalatan JG, Lee ME, Almeida R, Gilbert LA, Whitehead EH, La Russa M, Tsai JC, Weissman JS, Dueber JE, Qi LS, Lim WA. Engineering complex synthetic transcriptional programs with CRISPR RNA Scaffolds. Cell. 2015;160:339–350. doi: 10.1016/j.cell.2014.11.052.
  60. Zhao Y, Dai Z, Liang Y, Yin M, Ma K, He M, Ouyang H, Teng CB. Sequence-specific inhibition of microRNA via CRISPR/CRISPRi system. Scientific Reports. 2014;4:3943. doi: 10.1038/srep03943.
eLife. 2016 Sep 23;5:e19760. doi: 10.7554/eLife.19760.030

Decision letter

Editor: Karen Adelman1

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation" for consideration by eLife. Your article has been favorably evaluated by Jessica Tyler (Senior Editor) and two reviewers, one of whom is a member of our Board of Reviewing Editors. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Pooled CRISPR screens have become important methodologies for quickly associating genes with their functions in a high-throughput fashion. These platforms include gene knockout screens, gene inactivation (CRISPRi) and activation (CRISPRa). For such powerful technologies, it is important to further boost their efficacy and minimize any potential off-target effects. This manuscript addresses this very issue, and describes a systematic analysis of features that enable effective single guide RNA (sgRNA) activity in CRISPRi or CRISPRa screens. The result is new, highly effective libraries containing 5 or 10 sgRNAs per gene. These new libraries are compact and will be powerful tools for discovery of gene function moving forward.

Specifically, the manuscript details the use of machine learning algorithms and other informatic approaches to determine what features render an sgRNA most effective. Key aspects involve targeting a nucleosome-depleted region and the position relative to the observed transcription start site (using the more accurate FANTOM positions for TSS designation). These properties, coupled with specific sequence features, can be combined to predict very active sgRNAs. The authors do a great job demonstrating the effectiveness of their predictions and the new CRISPRi libraries. This clearly establishes the promise of CRISPRi as a potent methodology that avoids a number of the pitfalls that using catalytically active CRISPR/Cas9 entails.

To facilitate dissemination of this knowledge, they have made the libraries available on Addgene (in 5 or 10 sgRNA per gene varieties) that target mouse or human genes. Further, they share the sequences of the best sgRNAs in supplemental tables, additionally increasing the impact and breadth of this work.

In short, this is a nicely written story with convincing data. I have only a few changes to suggest to the text and display items. These are aimed at making the manuscript maximally helpful to people in the field who might be designing their own guides against non-coding RNAs, as well as increasing the interest for those who are just curious about how CRISPRi works.

Essential revisions:

1) The authors claim that the next-generation libraries for CRISPR-mediated gene repression and activation have higher activity than the old versions. However, experimental demonstration for the new version of CRISPRa is completely missing. It is therefore premature to make such a statement in the title and in the text. This issue should be clarified.

2) The claim that the new version of the library outperforms the old version is based mostly on statistical prediction and analysis. However, one might argue that the original library, though "sub-optimal", is "good enough" to identify gene function in the majority of cases. It would be much more convincing if the authors could show examples of genes identified with the new libraries that would be missed with the old ones. In this case, comprehensive validation of these candidate genes is needed to make such a claim.

3) The authors state that the new CRISPRi screen shows none of the detectable non-specific toxicity seen with CRISPR nuclease approaches. This is actually a feature of CRISPRi itself, not of the new CRISPRi library design. It is misleading to credit the new design algorithm for it.

4) Likewise, we could not see why the new library would have reduced off-target effects. It is understandable that the new design might improve on-target activity. As for the off-target rate, the mechanism behind this observation is puzzling.

5) Although the effects of nucleosomes (DNase, MNase, FAIRE) on positioning of sgRNAs for CRISPRi are strong, their importance for CRISPRa seems quite modest from the data presented, with position relative to the TSS being much more important. Why might this be? I am not convinced that this means that activating guide RNAs are indifferent to nucleosomes (and the authors don't suggest this). So why the difference? It would be helpful to the community to get some comment on this. As the manuscript stands, one could interpret the findings to say that one needs to target nucleosome-depleted regions to inhibit gene expression, but that activating guides can more readily penetrate chromatin, which seems unlikely. Could the authors clarify?

One possibility is that the optimal location upstream of the TSS (which appears to be -100 to -200 upstream) is typically nucleosome depleted at most active genes and so there isn't much dynamic range in the nucleosome signal detected in this region. This would compress the information you could get out of this parameter, perhaps making it seem less important because there was less variability among genes.
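
The range-compression argument above can be illustrated with a small simulation. The sketch below is purely illustrative and uses synthetic numbers, not data from the manuscript: the variables `accessibility` and `activity`, the linear model, the noise level, and the 0.8 cutoff are all assumptions chosen for the example. It shows that a feature that truly drives activity can appear only weakly correlated with it once sampling is restricted to sites where the feature is uniformly high.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic model (assumption): activity rises linearly with accessibility, plus noise.
accessibility = [rng.random() for _ in range(5000)]
activity = [a + rng.gauss(0, 0.2) for a in accessibility]

# Keep only sites that are already highly accessible (hypothetical cutoff of 0.8),
# mimicking a window that is nucleosome depleted at most genes.
open_sites = [(a, y) for a, y in zip(accessibility, activity) if a > 0.8]
open_access, open_activity = zip(*open_sites)

r_all = pearson(accessibility, activity)
r_open = pearson(open_access, open_activity)
print(f"all sites: r = {r_all:.2f}; open sites only: r = {r_open:.2f}")
```

In this toy model the underlying dependence is identical in both groups; only the spread of the accessibility values differs, which is enough to lower the measured correlation in the restricted group.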

6) Given the above distinction, it would be preferable to show the score contribution for 'target site position relative to TSS' and 'target site chromatin accessibility' separately for CRISPRi (Figure 1C) as well as CRISPRa (Figure 2C).

7) Can the authors comment on the very strong peak of effectiveness for CRISPRi sgRNAs just downstream of the TSS? This likely reflects something in addition to nucleosome depletion being helpful for CRISPRi. I find the very sharp peak in maximal activity in Figure 1—figure supplement 1 to be really striking (even in comparison to previous work using less well-refined TSSs), and would encourage its inclusion in the final manuscript.

As the authors have noted before, CRISPRi really works best when the guide is positioned just downstream of the start site, perhaps because it is more effective to block early transcription elongation and the release from Pol2 pausing than to act farther downstream, once Pol2 is loaded with the machinery to plow through chromatin, etc.?

For those working to block expression of novel or non-coding RNAs, we think it is worth getting this idea out there in a super-clear and obvious way: that the sweet spot for optimal guides is right downstream of the promoter.
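
For readers designing their own guides, the positional rule discussed in this comment can be sketched as a strand-aware distance calculation. The code below is a hypothetical illustration, not the authors' prediction algorithm: the function names are invented, the CRISPRa window simply restates the -200 to -100 range mentioned in comment 5, and the CRISPRi window of 0 to +100 is a placeholder for "just downstream of the TSS" that is not taken from the manuscript.

```python
def position_relative_to_tss(site, tss, strand):
    """Signed distance in bp from the TSS to a target site.

    Positive values lie downstream of the TSS in the direction of
    transcription; negative values lie upstream.
    """
    if strand == "+":
        return site - tss
    if strand == "-":
        return tss - site
    raise ValueError("strand must be '+' or '-'")


def in_window(site, tss, strand, window):
    """True if the site falls inside the inclusive (start, end) window."""
    start, end = window
    return start <= position_relative_to_tss(site, tss, strand) <= end


# Hypothetical windows, labeled as assumptions in the text above.
CRISPRI_WINDOW = (0, 100)      # placeholder for "just downstream of the TSS"
CRISPRA_WINDOW = (-200, -100)  # range quoted in reviewer comment 5

# Toy candidate sites around a minus-strand TSS at coordinate 10_000.
candidates = [9_850, 9_950, 10_020, 10_150]
crispri_sites = [s for s in candidates if in_window(s, 10_000, "-", CRISPRI_WINDOW)]
crispra_sites = [s for s in candidates if in_window(s, 10_000, "-", CRISPRA_WINDOW)]
print(crispri_sites, crispra_sites)
```

Handling the strand explicitly matters because "downstream" corresponds to decreasing genomic coordinates for minus-strand genes.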

8) Regarding the basic sgRNA prediction algorithms developed, will these be shared upon request? We see a github site for the tool created to assess chromatin features, but not for the broader prediction platform. Could the authors indicate in the Methods how an interested user might get help with sgRNA predictions?

eLife. 2016 Sep 23;5:e19760. doi: 10.7554/eLife.19760.031

Author response


In short, this is a nicely written story with convincing data. I have only a few changes to suggest to the text and display items. These are aimed at making the manuscript maximally helpful to people in the field who might be designing their own guides against non-coding RNAs, as well as increasing the interest for those who are just curious about how CRISPRi works.

Essential revisions:

1) The authors claim that the next-generation libraries for CRISPR-mediated gene repression and activation have higher activity than the old versions. However, experimental demonstration for the new version of CRISPRa is completely missing. It is therefore premature to make such a statement in the title and in the text. This issue should be clarified.

We have now experimentally validated our hCRISPRa-v2 library by conducting a screen for genes that affect robust cell growth upon overexpression with CRISPRa. These data are presented in Figure 4 and the accompanying supplemental figure, and discussed in a new Results section. As with our hCRISPRi-v2 validation screen, we find that our next generation library outperforms CRISPRa v1 in number of hit genes identified (60% more hits and greater enrichment for functional categories; Figure 4A), enrichment for highly active sgRNAs targeting each gene (Figure 4B), and identification of new hit genes missed due to TSS mis-annotation (Figure 4C and Figure 4—figure supplement 1E).

2) The claim that the new version of the library outperforms the old version is based mostly on statistical prediction and analysis. However, one might argue that the original library, though "sub-optimal", is "good enough" to identify gene function in the majority of cases. It would be much more convincing if the authors could show examples of genes identified with the new libraries that would be missed with the old ones. In this case, comprehensive validation of these candidate genes is needed to make such a claim.

We share the reviewers’ view that improvement of reagents should result in better identification of gene function for such an improvement to be of practical value. We feel that our data demonstrate the tangible benefits of the hCRISPRi-v2 library over the original CRISPRi v1 in several ways:

A) The hCRISPRi-v2 library recalls 19 more of the 218 gold standard essential genes (Hart et al., 2014) at 95% precision than CRISPRi v1 does (Figure 3D). Of these 19 genes, 16 were also identified by the K562 CRISPR nuclease screen (Wang et al., Science 2015), providing further evidence for the validity of the gold standard gene set.

B) Similarly, the hCRISPRi-v2 library with just 5 sgRNAs per gene identifies 15 more of the gold standard essential genes than CRISPRi v1. Improved screen performance over v1 at half the library size is a marked advance for the many screening approaches in which the scale of cell culture, FACS, etc. is limiting.

C) One essential gene included in the gold standard set, VCP, was not identified in our CRISPRi v1 screen but was a strong hit in our hCRISPRi-v2 growth screen. This improvement is likely due to both improved sgRNA selection and the use of the FANTOM TSS annotation, and is included as Figure 3—figure supplement 1C.
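
The "recall at 95% precision" comparison in points A and B can be illustrated with a minimal sketch. It assumes that each gene has a single screen score (higher meaning a stronger hit), that a gold standard positive set and a negative set are available, and that genes in neither set are ignored. The function name, the toy gene labels, and the handling of tied scores are illustrative assumptions, not the analysis code used for Figure 3D.

```python
def recall_at_precision(scores, positives, negatives, min_precision=0.95):
    """Highest recall of `positives` reached while precision >= min_precision.

    scores: dict mapping gene -> screen score (higher = stronger hit; assumption).
    positives / negatives: gold standard gene sets (hypothetical inputs).
    Genes in neither set are skipped. Tied scores are not treated specially.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    true_pos = false_pos = 0
    best_recall = 0.0
    for gene in ranked:
        if gene in positives:
            true_pos += 1
        elif gene in negatives:
            false_pos += 1
        else:
            continue
        precision = true_pos / (true_pos + false_pos)
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / len(positives))
    return best_recall


# Toy example with made-up gene labels and scores.
toy_scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
toy_positives = {"A", "B", "D"}
toy_negatives = {"C", "E"}
strict = recall_at_precision(toy_scores, toy_positives, toy_negatives, 0.95)
relaxed = recall_at_precision(toy_scores, toy_positives, toy_negatives, 0.75)
```

Running the same function on two libraries scored against the same gold standard sets gives the kind of gene-count difference quoted above (recall multiplied by the size of the positive set).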

3) The authors state that the new CRISPRi screen shows none of the detectable non-specific toxicity seen with CRISPR nuclease approaches. This is actually a feature of CRISPRi, not of the new CRISPRi library design. It is misleading to give such credit to the new design algorithm.

The non-specific toxicity mediated by CRISPR nuclease at amplified loci has only recently been reported by several groups (Wang et al., Science 2015; Aguirre et al., Cancer Discovery 2016; Munoz et al., Cancer Discovery 2016), but to our knowledge has not yet been investigated for CRISPRi. Therefore, we felt it timely to examine our CRISPRi data for this effect, and indeed found no toxicity at amplified or normal copy number loci in contrast to CRISPR nuclease. We agree with the reviewers that the undetectable toxicity due to DNA cleavage is very likely to be a general feature of CRISPRi, and by using the hCRISPRi-v2 results we merely intended to demonstrate this feature in a comprehensive way. We have now revised the text to clarify this point (Abstract; Results subsection “The hCRISPRa-v2 library identifies more genes that modify robust growth rates upon overexpression”).

4) Likewise, we could not quite see why the new library would have improved off-target effects. It is understandable that the new design might improve the on-target activity. As for the off-target rate, the mechanism behind this observation is puzzling.

We did make several adjustments to our off-target scoring algorithm that we expected to improve our ability to predict relevant off-target sites, most notably an increase in the sensitivity used in our Bowtie alignments and a more stringent filter for sgRNAs binding at off-target sites near TSSs (and thus more likely to cause off-target effects with CRISPRi and CRISPRa). However, the adjustments to our off-target predictions were a minor component of our design, and indeed the off-target effects were quite low in v1, as documented in Gilbert et al., 2014. We have removed references to this improvement in the Abstract and we have revised our language on off-target effects to reflect that this represents an algorithmic change rather than a major practical improvement (Results, Methods).
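
As a rough illustration of what a filter for off-target sites near TSSs could look like, the sketch below rejects an sgRNA when too many of its predicted off-target alignment sites fall within a fixed window of any annotated TSS. The window size, the allowed count, the input layout, and both function names are hypothetical choices made for this example; they are not the thresholds or the code of the published pipeline.

```python
from bisect import bisect_left


def near_any_tss(site_pos, sorted_tss, window):
    """True if any TSS lies within `window` bp of `site_pos`.

    sorted_tss must be sorted in ascending order.
    """
    i = bisect_left(sorted_tss, site_pos - window)
    return i < len(sorted_tss) and sorted_tss[i] <= site_pos + window


def passes_offtarget_filter(offtarget_sites, tss_by_chrom,
                            window=1000, max_near_tss=0):
    """Keep an sgRNA only if few enough off-target sites sit near a TSS.

    offtarget_sites: list of (chrom, position) for predicted off-target
        alignments, excluding the intended target (hypothetical input).
    tss_by_chrom: dict mapping chrom -> ascending list of TSS positions.
    window, max_near_tss: illustrative values, not the published settings.
    """
    near = sum(
        1 for chrom, pos in offtarget_sites
        if near_any_tss(pos, tss_by_chrom.get(chrom, []), window)
    )
    return near <= max_near_tss


# Toy annotation: two TSSs on chr1.
toy_tss = {"chr1": [1000, 50000]}
rejected = passes_offtarget_filter([("chr1", 1500)], toy_tss)
kept_far = passes_offtarget_filter([("chr1", 10000)], toy_tss)
kept_other_chrom = passes_offtarget_filter([("chr2", 1000)], toy_tss)
```

Tightening such a filter means lowering `max_near_tss` or widening `window`, which trades candidate sgRNAs for a lower chance of TSS-proximal off-target binding.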

5) Although the effects of nucleosomes (DNase, MNase, FAIRE) on positioning of sgRNAs for CRISPRi are strong, the importance for CRISPRa seems quite modest from the data presented, with position relative to the TSS being much more important. Why might this be? I am not convinced that this means that activating guide RNAs are indifferent to nucleosomes (and the authors don't suggest this). So why the difference? It would be helpful to the community to get some comment on this. As the manuscript stands, one could interpret the findings to say that one needed to target nucleosome-depleted regions to inhibit gene expression, but that activating guides could more readily penetrate chromatin, which seems unlikely. Could the authors clarify?

One possibility is that the optimal location upstream of the TSS (which appears to be -100 to -200 upstream) is typically nucleosome depleted at most active genes and so there isn't much dynamic range in the nucleosome signal detected in this region. This would compress the information you could get out of this parameter, perhaps making it seem less important because there was less variability among genes.

6) Given the above distinction, it would be preferable to show the score contribution for 'target site position relative to TSS' and 'target site chromatin accessibility' separately for CRISPRi (Figure 1C) as well as CRISPRa (Figure 2C).

In modeling the effects of chromatin accessibility, we incorporated sequencing-based readouts (DNase, MNase, FAIRE) as additional support but found that the strongest predictor of sgRNA activity was distance to the FANTOM-annotated TSS, which naturally convolutes both the stereotyped pattern of nucleosome positions (Jiang and Pugh, Nature Reviews Genetics 2009) and the effectiveness of CRISPRi/a effector domains relative to the TSS. Beyond the predictive value of this relationship, FANTOM TSS annotations are derived from CAGE-seq data from hundreds of cell lines and primary tissues, enabling us to reliably predict active sgRNAs across cell lines rather than relying on the nucleosome positioning in a particular cell line for which it happened to be measured. The applicability of our sgRNA predictions across cell lines is tested in Figure 3A.

However, in using support vector regression to fit the patterns for CRISPRi and CRISPRa relative to the TSS and predict sgRNA activity (Figure 1—figure supplement 1 and Figure 2—figure supplement 1), the effects of nucleosome positioning and distance to TSS are convoluted such that the components are difficult to evaluate separately. We have amended the axis labels on Figures 1C and 2C to clarify the information obtained from TSS positioning and the other sequencing-based chromatin measurements.

We share the reviewers’ observation that the periodic relationship between CRISPRa activity and distance to the TSS is less pronounced than for CRISPRi (Figure 2—figure supplement 1), particularly in the region of maximal CRISPRa effectiveness. We also agree that this is unlikely to be a distinct feature of CRISPRa, as there are indeed troughs of CRISPRa activity downstream of the TSS and upstream of the effective CRISPRa region, and previous in vitro results demonstrated that nucleosomes directly impede (d)Cas9 activity independent of any effector domain (Horlbeck et al. eLife 2016; Isaac et al. eLife 2016). Instead, we believe the difference between CRISPRi and CRISPRa accessibility patterns is due in part to the limited dynamic range in this region, as the reviewers suggested; in part to the lower expression levels of genes most sensitive to CRISPRa overexpression (Gilbert et al. Cell 2014; Konermann et al. Nature 2015), resulting in more poorly phased nucleosomes; and in part to the smaller CRISPRa dataset that reduces our resolution in fitting the TSS positioning relationship. We have added a comment on this point to the main text (subsection “An integrated machine learning approach predicts highly active sgRNAs for CRISPRi and CRISPRa”, fifth paragraph).
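
The "limited dynamic range" argument can be illustrated with a purely synthetic example: when a predictor is only sampled over a narrow range, its measured correlation with the outcome shrinks even though the underlying relationship is unchanged. Everything below (the uniform "accessibility" signal, the noise level, the 0.4 to 0.6 window, the seed) is an arbitrary assumption chosen for illustration and does not use or reproduce any data from the paper.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, noise_sd=0.3, seed=0):
    """Synthetic accessibility signal in [0, 1) and a noisy linear activity."""
    rng = random.Random(seed)
    signal = [rng.random() for _ in range(n)]
    activity = [s + rng.gauss(0, noise_sd) for s in signal]
    return signal, activity


signal, activity = simulate()
full_r = pearson(signal, activity)

# Keep only sites whose signal falls in a narrow band, mimicking a region
# where most sites look similarly accessible.
narrow = [(s, a) for s, a in zip(signal, activity) if 0.4 <= s <= 0.6]
narrow_r = pearson([s for s, _ in narrow], [a for _, a in narrow])
```

In this toy setting the slope linking signal to activity is identical in both subsets; only the spread of the predictor differs, which is enough to make the feature look less informative in the narrow band.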

7) Can the authors comment on the very strong peak of effectiveness for CRISPRi sgRNAs just downstream of the TSS? This likely reflects something in addition to nucleosome-deprivation as being helpful for CRISPRi. I find the very sharp peak in maximal activity in Figure 1—figure supplement 1 to be really striking (even in comparison to previous work using less well-refined TSSs), and would encourage its inclusion in the final manuscript.

As the authors have noted before, CRISPRi really works best when the guide is positioned just downstream of the start site, perhaps because it is more effective to block early transcription elongation and the release from Pol2 pausing, rather than farther downstream once Pol2 is loaded with the machinery to plow through chromatin etc.?

For those working to block expression of novel or non-coding RNAs, we think it is worth getting this idea out there in a super-clear and obvious way: that the sweet spot for optimal guides is right downstream of the promoter.

We agree that the region just downstream of the TSS is optimal for CRISPRi likely through a combination of nucleosome deprivation, effective KRAB domain-mediated histone methylation, and physical hindrance of early transcriptional machinery. This last hypothesis is supported by our previous tiling screens using dCas9 without an effector domain, which showed gene repression activity immediately downstream of the TSS but not in the wider window of activity seen with dCas9-KRAB (Gilbert et al. Cell 2014). That this mechanism contributes to optimal CRISPRi-mediated repression lends further support to the importance of transitioning to the more accurate FANTOM TSS annotation in designing sgRNAs for both protein-coding and non-coding genes whenever possible. We have now emphasized this finding in the main text (subsection “An integrated machine learning approach predicts highly active sgRNAs for CRISPRi and CRISPRa”, fourth paragraph).

8) Regarding the basic sgRNA prediction algorithms developed, will these be shared upon request? We see a github site for the tool created to assess chromatin features, but not for the broader prediction platform. Could the authors indicate in the Methods how an interested user might get help with sgRNA predictions?

We have now made the prediction algorithms and pipeline available on GitHub as well, and provide the URL in the Methods with a reference to this in the main text.

Associated Data

    Supplementary Materials

    Supplementary file 1. CRISPRi and CRISPRa activity score datasets.
    DOI: http://dx.doi.org/10.7554/eLife.19760.013
    elife-19760-supp1.xlsx (1.4MB, xlsx)

    Supplementary file 2. TSS annotations for hg19 and mm10 genomes.
    DOI: http://dx.doi.org/10.7554/eLife.19760.014
    elife-19760-supp2.xlsx (2.7MB, xlsx)

    Supplementary file 3. Library composition of hCRISPRi-v2 and hCRISPRi-v2.1.
    DOI: http://dx.doi.org/10.7554/eLife.19760.015
    elife-19760-supp3.xlsx (29.7MB, xlsx)

    Supplementary file 4. Library composition of mCRISPRi-v2.
    DOI: http://dx.doi.org/10.7554/eLife.19760.016
    elife-19760-supp4.xlsx (14.3MB, xlsx)

    Supplementary file 5. Library composition of hCRISPRa-v2.
    DOI: http://dx.doi.org/10.7554/eLife.19760.017
    elife-19760-supp5.xlsx (13.9MB, xlsx)

    Supplementary file 6. Library composition of mCRISPRa-v2.
    DOI: http://dx.doi.org/10.7554/eLife.19760.018
    elife-19760-supp6.xlsx (14.3MB, xlsx)

    Supplementary file 7. sgRNA read counts and growth phenotypes for hCRISPRi-v2 screens performed in K562.
    DOI: http://dx.doi.org/10.7554/eLife.19760.019
    elife-19760-supp7.xlsx (16.7MB, xlsx)

    Supplementary file 8. Gene growth phenotypes and p-values for hCRISPRi-v2 screens performed in K562.
    DOI: http://dx.doi.org/10.7554/eLife.19760.020
    elife-19760-supp8.xlsx (4.5MB, xlsx)

    Supplementary file 9. sgRNA read counts and growth phenotypes for hCRISPRa-v2 screens performed in K562.
    DOI: http://dx.doi.org/10.7554/eLife.19760.021
    elife-19760-supp9.xlsx (16.4MB, xlsx)

    Supplementary file 10. Gene growth phenotypes and p-values for hCRISPRa-v2 screens performed in K562.
    DOI: http://dx.doi.org/10.7554/eLife.19760.022
    elife-19760-supp10.xlsx (4.3MB, xlsx)

    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
