Abstract
PICKLES (https://pickles.hart-lab.org) is an updated web interface to a freely available database of genome-scale CRISPR knockout fitness screens in human cell lines. Using a completely rewritten interface, researchers can explore gene knockout fitness phenotypes across cell lines and tissue types and compare fitness profiles with fitness, expression, or mutation profiles of other genes. The database has been updated to include data from three CRISPR libraries (Avana, Score, and TKOv3), and includes information from 1162 whole-genome screens probing the knockout fitness phenotype of 18 959 genes. Source code for the interface and the integrated database are available for download.
INTRODUCTION
Loss of function fitness screens in cell lines offer the opportunity for integrative and comparative analyses to identify background-specific genetic vulnerabilities. While the technology behind CRISPR-mediated genetic and epigenetic perturbation has evolved rapidly, by far the largest data sources remain the genome-scale CRISPR/Cas9 knockout screens in hundreds of cancer cell lines. Under the various banners of The Cancer Dependency Map (DepMap) (1), Project Achilles (2) and Project Score (3), as well as additional independent projects (4–10), roughly a thousand cancer and other human cell lines have been screened using whole-genome CRISPR/Cas9 knockout libraries.
New bioinformatics tools have been developed to process the data from this new technology. A number of algorithms are available to derive gene-level fitness scores from screens where multiple reagents target each gene. The Project Achilles algorithm, CERES (2), has been supplanted by CHRONOS (11). Likewise, we have updated our original BAGEL algorithm (12) with an improved BAGEL2 (13), a variation of which is used in the Project Score release (3). We have additionally developed a Z-score based algorithm, based on a Gaussian mixture model of the fold change distribution of CRISPR guide RNAs, that is more sensitive in detecting proliferation suppressor genes—those whose knockout increases cell proliferation rates, the opposite phenotype of essential genes (14).
Here, we present an updated PICKLES: the database of Pooled In vitro CRISPR Knockout Library Essentiality Screens. PICKLES presents an easy-to-use interface to CRISPR screen data from three commonly used knockout libraries, enabling simple visualization of gene essentiality scores across cell lines and tumor types and comparison of knockout fitness profiles with fitness or gene expression profiles of other genes.
Data sources and preprocessing
In this PICKLES update, we include knockout fitness data from three CRISPR libraries (Table 1): Avana, used in Project Achilles; Score, used in Project Score; and TKOv3 (15), used in several smaller independent projects that each screened multiple cell lines (4–10) (Figure 1). We acquired raw read count data and processed each dataset with the BAGEL2 (13) pipeline, where essential genes have positive logBF scores. We processed the same data with the Z-score pipeline from (14), where negative scores indicate loss of fitness and positive scores indicate faster cell proliferation on gene knockout (e.g. tumor suppressors). We additionally downloaded the processed CHRONOS (11) scores from the DepMap portal for comparison here. Cell line metadata was drawn from DepMap or manually annotated (Table 2). Processed gene expression data, in log2(TPM + 1) format, was downloaded from the CCLE portal (16).
Table 1.
Data sources used
| Library | Scoring | Genes | Cell line screens |
|---|---|---|---|
| Avana | BF, Z-score | 18 497 | 957 |
| Avana | Chronos | 17 377 | 1086 |
| Score | BF, Z-score | 17 929 | 306 |
| TKOv3 | BF, Z-score | 17 966 | 74 |
Figure 1.

Data preprocessing pipeline. Read count data from Project Achilles and Project Score, as well as publications with several TKOv3 screens, are downloaded and gene fitness scores are calculated with BAGEL2 and the Z-score pipeline. Gene fitness scores from Chronos and CCLE molecular profile data are downloaded and integrated with other data in the PICKLES database.
Table 2.
Manual annotations for TKOv3 data
| | Original | Manually annotated to |
|---|---|---|
| TKOv3 primary disease | H358 | Lung cancer |
| TKOv3 primary disease | HEK293A | Engineered |
| TKOv3 primary disease | RPE1 | Engineered |
| TKOv3 lineage | H358 | Lung |
| TKOv3 lineage | HEK293A | Engineered_kidney |
| TKOv3 lineage | RPE1 | Engineered_retina |
| Cell line | 293A | HEK293A |
Database interface and tutorial
The index page of the PICKLES v3 website offers the user the choice of dataset and essentiality scoring scheme to display (Figure 2). Upon selecting a query gene from the text box, the cell lines corresponding to the selected data set are ranked by the chosen scoring metric and plotted in the display panel. The query gene's essentiality score in each cell line is represented as a point on the graph, color coded by the lineage of the cell line. While no thresholds are displayed, reasonable thresholds for an essential gene are BF > 10, Z-score < –4, or Chronos score < –0.75, while a Z-score > 4 typically indicates a proliferation suppressor gene. The ‘Cancer types’ tab displays the same data as the summary page, but as a swarm plot grouped by lineage, with lineages ordered by median essentiality score.
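The suggested thresholds can be written down as a small lookup. The following is a minimal sketch and not part of the PICKLES code base: the function name `classify_score`, the scheme labels and the returned strings are hypothetical, and only the numeric cutoffs come from the text above.

```python
# Cutoffs quoted in the text; the interface itself does not draw them.
ESSENTIAL_RULES = {
    "BF": lambda score: score > 10,          # BAGEL2 Bayes factor
    "Z": lambda score: score < -4,           # Z-score pipeline
    "Chronos": lambda score: score < -0.75,  # Chronos gene effect
}


def classify_score(score: float, scheme: str) -> str:
    """Label one gene/cell-line score under a given scoring scheme.

    The scheme names and output labels are illustrative assumptions.
    """
    if scheme not in ESSENTIAL_RULES:
        raise ValueError(f"unknown scoring scheme: {scheme!r}")
    if ESSENTIAL_RULES[scheme](score):
        return "essential"
    # Only the Z-score pipeline is described as calling proliferation suppressors.
    if scheme == "Z" and score > 4:
        return "proliferation suppressor"
    return "no call"
```

Scores that fall between the cutoffs are deliberately left as "no call", since the text describes these values as reasonable thresholds rather than hard classification boundaries.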
Figure 2.
PICKLES v3 summary view. User chooses the dataset and the gene of interest. Ranked scatterplot across relevant cell lines is dynamically generated in the display window. Interactive functions include zoom, pan, select, mouseover for detail and download image as PNG.
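The two orderings behind the summary and ‘Cancer types’ views (cell lines ranked by score, lineages ordered by median score) can be illustrated with plain Python. This is a sketch under stated assumptions: the record layout `(cell_line, lineage, score)`, the function names and the sort directions are hypothetical and are not taken from the PICKLES source.

```python
from collections import defaultdict
from statistics import median


def rank_cell_lines(records, descending=True):
    """Sort (cell_line, lineage, score) records by score.

    The sort direction is an assumption; the text only says cell lines are ranked.
    """
    return sorted(records, key=lambda rec: rec[2], reverse=descending)


def order_lineages_by_median(records):
    """Return lineage names ordered by the median score of their cell lines."""
    scores_by_lineage = defaultdict(list)
    for _cell_line, lineage, score in records:
        scores_by_lineage[lineage].append(score)
    return sorted(scores_by_lineage, key=lambda lin: median(scores_by_lineage[lin]))


# Made-up records for illustration only (not real screen data).
example = [
    ("line_A", "Lung", 5.0),
    ("line_B", "Lung", 15.0),
    ("line_C", "Skin", 40.0),
    ("line_D", "Skin", 30.0),
    ("line_E", "Bone", -2.0),
]
ranked = rank_cell_lines(example)
lineage_order = order_lineages_by_median(example)
```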
The PICKLES v3 interface is implemented in Python/Dash using the Plotly library and includes the native functionality of the Plotly system (https://plot.ly). On mouseover, each point will display the cell line name, its lineage, and the essentiality score. A single click on a primary disease in the legend (at right; Figure 2) will remove that lineage from the plot, and another single click returns it. A double click removes all points except the chosen lineage. In addition, hovering the mouse over the plot brings up a context menu where the user can zoom, pan, and autoscale the view, and save the modified image in PNG format.
A number of prior works have demonstrated that genes with correlated knockout fitness profiles tend to operate in the same biological process or pathway (they are ‘co-functional’) (17–22). The coessentiality tab offers the ability to display and measure the essentiality score of the query gene against a second, ‘comparison’ gene in the same dataset, and in addition calculates an ordinary least squares fit between the two profiles, the coefficients of which can be viewed by mouseover.
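For two fitness profiles measured over the same cell lines, the ordinary least squares fit reduces to the textbook slope and intercept formulas. The sketch below is illustrative only; the function name `ols_fit` and the choice to also return R² are assumptions, not a description of the PICKLES implementation.

```python
def ols_fit(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    x and y are equal-length sequences of scores for the query and comparison
    genes across the same cell lines. Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length and at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("query profile has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```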
The coessentiality tab also displays the top 10 positive and negative correlates in the data set. A recent study by Wainberg et al. (22) indicated that the combination of covariance ‘whitening’ followed by ordinary least squares (OLS) offered significantly improved performance over Pearson correlation in detecting co-functional relationships from coessentiality data. A subsequent study by Gheorghe et al. (23) confirmed the performance boost from whitening but showed that, after whitening, Pearson correlation and OLS are mathematically identical. We therefore apply the Cholesky whitening as described in Wainberg et al. before determining the top and bottom ranked correlates by Pearson correlation coefficient (PCC). Figure 3B shows the coessentiality of query gene NRAS and comparison gene SHOC2, an important effector of the MAP kinase pathway and the top correlate of NRAS.
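The whitening-then-PCC step can be sketched with NumPy. This is a generic illustration of Cholesky whitening, assuming a genes × cell lines score matrix and a cell-line covariance estimated across genes; the function names and the exact estimator are assumptions, and the procedure in Wainberg et al. (22) or in PICKLES may differ in detail.

```python
import numpy as np


def cholesky_whiten(scores):
    """Whiten a (genes x cell lines) matrix so cell lines become uncorrelated.

    Sketch only: the cell-line covariance is estimated across genes, and the
    Cholesky factor of its inverse is used as the whitening transform.
    """
    centered = scores - scores.mean(axis=0)          # center each cell line
    cov = np.cov(scores, rowvar=False)               # cell line x cell line
    chol = np.linalg.cholesky(np.linalg.inv(cov))    # inv(cov) = chol @ chol.T
    return centered @ chol


def top_correlates(whitened, genes, query, k=10):
    """Return the k most positive and k most negative PCC partners of `query`."""
    i = genes.index(query)
    rows = whitened - whitened.mean(axis=1, keepdims=True)
    rows /= np.linalg.norm(rows, axis=1, keepdims=True)
    pcc = rows @ rows[i]                             # PCC of every gene vs query
    order = np.argsort(pcc)
    order = order[order != i]                        # drop the query gene itself
    top_positive = [genes[j] for j in order[::-1][:k]]
    top_negative = [genes[j] for j in order[:k]]
    return top_positive, top_negative
```

After whitening, the sample covariance between cell lines is the identity matrix, which is the property that removes shared structure among related screens before gene-gene correlations are ranked.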
Figure 3.
Display options for PICKLES v3 data. (A) Display by cancer types (query gene: NRAS; data source Avana/BF). (B) Co-essentiality (query gene: NRAS; comparison gene: SHOC2). (C) Essentiality versus Expression (query gene: FAM50A; comparison gene FAM50B). (D) Essentiality versus Mutation (query gene: MDM2; comparison gene TP53).
Under the expression tab, the user can view the query gene's essentiality profile vs. the comparison gene's expression profile in the same cells. The tab also displays the top ten positive and negative correlates between query gene essentiality and all genes’ expression in the same cell lines (though this data is not available for TKOv3 screens). Setting the query and comparison genes to the same value allows the user to identify essential genes with tissue-specific expression profiles (e.g. melanocyte transcription factor MITF), while comparing different genes can generate or confirm hypotheses about paralog synthetic lethality. For example, largely uncharacterized paralogs FAM50A and FAM50B are synthetic lethal (24,25) in digenic knockout screens, and absence of FAM50B gene expression is associated with dependence on FAM50A in cells (Figure 3C).
The mutations tab shows the distribution of essentiality scores in the presence or absence of a LOF or GOF mutation in the comparison gene, separated by lineage/disease type. Mutation data is inferred from CCLE genotyping data available at (16). Briefly, for each gene in each cell line, if a mutation is predicted deleterious, it is classified as Loss of Function (LOF); a deleterious mutation at a hotspot is classified as a Gain of Function (GOF). Figure 3D visualizes the relationship between TP53 mutation and the essentiality of MDM2, a key regulator of wildtype p53 protein.
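The classification rule can be expressed as a short function. This is a sketch of one reading of the rule, in which a hotspot call takes precedence over the generic deleterious call; the function names, the boolean inputs and the "none" group label are hypothetical, and the separation by lineage is omitted for brevity.

```python
from collections import defaultdict
from typing import Optional


def classify_mutation(is_deleterious: bool, is_hotspot: bool) -> Optional[str]:
    """Map mutation annotations to LOF/GOF as described in the text.

    Assumption: a deleterious mutation at a hotspot is called GOF, not LOF.
    """
    if not is_deleterious:
        return None
    return "GOF" if is_hotspot else "LOF"


def split_scores_by_status(scores, mutation_calls):
    """Group query-gene scores by the comparison gene's mutation status.

    scores: {cell_line: essentiality score of the query gene}
    mutation_calls: {cell_line: (is_deleterious, is_hotspot)} for the comparison gene
    """
    groups = defaultdict(list)
    for cell_line, score in scores.items():
        flags = mutation_calls.get(cell_line, (False, False))
        status = classify_mutation(*flags)
        groups[status or "none"].append(score)
    return dict(groups)
```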
Finally, the about tab offers a brief description of the contents of the PICKLES interface, references, information on how to cite the database, and a link to the PICKLES master data file.
Implementation
The web interface is implemented in Python/Dash, which uses the Plotly library to dynamically generate interactive graphics for web presentation. The entire application resides in a single ∼500-line Python executable, downloadable from our GitHub repository at https://github.com/hart-lab/PicklesV3. On startup, the application loads the entire database, a text file with 21.4 million rows, into memory, consuming <6% of available RAM on a 128 GB consumer-class PC.
CONCLUSIONS
This update to the PICKLES database increases the number of cell lines >20-fold and, for most cell lines, offers fitness scores from multiple analysis pipelines. Using a simple and responsive interface, researchers can not only explore the fitness profiles of their genes of interest but also examine how those profiles change across data sets and informatics pipelines. Inclusion of gene expression and mutation data also allows analysis of variation relative to mRNA level and the presence of deleterious mutations in specific genes.
DATA AVAILABILITY
The web interface is implemented in Python/Dash, which uses the Plotly library to dynamically generate interactive graphics for web presentation. The entire application resides in a single ∼500-line Python executable, downloadable from our GitHub repository at https://github.com/hart-lab/PicklesV3. On startup, the application loads the entire database, a text file with 21.4 million rows, into memory, consuming <6% of available RAM on a 128 GB consumer-class PC.
Contributor Information
Lance C Novak, TRACTION, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Juihsuan Chou, UTHealth Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Medina Colic, Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Christopher A Bristow, TRACTION, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Traver Hart, Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
FUNDING
J.C., M.C. and T.H. were supported by NIGMS [R35GM130119]; T.H. is a CPRIT Scholar in Cancer Research; Andrew Sabin Family Foundation Fellowship; NCI Cancer Center Support Grant [P30CA16672]. Funding for open access charge: NIGMS.
Conflict of interest statement. None declared.
REFERENCES
- 1. Tsherniak A., Vazquez F., Montgomery P.G., Weir B.A., Kryukov G., Cowley G.S., Gill S., Harrington W.F., Pantel S., Krill-Burger J.M. et al. Defining a cancer dependency map. Cell. 2017; 170:564–576.
- 2. Meyers R.M., Bryan J.G., McFarland J.M., Weir B.A., Sizemore A.E., Xu H., Dharia N.V., Montgomery P.G., Cowley G.S., Pantel S. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 2017; 49:1779–1784.
- 3. Behan F.M., Iorio F., Picco G., Gonçalves E., Beaver C.M., Migliardi G., Santos R., Rao Y., Sassi F., Pinnelli M. et al. Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens. Nature. 2019; 568:511–516.
- 4. Tang M., Pei G., Su D., Wang C., Feng X., Srivastava M., Chen Z., Zhao Z., Chen J. Genome-wide CRISPR screens reveal cyclin c as synthetic survival target of BRCA2. Nucleic Acids Res. 2021; 49:7476–7491.
- 5. Aregger M., Lawson K.A., Billmann M., Costanzo M., Tong A.H.Y., Chan K., Rahman M., Brown K.R., Ross C., Usaj M. et al. Systematic mapping of genetic interactions for de novo fatty acid synthesis identifies C12orf49 as a regulator of lipid metabolism. Nat Metab. 2020; 2:499–513.
- 6. Johnson F.D., Ferrarone J., Liu A., Brandstädter C., Munuganti R., Farnsworth D.A., Lu D., Luu J., Sihota T., Jansen S. et al. Characterization of a small molecule inhibitor of disulfide reductases that induces oxidative stress and lethality in lung cancer cells. Cell Rep. 2022; 38:110343.
- 7. Hustedt N., Álvarez-Quilón A., McEwan A., Yuan J.Y., Cho T., Koob L., Hart T., Durocher D. A consensus set of genetic vulnerabilities to ATR inhibition. Open Biol. 2019; 9:190156.
- 8. Wang C., Chen Z., Su D., Tang M., Nie L., Zhang H., Feng X., Wang R., Shen X., Srivastava M. et al. C17orf53 is identified as a novel gene involved in inter-strand crosslink repair. DNA Repair (Amst.). 2020; 95:102946.
- 9. Olivieri M., Cho T., Álvarez-Quilón A., Li K., Schellenberg M.J., Zimmermann M., Hustedt N., Rossi S.E., Adam S., Melo H. et al. A genetic map of the response to DNA damage in human cells. Cell. 2020; 182:481–496.
- 10. Wang R., Lenoir W.F., Wang C., Su D., McLaughlin M., Hu Q., Shen X., Tian Y., Klages-Mundt N., Lynn E. et al. DNA polymerase ι compensates for fanconi anemia pathway deficiency by countering DNA replication stress. Proc. Natl. Acad. Sci. U. S. A. 2020; 117:33436–33445.
- 11. Dempster J.M., Boyle I., Vazquez F., Root D.E., Boehm J.S., Hahn W.C., Tsherniak A., McFarland J.M. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 2021; 22:343.
- 12. Hart T., Moffat J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinf. 2016; 17:164.
- 13. Kim E., Hart T. Improved analysis of CRISPR fitness screens and reduced off-target effects with the BAGEL2 gene essentiality classifier. Genome Med. 2021; 13:2.
- 14. Lenoir W.F., Morgado M., DeWeirdt P.C., McLaughlin M., Griffith A.L., Sangree A.K., Feeley M.N., Esmaeili Anvar N., Kim E., Bertolet L.L. et al. Discovery of putative tumor suppressors from CRISPR screens reveals rewired lipid metabolism in acute myeloid leukemia cells. Nat. Commun. 2021; 12:6506.
- 15. Hart T., Chandrashekhar M., Aregger M., Steinhart Z., Brown K.R., MacLeod G., Mis M., Zimmermann M., Fradet-Turcotte A., Sun S. et al. High-Resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015; 163:1515–1526.
- 16. DepMap B. 2022; DepMap 22Q2 Public. figshare. Dataset.
- 17. Hart T., Koh C., Moffat J. Coessentiality and cofunctionality: a network approach to learning genetic vulnerabilities from cancer cell line fitness screens. 2017; bioRxiv doi: 10.1101/134346, 04 May 2017, preprint: not peer reviewed.
- 18. Wang T., Birsoy K., Hughes N.W., Krupczak K.M., Post Y., Wei J.J., Lander E.S., Sabatini D.M. Identification and characterization of essential genes in the human genome. Science. 2015; 350:1096–1101.
- 19. Rauscher B., Heigwer F., Henkel L., Hielscher T., Voloshanenko O., Boutros M. Toward an integrated map of genetic interactions in cancer cells. Mol. Syst. Biol. 2018; 14:e7656.
- 20. Boyle E.A., Pritchard J.K., Greenleaf W.J. High-resolution mapping of cancer cell networks using co-functional interactions. Mol. Syst. Biol. 2018; 14:e8594.
- 21. Kim E., Dede M., Lenoir W.F., Wang G., Srinivasan S., Colic M., Hart T. A network of human functional gene interactions from knockout fitness screens in cancer cells. Life Sci. Alliance. 2019; 2:e201800278.
- 22. Wainberg M., Kamber R.A., Balsubramani A., Meyers R.M., Sinnott-Armstrong N., Hornburg D., Jiang L., Chan J., Jian R., Gu M. et al. A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nat. Genet. 2021; 53:638–649.
- 23. Gheorghe V., Hart T. Optimal construction of a functional interaction network from pooled library CRISPR fitness screens. 2022; bioRxiv doi: 10.1101/2022.08.03.502694, 04 August 2022, preprint: not peer reviewed.
- 24. Dede M., McLaughlin M., Kim E., Hart T. Multiplex enCas12a screens detect functional buffering among paralogs otherwise masked in monogenic cas9 knockout screens. Genome Biol. 2020; 21:262.
- 25. Thompson N.A., Ranzani M., van der Weyden L., Iyer V., Offord V., Droop A., Behan F., Gonçalves E., Speak A., Iorio F. et al. Combinatorial CRISPR screen identifies fitness effects of gene paralogues. Nat. Commun. 2021; 12:1302.