Figure 2. RfxCas13d on-target guide RNA prediction model.
(a) Spearman correlation r_s of predictions from a Random Forest (RF) regression model (either with all features or a minimal set of the most predictive features) and a support vector machine with L1 regression to held-out screen data (n = 1,000 bootstraps). (b) Validation of the on-target model by testing 3 high-scoring and 3 low-scoring guide RNAs (gRNAs) via targeting of cell-surface proteins and antibody labeling to measure target knock-down by flow cytometry. Relative knock-down indicates the percent reduction (relative to non-targeting gRNAs) in the mean fluorescence intensity (lines indicate the mean of n = 3 biological replicates). (c) Validation of the on-target model by assaying 3 high-scoring and 3 low-scoring gRNAs per gene in a gene essentiality screen in HEK293 cells with a growth dropout phenotype, testing 10 essential genes and 10 control genes. Each point represents one gRNA as the mean of three replicate experiments. The y-axis depicts the log₂ fold-change (FC) of the gRNA at the indicated time point relative to the Day 0 sample. One-sided Kolmogorov-Smirnov (KS) test comparing high-scoring and low-scoring guides, *** p = 2 × 10⁻⁵, **** p = 2 × 10⁻⁶. (d) A375 essentiality screen with a growth dropout phenotype assaying 20 high-scoring and 20 low-scoring gRNAs per gene (n = 35 essential genes and n = 65 control genes). One-sided KS test comparing high-scoring (n = 698) or low-scoring (n = 700) guides to the distribution of non-targeting gRNAs (n = 677). * p = 0.043, ** p = 0.0095, **** p < 1 × 10⁻⁴⁴. (e) Gene ranking for essentiality based on the robust rank aggregation (RRA) p-value across replicates for all 20 high-scoring gRNAs and all n = 100 genes tested. Blue dots denote essential genes from a prior RNAi screen (ref. 28). (f) Spearman rank correlation of Cas13d gene depletion (as in e) with prior CRISPR-Cas9 and RNAi screens in A375 cells (RNAi screen: A375 DEMETER2 v5 score, ref. 28; Cas9 screen: A375 STARS score, ref. 29). The analysis includes genes represented in all libraries (n = 35 essential genes and n = 15 control genes). Boxes in a, c and d indicate the median and interquartile range, with whiskers indicating 1.5 times the interquartile range, or the most extreme data point outside the 1.5-fold interquartile range.
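The two effect-size measures named in this legend, relative knock-down (panel b) and log2 fold-change versus Day 0 (panel c), can be illustrated with a minimal Python sketch. The function names, the input values, and the pseudocount are illustrative assumptions; the legend does not specify how counts were normalized or whether a pseudocount was used.

```python
import math
from statistics import mean


def relative_knockdown(targeting_mfi, non_targeting_mfi):
    """Percent reduction in mean fluorescence intensity (MFI)
    relative to non-targeting gRNAs (hypothetical helper)."""
    control = mean(non_targeting_mfi)
    return 100.0 * (1.0 - mean(targeting_mfi) / control)


def log2_fold_change(count_t, count_day0, pseudocount=1.0):
    """log2 fold-change of a gRNA count at a time point relative to Day 0.
    The pseudocount is an assumption that avoids log2(0); it is not taken
    from the legend."""
    return math.log2((count_t + pseudocount) / (count_day0 + pseudocount))


# Made-up example values for illustration only.
knockdown = relative_knockdown([250.0, 240.0, 260.0], [1000.0, 980.0, 1020.0])
lfc = log2_fold_change(49, 199)
print(knockdown, lfc)
```

With these made-up inputs, a targeting gRNA whose MFI is a quarter of the control gives a 75% relative knock-down, and a gRNA depleted fourfold relative to Day 0 gives a log2 FC of -2.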