Nat Biotechnol. Author manuscript; available in PMC: 2022 Aug 14.
Published in final edited form as: Nat Biotechnol. 2022 Feb 14;40(6):862–873. doi: 10.1038/s41587-021-01172-3

Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants

Francisco J Sánchez-Rivera 1,10,11, Bianca J Diaz 2,3, Edward R Kastenhuber 1,2, Henri Schmidt 5, Alyna Katti 2,3, Margaret Kennedy 1,4, Vincent Tem 1, Yu-Jui Ho 1, Josef Leibold 1,12,13, Stella V Paffenholz 1,4, Francisco M Barriga 1, Kevan Chu 2,3, Sukanya Goswami 2, Alexandra N Wuest 1, Janelle M Simon 1, Kaloyan M Tsanov 1, Debyani Chakravarty 7,8, Hongxin Zhang 7, Christina S Leslie 5, Scott W Lowe 1,6, Lukas E Dow 2,3,9,*
PMCID: PMC9232935  NIHMSID: NIHMS1760571  PMID: 35165384

Abstract

Base editing (BE) can be applied to characterize single nucleotide variants (SNVs) of unknown function, yet defining effective combinations of single guide RNAs (sgRNAs) and base editors remains challenging. Here, we describe modular BE-activity ‘sensors’ that link sgRNAs and cognate target sites in cis and use them to systematically measure the editing efficiency and precision of thousands of sgRNAs paired with functionally distinct base editors. By quantifying sensor editing across >200,000 editor–sgRNA combinations, we provide a comprehensive resource of sgRNAs for introducing and interrogating cancer-associated SNVs in multiple model systems. We demonstrate that sensor-validated tools streamline production of in vivo cancer models, and that integrating sensor modules in pooled sgRNA libraries can aid interpretation of high-throughput BE screens. Using this approach, we identify several previously uncharacterized mutant TP53 alleles as drivers of cancer cell proliferation and in vivo tumor development. We anticipate that the framework described here will facilitate the functional interrogation of cancer variants in cell and animal models.

Keywords: base editing, sensor, SNV, cancer

INTRODUCTION

Genome sequencing studies have revealed a complex, heterogeneous mix of cancer-associated mutations, including both known and druggable oncogenic mutations (e.g. BRAF-V600E) and a large collection of variants of uncertain significance (VUS). Understanding the impact of specific oncogenic mutations requires functional analysis. Even subtle changes in cancer-associated single nucleotide variants (SNVs) can have major functional consequences in tumorigenesis and drug sensitivity1–6. Thus, while DNA sequencing has enormous potential to support clinical decision making, it is limited by a lack of understanding of how specific variants contribute to disease.

Base editing (BE) can introduce SNVs with high specificity and in the absence of DNA double strand breaks (DSBs) or exogenous DNA templates7. We and others have developed BE tools that enable efficient BE in cell lines, primary cells, and in vivo8–10. However, unlike for Cas9 nuclease sgRNAs that induce DSBs, predicting the efficiency and precision of BE sgRNAs remains challenging.

To expand the capability and feasibility of studying VUS at scale, we set out to develop a framework for systematic engineering of thousands of cancer-associated genetic variants. To do this, we developed a modular ‘BE sensor’ platform that couples a single guide RNA (sgRNA) with its cognate genomic target in cis. Thus, in the presence of a base editor, sgRNAs drive editing of a physically linked surrogate, or ‘sensor’ target site. We find that sensor-based measurement of editing efficiency correlates closely with endogenous gene targeting and that sensor-validated sgRNAs can be used to streamline the engineering and characterization of cancer-associated SNVs in vivo. Further, integrated sensors support the interpretation of pooled BE library screens, by providing a surrogate readout of sgRNA activity in parallel to sgRNA abundance or ‘screen fitness’.

To aid the development of future mutation-focused sensor libraries, we developed a flexible computational pipeline (AMINEsearch) that generates BE sensor libraries from annotated genomic data, and a web application (BE-SCAN) that simplifies selection of effective BE tools for generating cancer-associated SNVs. We expect the resources described here will accelerate the functional interrogation of VUS.

RESULTS

Development and validation of a base editing sensor

To measure the activity of individual sgRNAs in a high-throughput manner, we designed a BE sensor, in which an sgRNA is linked to its cognate target site within a lentiviral vector, allowing for high-throughput measurement of editing efficiency by PCR amplification and sequencing of the sensor cassette (Fig. 1a). To test whether BE sensors could measure qualitative and quantitative features of BE across different targets and editors, we generated a library containing 10 human and 8 mouse sgRNAs, in which sensor target sites were modified to contain all 64 possible 3-nucleotide protospacer adjacent motifs (PAMs) (ALL-PAM (AP) library; Fig. 1a; Supplementary Table 1a). We next generated MDA-MB-231 cells that stably expressed one of 9 base editors that span a range of PAM specificities, editing window sizes, and overall editing efficiency. These include: FNLS10, AncBE4max9, FNLS-2X (F2X)10, FNLS-HF1 (HF1)10, 11, FNLS-HiFi (HiFi)12, 13, FNLS-NG (NG)12, 14, FNLS-HiFi-NG (HiFi-NG)12, 14, FNLS-VQR (VQR)11, and xFNLS10, 15 (Fig. 1a). We also generated Cas9 and Cas9-NG nuclease controls to assess the frequency of SNVs following DSBs. As expected, apart from Cas9 and Cas9-NG, each line showed efficient BE activity as measured by a fluorescent reporter12 (Fig. 1b). We next transduced each base editor- or Cas9-expressing line with the AP library in duplicate at >2000X representation and cultured cells for one week to allow base editing. We amplified and sequenced entire sgRNA-scaffold-target cassettes from each cell population and quantified insertion or deletion (indel) frequency and target cytosine editing (Supplementary Table 1b). All lines showed high correlation of C>T editing efficiency between replicates (Supplementary Fig. 1) and, as expected, Cas9 and Cas9-NG showed indel formation but little-to-no target C>T editing (Fig. 1c; Supplementary Figs. 2 and 3).
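To make the readout concrete, the following is a minimal sketch of how indels and target C>T edits might be tallied from sequencing reads covering one sensor target site. It assumes reads have already been trimmed to the same window as the unedited reference and treats any length mismatch as an indel, which is a simplification of alignment-based indel calling. The function and class names are hypothetical and this is not the authors' analytical pipeline.

```python
from dataclasses import dataclass


@dataclass
class SensorCounts:
    """Read tallies for a single sensor target site (hypothetical container)."""
    total: int
    indel: int
    target_c_to_t: int

    @property
    def indel_pct(self) -> float:
        return 100.0 * self.indel / self.total if self.total else 0.0

    @property
    def c_to_t_pct(self) -> float:
        return 100.0 * self.target_c_to_t / self.total if self.total else 0.0


def tally_sensor_reads(reads, reference, target_index):
    """Count indels and target C>T edits in reads covering one sensor target.

    Assumption: each read spans exactly the reference window, so a read whose
    length differs from the reference is counted as an indel.
    """
    if reference[target_index] != "C":
        raise ValueError("reference base at target_index must be C")
    indel = edited = 0
    for read in reads:
        if len(read) != len(reference):
            indel += 1          # length change: treat as an insertion/deletion
        elif read[target_index] == "T":
            edited += 1         # target cytosine converted to thymine
    return SensorCounts(total=len(reads), indel=indel, target_c_to_t=edited)
```

For example, `tally_sensor_reads(["GACCTAG", "GATCTAG", "GATCTAG", "GACTAG"], "GACCTAG", 2)` reports two target C>T edits and one indel among four reads.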
The sensor assay accurately reported the relative efficiency and known PAM preferences of individual base editors (Fig. 1c). FNLS, AncBE4max, F2X, HF1, HiFi, and xFNLS had maximum C>T editing at NGG PAMs and lower, but detectable editing, at NAG and NGA PAMs. Consistent with previous publications, VQR showed higher C>T editing at NGA PAMs11, while FNLS-NG and FNLS-HiFi-NG showed broad editing capabilities at NGN PAMs12, 14 (Fig. 1c and Extended Fig. 1). In general, high-fidelity variants showed editing patterns identical to parental base editors, albeit with overall lower efficiency (Fig. 1d). Together, these data show that BE sensor libraries reliably report known features of well-characterized base editors.
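The AP library design and the PAM classes discussed here (NGG, NAG, NGA, NGN) can be generated programmatically. The following is a minimal sketch, assuming a PAM class is named by wildcarding the first PAM base (e.g. AGG, CGG, GGG, and TGG all fall into NGG) and that patterns use only N as a wildcard. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def all_pams(length=3):
    """Every possible PAM of the given length (4**length sequences)."""
    return ["".join(p) for p in product("ACGT", repeat=length)]


def pam_class(pam):
    """Collapse a 3-nt PAM to its class by wildcarding the first base (AGG -> NGG)."""
    return "N" + pam[1:]


def group_by_class(pams):
    """Map each PAM class to the concrete PAMs it contains."""
    groups = defaultdict(list)
    for pam in pams:
        groups[pam_class(pam)].append(pam)
    return dict(groups)


def matches(pam, pattern):
    """True if pam matches a pattern in which N means any base."""
    return len(pam) == len(pattern) and all(
        q == "N" or q == b for b, q in zip(pam, pattern)
    )
```

The 64 PAMs fall into 16 classes of four, and a PAM-flexible editor with NGN compatibility would, under this simple model, accept 16 of the 64: `[p for p in all_pams() if matches(p, "NGN")]`.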

Figure 1. A high-throughput sensor assay to characterize base editing outcomes at thousands of target sites.


(a) Schematic of the sensor assay. An sgRNA is paired with its cognate target site in cis such that editing outcomes can be quantitatively assessed in a massively parallel fashion using next-generation sequencing. Here, we illustrate the design of an All PAM sensor (APS) library that queries 18 target sites with all 64 possible PAM combinations. The library is integrated by lentiviral transduction into cells expressing a range of base editors, which are cultured for 7 days before genomic DNA isolation and screen deconvolution by next-generation sequencing.

(b) Schematic of the GO base editing reporter (top) used to confirm the activity of each of the 9 base editors used in APS screens by measuring C>T-dependent induction of GFP expression in mScarlet-positive (reporter-infected) cells (bottom).

(c) C>T editing efficiency at each AP library human target site across the full range of base editors (n=9). Cas9 and Cas9-NG serve as nuclease controls. Rows denote target sites. Columns denote PAM subclass. See also Extended Fig. 1 and Supplementary Fig. 2 and 3.

(d) Head-to-head comparison of C>T editing efficiency at different PAMs by ‘standard’ cytosine base editors (CBEs) (top row) and PAM flexible CBEs (bottom row). * p≤0.01. p-values were determined with two-sided Wilcoxon signed rank test. Boxplots show the median and interquartile range (IQR) and whiskers represent 1.5*IQR.

AMINEsearch generates BE sensor libraries from genomics data

To establish a flexible pipeline to facilitate BE screens driven by clinical genomics data, we developed AMINEsearch (Annotated Mutation-Informed Nucleotide base Editing sgRNA search), a BE sgRNA design algorithm that compiles ready-to-clone libraries of annotated sgRNAs to model user-defined mutations (Fig. 2a). We first implemented AMINEsearch to generate sensor libraries to model cancer-associated mutations derived from targeted sequencing data (MSK-IMPACT)16, which provides deep coverage of 462 cancer-relevant genes and included >21,000 tumors at the time of library generation (Fig. 2b). BE is well suited to creating cancer-associated alterations, as these are highly enriched for C•G to T•A transition mutations (Fig. 2c). Most BE-compatible SNVs were missense mutations, followed by nonsense and splice site alterations (Fig. 2d). We identified 2,608 SNVs as recurrent (≥4 occurrences), with mutation frequency ranging from 0.02% to 5.1% (Fig. 2e).
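Whether an SNV can in principle be installed by a cytosine base editor depends only on its reference and alternate alleles: a C>T change on the annotated strand, or a G>A change (C>T on the opposite strand). The following is a minimal sketch of that filter combined with the ≥4-occurrence recurrence threshold given above. It assumes MAF-like records with hypothetical keys (`gene`, `protein_change`, `ref`, `alt`) and is not AMINEsearch's actual code.

```python
from collections import Counter


def is_cbe_compatible(ref, alt):
    """C•G>T•A transitions: C>T on the given strand, or G>A (C>T on the other strand)."""
    return (ref, alt) in {("C", "T"), ("G", "A")}


def recurrent_cbe_snvs(records, min_count=4):
    """Return {(gene, protein_change): count} for recurrent, CBE-compatible SNVs.

    records: iterable of dicts with hypothetical keys 'gene', 'protein_change',
    'ref', and 'alt'. Only single-nucleotide substitutions are considered.
    """
    counts = Counter()
    for r in records:
        ref, alt = r["ref"], r["alt"]
        if len(ref) == 1 and len(alt) == 1 and is_cbe_compatible(ref, alt):
            counts[(r["gene"], r["protein_change"])] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```

Filtering on alleles first and counting recurrence second keeps transversions and non-C•G transitions (e.g. A>G) out of the candidate set regardless of how often they occur.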

Figure 2. AMINEsearch: a versatile computational pipeline to identify cancer-associated mutations compatible with base editing.


(a) AMINEsearch pipeline. Given a mutation annotation format (MAF) file, Cas protein properties (e.g. PAM usage), and base editor editing ranges, AMINEsearch produces libraries of human and mouse sgRNAs and sensor constructs designed to engineer specific mutations using BE.

(b) Deploying AMINEsearch to analyze tumor mutation data from cancer patients profiled with the MSK-IMPACT clinical DNA sequencing platform. Human and mouse base editing sensor libraries (abbreviated as HBES and MBES) were developed to systematically interrogate thousands of cancer-associated mutations using BE.

(c) Mutation signatures of recurrent single nucleotide variants (SNVs) observed in the MSK-IMPACT dataset relative to their frequency. Note that this dataset is enriched for C•G to T•A mutations, which can be modeled using cytosine base editing.

(d) Distribution of missense, nonsense, and splice site variants among recurrent mutations in the MSK-IMPACT dataset, variants targeted in the HBES library, and variants targeted in the MBES library. Splice site mutations were less likely to be included in MBES libraries due to lower sequence conservation in noncoding regions (Fig. 2d) (p=0.0009, two-tailed Fisher’s exact test).

(e) Distribution of the observed frequency of cancer-associated mutations in the MSK-IMPACT dataset. Mutations were classified as targeted (red circle, compatible with BE) and not targeted (black square, incompatible with BE). Well established oncogene and tumor suppressor gene mutant alleles are highlighted in red.

(f) Gene-level annotation of cancer-associated function of recurrently mutated genes in the MSK-IMPACT dataset, genes targeted in the HBES library, and genes targeted in the MBES library.

(g) Pie chart denoting OncoKB annotations of variants targeted in the HBES library, split by level of evidence (oncogenic, likely/predicted oncogenic, or unknown significance).

(h) Predicted coverage (fraction of sites with at least one predicted sgRNA) in HBES and MBES libraries relative to the base editor used.

(i) Predicted specificity (fraction of sgRNAs with no expected off-target editing of bystander nucleotides within the locus) in HBES and MBES libraries relative to the base editor used.

By inputting the parameters of well-characterized, efficient Cas modules (SpCas9, Cas9-NG, xCas9, ScCas9) and a BE targeting window of 4–11 bp, we identified 5,855 sgRNAs covering 1,450 unique mutations. This human BE sensor (HBES) library represented ~56% of all recurrent mutations in the dataset (Supplementary Table 2c,d). While the MSK-IMPACT targeted sequencing assay is specifically designed to focus on known cancer-associated genes (Fig. 2f), cross-referencing with the OncoKB precision oncology knowledge base (sop.oncokb.org)17, 18 showed that the plurality of SNVs targeted in HBES are VUS, whereas the most frequent mutations were more likely to carry variant-level annotation (Fig. 2g; Extended Fig. 2).
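The core search step, placing the target cytosine inside the editing window of a protospacer that sits next to a usable PAM, can be sketched as follows. Assumptions, all for illustration only: 20-nt protospacers, positions counted 1 to 20 from the PAM-distal end, the 4–11 window stated above, an SpCas9-style NGG PAM, and a search on the given strand only (a full pipeline would also scan the reverse complement and the other Cas PAMs). `find_be_sgrnas` is a hypothetical name, not part of AMINEsearch.

```python
def matches_pam(seq, pattern):
    """True if seq matches a PAM pattern in which N means any base."""
    return len(seq) == len(pattern) and all(p == "N" or p == b for b, p in zip(seq, pattern))


def find_be_sgrnas(genome, target_index, pam="NGG", window=(4, 11), spacer_len=20):
    """Return (protospacer, target_position) pairs that place genome[target_index]
    (which must be a C) inside the editing window.

    Positions are 1-based and counted from the PAM-distal end of the protospacer.
    Only the given strand is searched.
    """
    if genome[target_index] != "C":
        raise ValueError("target base must be C")
    lo, hi = window
    hits = []
    for pos in range(lo, hi + 1):
        start = target_index - (pos - 1)   # protospacer start that puts the target at `pos`
        end = start + spacer_len           # the PAM begins immediately 3' of the protospacer
        if start < 0 or end + len(pam) > len(genome):
            continue                       # protospacer or PAM would run off the sequence
        if matches_pam(genome[end:end + len(pam)], pam):
            hits.append((genome[start:end], pos))
    return hits
```

Iterating over window positions, rather than over every protospacer in the sequence, restricts the search to the at most eight placements that could edit the chosen cytosine.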

To model cancer-associated mutations in the mouse genome, we included additional steps in the AMINEsearch workflow to identify orthologous murine sites. For simplicity, unless otherwise stated, we refer to sgRNAs using the human mutation nomenclature. The mouse BE sensor (MBES) library contained 4686 sgRNAs targeting 1177 unique mutations (Supplementary Tables 2c–f). We noted modest attrition of BE sgRNAs when targeting the mouse genome due to lack of sequence conservation (Fig. 2d,f,h). The diversity of available BE tools allows for distinct trade-offs. For instance, using base editors with an expanded editing window (F2X) or PAM flexibility (FNLS-NG) increases theoretical coverage (Fig. 2h) at the expected cost of reduced local specificity (Fig. 2i) or potentially increased global off-target effects (Extended Fig. 3 and Supplementary Table 3), respectively.
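The window trade-off can be made concrete for a single protospacer: a wider editing window can bring the target cytosine into range, but it also brings more bystander cytosines into range. The following is a minimal sketch, assuming "specific" means that no cytosine other than the target lies inside the window (the criterion described for Fig. 2i). The windows used in the test are illustrative and are not measured editor parameters. `window_stats` is a hypothetical helper.

```python
def window_stats(protospacer, target_pos, window):
    """Return (in_window, bystander_free) for the target C at 1-based target_pos.

    in_window: the target cytosine lies inside the editing window.
    bystander_free: no other C lies inside the window (no predicted collateral C>T).
    """
    if protospacer[target_pos - 1] != "C":
        raise ValueError("target position must be a C")
    lo, hi = window
    in_window = lo <= target_pos <= hi
    bystanders = [
        i for i in range(lo, min(hi, len(protospacer)) + 1)
        if i != target_pos and protospacer[i - 1] == "C"
    ]
    return in_window, not bystanders
```

With protospacer `ATGCATATCATATATATATA`, a target C at position 4 is bystander-free in a 3–8 window. A 3–13 window also captures the C at position 9, so the guide would no longer count as specific.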

BE sensor identifies optimal sgRNAs for engineering variants

To measure editing efficiency of sgRNAs in the HBES and MBES libraries, we transduced each in duplicate into MDA-MB-231 cells expressing one of three base editors: FNLS10 (highest editing efficiency), F2X10 (expanded editing range), or FNLS-NG12, 14 (PAM flexibility) (Fig. 3a, Supplementary Tables 4 and 5). Cas9-expressing cells served as a control to measure baseline C>T transitions in the absence of BE. Base editors were expressed to approximately equal levels, while Cas9 showed higher protein abundance (Supplementary Fig. 4a). We observed excellent correlation of cytosine editing between replicates from each library and base editor combination (Supplementary Fig. 5). As observed in AP screens, editing efficiency was strongly influenced by PAM, target cytosine position, and dinucleotide sequence context (TC>AC=CC>GC) (Fig. 3a,b; Supplementary Fig. 6). Consistent with previous studies, FNLS and FNLS-NG showed maximum editing efficiency at positions 3–9 of the protospacer, while F2X showed expanded editing at positions 3–13 (ref. 10) (Extended Fig. 4). While F2X had an extended editing range, its average efficiency within the canonical editing window (3–9 bp) was lower than that of FNLS (Supplementary Fig. 7). As expected, FNLS and F2X had maximum editing efficiency at NGG PAMs, while FNLS-NG showed broad activity at NGN PAMs (Fig. 3a). Cas9 showed no detectable BE activity, with ≤0.1% C>T editing across all base editor-PAM combinations (Fig. 3a,b; Supplementary Fig. 6).
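The dinucleotide-context ranking (TC>AC=CC>GC) comes from grouping per-cytosine editing values by the base immediately 5′ of the edited C. The following is a minimal sketch of that aggregation. It assumes a flat list of (protospacer, 1-based position, percent edited) records, which is a hypothetical layout, and the values in the test are invented.

```python
from collections import defaultdict
from statistics import mean


def editing_by_context(records):
    """Mean percent editing per 5' dinucleotide context (e.g. 'TC', 'GC').

    records: iterable of (protospacer, pos, pct_edited), with pos 1-based.
    Cytosines at position 1 (no 5' neighbour in the protospacer) are skipped,
    as are records whose position is not a C.
    """
    groups = defaultdict(list)
    for seq, pos, pct in records:
        if pos < 2 or seq[pos - 1] != "C":
            continue
        context = seq[pos - 2] + "C"   # 5' neighbour followed by the edited C
        groups[context].append(pct)
    return {ctx: mean(vals) for ctx, vals in groups.items()}
```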

Figure 3. Massively parallel assessment of base editing outcomes across thousands of cell line/editor/sgRNA combinations using the sensor assay.


(a) Top: Percentage of all C>T editing (y-axis) across Cas9, FNLS, F2X, and FNLS-NG at every cytosine among the −5 to 20 positions of the target site (x-axis). Colored dots specify dinucleotide contexts. Bottom: Percentage of target C>T editing (y-axis) across Cas9, FNLS, F2X, and FNLS-NG relative to PAM class (x-axis). Box plots represent the median (line), the 25th and 75th percentiles (lower and upper bounds), the largest and smallest values within 1.5 times the interquartile range (whiskers), and outliers (dots). Colored dots specify dinucleotide contexts. See also Supplementary Fig. 6.

(b) Heatmap of base editing efficiency across MDA-MB-231, NIH3T3, PC9, PDEC, and KPT1 cell lines (columns) at every cytosine among the −5 to 20 positions of the target site classified by cytosine base editor and PAM class (rows). See also Supplementary Fig. 6.

(c) Correlation of individual sgRNA efficiency across screen cell lines (PC9, NIH3T3, PDEC, and KPT1) compared to MDA-MB-231. Only HBES sgRNAs that had >1% activity in the sensor were ranked. See also Extended Fig. 5.

To determine whether sensor editing scores identified in one cell line could be extrapolated to other cell types, we repeated HBES and MBES screens in four additional cell lines: human PC9 and murine KrasG12D;Trp53−/− mutant (KPT1) lung adenocarcinomas19, as well as immortalized NIH3T3 cells and KrasG12D/+; Trp53WT/WT pancreatic ductal epithelial cells (PDECs)20. In all, we measured editing across >200,000 base editor-sgRNA-cell line combinations. Each cell line showed high concordance between replicates and different base editors (Supplementary Fig. 5). Average editing efficiency varied by cell line; however, PAM specificity, editing range, and relative efficiency of individual sgRNAs remained highly correlated (Fig. 3c; Supplementary Fig. 4b and Extended Fig. 5). We observed a moderate, non-linear relationship between Cas9-induced indels and BE across all cell lines, where sgRNAs with high BE scores were a subset of sgRNAs with efficient Cas9-mediated indel generation (Extended Fig. 6). These data suggest that the relative potency of individual sgRNAs across different cell systems can be predicted en masse using the BE sensor assay.
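Comparing the relative potency of sgRNAs between cell lines is naturally done with a rank correlation, which ignores the overall shift in average editing efficiency between lines. The following is a minimal standard-library sketch of Spearman's ρ (Pearson correlation of average ranks, with ties sharing the mean rank). The choice of Spearman here is an assumption for illustration; the coefficient used for Fig. 3c is not specified in this section.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho between, e.g., per-sgRNA editing in two cell lines."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, a cell line with uniformly lower editing but the same sgRNA ordering still yields ρ = 1.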

Base editors can exhibit collateral (bystander) cytosine editing, whereby C>T mutations are induced in both target and neighboring cytosines within the editing window7. To investigate collateral editing, we calculated C>T editing “purity” as the frequency of target C>T editing without additional mutations. As expected, purity decreased with the presence of additional cytosines within the target window, especially with immediately adjacent bystander cytosines (Supplementary Fig. 8). Collectively, these results demonstrate that the sensor platform can be used to assess on-target and collateral cytosine editing across multiple base editors and thousands of target sites in a high-throughput manner.
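The purity metric defined above (target C>T editing without additional mutations) can be computed read by read. The following is a minimal sketch, assuming reads trimmed to the reference window and, as an assumed convention, all reads as the denominator. Under this convention, indel-containing reads can never count as pure edits. `editing_purity` is a hypothetical name.

```python
def editing_purity(reads, reference, target_index):
    """Percent of reads carrying the target C>T and no other change from the reference.

    Assumptions: the denominator is all reads; reads whose length differs from
    the reference (indels) are never counted as pure.
    """
    if not reads:
        return 0.0
    pure = 0
    for read in reads:
        if len(read) != len(reference) or read[target_index] != "T":
            continue  # indel, or target not converted
        other_diffs = sum(
            1 for i, (a, b) in enumerate(zip(read, reference))
            if i != target_index and a != b
        )
        if other_diffs == 0:
            pure += 1  # only the intended C>T is present
    return 100.0 * pure / len(reads)
```

A read with the target edit plus a converted adjacent cytosine (`GATTTAG` against `GACCTAG`, target index 2) is excluded, which is how immediately adjacent bystanders lower purity.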

To directly test how well BE sensor scores predicted activity at endogenous targets, we measured editing at 12 independent genomic sites with a panel of 13 sgRNAs that showed high editing in the sensor assay. Using either FNLS or AncBE4max, endogenous editing aligned well with sensor-based estimates, with more than half of cases (7/13) within 10% of the sensor-reported efficiency (Fig. 4a, Supplementary Fig. 9a). To ask whether BE sensor scores could predict the relative efficiency of target editing among a range of possible sgRNAs, we tiled the R213 site in TP53 with a series of 7 sgRNAs. In this case, we used the F2X base editor to allow editing across the wide range of target positions in this series (5–11 bp). Consistent with the data described above (Fig. 4a, Supplementary Fig. 9a), sensor estimates closely resembled editing at the endogenous locus (Fig. 4b).
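The agreement statistic above (7/13 sgRNAs within 10% of the sensor estimate) is a simple count over replicate means. The following is a minimal sketch, assuming that "within 10%" means an absolute difference of at most 10 percentage points between mean endogenous and mean sensor editing. The dictionary layout and the numbers in the test are hypothetical, not the study's data.

```python
from statistics import mean


def fraction_within(sensor, endogenous, tolerance_pp=10.0):
    """Return (n_within, n_total) over sgRNAs measured in both assays.

    sensor, endogenous: dicts mapping sgRNA id -> list of replicate percentages.
    An sgRNA agrees if its mean endogenous editing lies within tolerance_pp
    percentage points of its mean sensor editing.
    """
    shared = sensor.keys() & endogenous.keys()
    within = sum(
        1 for g in shared
        if abs(mean(sensor[g]) - mean(endogenous[g])) <= tolerance_pp
    )
    return within, len(shared)
```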

Figure 4. Validation of canonical and non-canonical base editing activity predicted by the sensor assay.


(a) Experimental validation of C>T editing activity observed in the sensor (blue) when targeting endogenous (yellow) loci in FNLS-PC9 cells. Each dot corresponds to a single replicate (n=2 for sensor screening data; n=3 for endogenous validation). Data are presented as mean values +/− SEM. Base editing rates (efficiencies) across endogenous loci were determined via next-generation sequencing of edited loci and analyzed using CRISPResso2 (ref. 48). See also Extended Fig. 7 and Supplementary Fig. 9. No direct statistical comparisons were performed between sensor and endogenous C>T editing data because sensor screens were performed in duplicate.

(b) Top: schematic of the human TP53-R213 locus. Horizontal bars denote sgRNAs, and numbers to the right denote sgRNA identifiers (based on HBES whitelists). Target cytosine is labeled in blue. Bottom: heatmap comparing C>T editing efficiency in an allelic series of TP53-R213 sgRNAs between the sensor results, F2X-MDA-MB-231 cells targeting the endogenous locus, and BE-Hive predictions (CP1028) (ref. 21).

(c) Correlation of observed base editing efficiency measured by the sensor in MDA-MB-231 cells vs. efficiencies predicted by the BE-HIVE algorithm21 classified by base editing enzyme and cytosine position (fill). Here, the HBES library was stratified to include all targets (top) and all NGG targets showing >5% editing in the sensor (bottom). FNLS and FNLS-NG values were compared to BE4 prediction results and F2X values were compared to CP1028 prediction results. See also Supplementary Figs. 10–12.

(d) Canonical (C>T) and non-canonical (C>A and C>G) base editing activity profiled across all screen cell lines (rows) at every cytosine in positions −5 to 20 of HBES library targets.

(e) Validation of non-canonical C>R editing events at sensor target sites (blue) and endogenous targets (yellow). Each dot corresponds to a single replicate (n=2 for sensor screening data; n=3 for endogenous validation). Data are presented as mean values +/− SEM. Base editing rates (efficiencies) across endogenous loci were determined via next-generation sequencing of edited loci and analyzed using CRISPResso2 (ref. 48). See also Extended Fig. 7 and Supplementary Fig. 9.

(f) A heatmap of a panel of mammalian cell lines expressing FNLS and/or AncBE4max transduced with either canonical C>T (top) or non-canonical C>G (bottom) GO reporters measuring GFP induction in mScarlet infected base editor cells. See also Extended Fig. 7.

Arbab et al. recently reported a machine learning tool, BE-Hive, for predicting base editing outcomes21. We noted that BE-Hive predictions for the TP53-R213 series did not accurately predict editing outcomes for non-NGG sgRNAs, likely because BE-Hive does not incorporate the PAM sequence as a prediction feature. To assess this more broadly, we compared BE-Hive predictions with sensor-measured editing activity for the HBES and MBES libraries. As expected, given the strong dependence of editing activity on PAM, comparison of all sgRNAs showed relatively low overall correlation between BE-Hive and BE sensor estimates (Fig. 4c; Supplementary Fig. 10; Supplementary Table 6). Restricting our analyses to sgRNAs associated with NGG PAMs improved the correlation (Supplementary Fig. 11); however, much of this improvement was driven by low-scoring guides, as focusing on sgRNAs with >5% BE sensor activity led to lower overall similarity (Fig. 4c; Supplementary Fig. 12). Together, these data show that BE sensor editing is well correlated with editing at endogenous sites, allowing reliable identification of sgRNAs with high editing efficiency across multiple biological contexts.

Non-canonical editing identified by BE sensor

APOBEC-driven mutation signatures in cancer include transitions (C>T; signature 2) and transversions (C>G; signature 13) (ref. 22). Cytosine base editors containing rAPOBEC1 can induce C>G mutations in some contexts12, 21. Such ‘non-canonical’ transversion editing could be leveraged to increase the breadth of mutations that can be modeled using BE (22% of the MSK-IMPACT dataset) (Fig. 2c). Transversion editing was apparent in our sensor screen (Fig. 4d), and for some targets, C>G editing occurred at levels greater than C>T editing (Extended Fig. 7a). Sensor-measured transversion (C>R) editing closely matched editing outcomes at three endogenous loci, chosen for their high (30–60%) predicted C>R editing rates (Fig. 4e). As expected, C>R editing by AncBE4max was slightly lower, likely due to the presence of an additional uracil glycosylase inhibitor domain23 (Supplementary Fig. 9b).

We next looked at sequence features affecting C>R editing outcomes and noted that transversion mutations were strongly disfavored at CC dinucleotides (average 4% of BE events) but were ~3-fold higher in AC and TC contexts (13% and 12%, respectively) (Fig. 4d, Extended Fig. 7b). Transversions were also disfavored when target cytosines were followed by another cytosine (NCC, particularly CCC and GCC) and enriched when followed by a thymine (NCT, particularly ACT and TCT) (Extended Fig. 7b). Comparison across lines revealed that not all cells induce transversions with equal efficiency. While MDA-MB-231 and PC9 cells showed frequent and high level transversion editing, NIH3T3, PDEC, and KPT1 cells had a very low frequency of C>R alterations (Fig. 4d).
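The context dependence described above is expressed as the share of all base-editing events at a cytosine that are transversions (C>A or C>G, together C>R) rather than transitions (C>T). The following is a minimal sketch of that calculation, assuming per-context counts of each outcome base. The input format is hypothetical and the counts in the test are invented.

```python
from collections import defaultdict


def transversion_share(events):
    """Percent of BE events that are C>R (R = A or G), per 5' dinucleotide context.

    events: iterable of (context, outcome_base, count), e.g. ("TC", "G", 12),
    where outcome_base is the base observed in place of the target C.
    Unedited reads (outcome 'C') are ignored, so the denominator is edits only.
    """
    totals = defaultdict(int)
    transversions = defaultdict(int)
    for context, outcome, count in events:
        if outcome not in ("T", "A", "G"):
            continue
        totals[context] += count
        if outcome in ("A", "G"):
            transversions[context] += count
    return {ctx: 100.0 * transversions[ctx] / n for ctx, n in totals.items() if n}
```

Using edited reads as the denominator separates the transversion bias from overall editing efficiency, so a weakly edited context can still show a high C>R share.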

We developed a lentiviral BE reporter that drives GFP induction following target C>G editing (Extended Fig. 7c). This construct accurately reported cellular C>G editing bias, showing efficient C>G induction in MDA-MB-231, PC9, and HCC1806, but not NIH3T3 or PANC1, consistent with sensor measurements (Fig. 4f). Notably, C>G editing efficiency in the reporter was similar to average C>G editing seen at TCT motifs (Fig. 4d,f; Extended Fig. 7b), suggesting it is a useful tool for gauging C>R editing potential in different cells. Thus, transversion editing bias is not a universal feature of human cancer cells. Systematic studies employing this reporter could provide insight into mechanisms that dictate this activity.

Sensor-validated sgRNAs streamline in vivo model development

A major advantage of sensor-based validation is the ability to identify active sgRNAs that generate specific missense mutations with little-to-no collateral editing. Such guides can be used to interrogate the impact of specific mutations in vitro and in vivo. As proof-of-concept, we focused on the TP53 tumor suppressor gene, which is the most frequently mutated gene in cancer and shows remarkable mutational heterogeneity24. Hundreds of TP53 SNVs have been identified16, most of which are missense variants that may have loss-of-function, gain-of-function, dominant negative, or neomorphic behavior24, 25. Our mouse sensor library contained 244 sgRNAs targeting 62 distinct, recurrent p53 mutations. To measure the tumorigenic potential of p53 variants, we used immortalized murine KrasG12D/+; Trp53WT/WT PDECs, a genetically defined and physiologically relevant setting to model pancreatic cancer20. We cloned five sensor-validated sgRNAs to introduce specific missense mutations in Trp53 with low collateral activity (high “purity”) (C135Y, M237I, G199E, E271K, and R337C; human TP53 gene nomenclature) (Fig. 5a,b). Introduction of each sgRNA into F2X-PDECs enabled low-density growth in the presence of Nutlin-3 (ref. 26) (Fig. 5c), suggesting these mutations compromise p53 function. To test whether these mutations impaired tumor suppression in vivo, we transplanted PDECs transduced with Trp53 or control sgRNAs into the pancreas of recipient mice (n=5 mice per sgRNA) (Fig. 5d). In cases where the sensor assay predicted multiple sgRNAs for a single mutation, we included all available sgRNAs to rule out off-target effects (Extended Fig. 8a). Orthotopic transplantation of control PDECs did not lead to pancreatic tumor development (up to 200 days), but all mice transplanted with PDECs carrying Trp53 sgRNAs succumbed to pancreatic tumors (46–99 days) (Fig. 5e and Extended Fig. 8a).
In each case, analysis of bulk tumor tissue showed a high frequency of C>T mutations at the respective sites in the Trp53 gene (Fig. 5f, Extended Fig. 8b and 8e, Supplementary Figs. 13 and 14). Identical results were obtained with FNLS-PDECs (n=5 mice per mutation) (Extended Fig. 8c,d). Thus, BE sensor-validated sgRNAs can be used to synchronously engineer endogenous patient-derived mutations in experimental in vivo systems, facilitating systematic variant-to-phenotype studies in cancer and other diseases.

Figure 5. In vivo validation of cancer-associated single nucleotide TP53 variants using base editing.


(a) Candidate TP53 variant-specific base editing sgRNAs sorted by C>T efficiency scores obtained from FNLS-MDA-MB-231 MBES screening data. Only TP53 sgRNAs with a percentage of C>T editing >25% and no collateral cytosine editing are shown. Data are presented as mean values +/− SEM.

(b) Lollipop plot showing frequency of candidate TP53 variants detected in the MSK-IMPACT cohort. TAD = transactivation domain; DBD = DNA binding domain; OD = oligomerization domain.

(c) FNLS-expressing KrasG12D/+; Trp53WT/WT pancreatic ductal epithelial cells (PDECs) were transduced with sgRNAs designed to introduce defined mutations in the mouse Trp53 gene followed by plating at low density (1,000 cells per well in 6-well plates; three wells per variant) and treatment with DMSO or Nutlin-3 (10 μM). Upper panel: plates were stained with crystal violet to assess colony formation capacity. Bottom panel: quantification of crystal violet staining. N=3 wells per variant (or control) per treatment arm (control or Nutlin-3). Data are representative of n=3 independent experiments and are presented as mean values +/− SD. * p ≤ 0.05, ** p ≤ 0.01. P-values were calculated using unpaired, two-sided t-test.

(d) Schematic for in vivo validation of candidate TP53 variants via orthotopic transplantation of F2X-expressing KrasG12D/+; Trp53WT/WT pancreatic ductal epithelial cells (PDECs) transduced with sgRNAs designed to introduce defined mutations in the mouse Trp53 gene.

(e) Survival analysis of mice transplanted with F2X-expressing PDECs transduced with specific Trp53-targeting base editing sgRNAs. N=5 mice per mutation. See also Extended Fig. 8 and Supplementary Fig. 13 and 14. * p ≤ 0.05, ** p ≤ 0.01. P-values were calculated using the log-rank test.

(f) Frequency of target C>T editing in tumors from transplanted mice. Each dot corresponds to a single tumor or tumor fragment. Data are presented as mean values +/− SD. Target C>T editing was measured by next generation sequencing of amplified target loci and data was analyzed using CRISPResso2 (ref. 48). See also Extended Fig. 8 and Supplementary Fig. 13 and 14.

Pooled BE sensor screens to interrogate cancer variants

The experiments above demonstrated the robustness of BE sensor-validated sgRNAs for in vivo interrogation of cancer variants. Encouraged by these results, we set out to test whether BE sensor libraries could be coupled with high-throughput screening approaches for massively parallel functional interrogation of cancer-associated SNVs. An advantage of screening BE sensor libraries is that cells should harbor editing at both the sensor module and endogenous target site. Hence, variant-specific effects on cellular phenotype can be correlated with editing precision and efficiency at sensor target sites, minimizing false positives. In theory, this approach should also identify sgRNAs that edit their target but induce no phenotypic effect. To test this concept, we transduced KrasG12D/+; Trp53WT/WT FNLS-PDECs with the MBES library at low MOI and >1000X representation (Fig. 6a). Six technical replicates of PDEC-MBES cells (Supplementary Fig. 15) were used for a multi-time point in vitro proliferation screen performed for ~36 cumulative population doublings to quantitatively assess sgRNA activity and abundance in parallel.
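Two pieces of arithmetic underlie a pooled screen of this kind: the number of cells to infect so that the stated library representation is held in transduced cells, and the cumulative population doublings accumulated across passages. The following is a minimal sketch under stated assumptions: the transduced fraction is modeled as 1 − e^(−MOI) (a Poisson model), and an MOI of 0.3 is used purely as an illustrative "low MOI". None of these parameter values come from the study's Methods, and both helper names are hypothetical.

```python
import math


def cells_to_transduce(library_size, coverage, moi):
    """Cells to infect so that library_size * coverage cells end up transduced.

    Assumes a Poisson model in which the transduced fraction is 1 - exp(-moi).
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    transduced_fraction = 1.0 - math.exp(-moi)
    return math.ceil(library_size * coverage / transduced_fraction)


def cumulative_doublings(passages):
    """Sum of log2(harvested / seeded) over (seeded, harvested) cell counts per passage."""
    return sum(math.log2(harvested / seeded) for seeded, harvested in passages)
```

For a ~5,000-construct library at 1000X, `cells_to_transduce(5000, 1000, 0.3)` gives roughly 19.3 million cells to infect, about four times the 5 million transduced cells required.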

Figure 6. Massively parallel interrogation of cancer-associated single nucleotide variants via pooled base editing screening.


(a) Schematic of base editing proliferation screen. Briefly, FNLS-expressing PDECs were transduced with a library of ~5,000 MBES constructs at 1000X representation followed by selection and culture for a total of ~36 cumulative population doublings. A total of n=6 independent transduction replicates were established and cultured separately. Cells were sampled at multiple time points over the course of the screen until reaching the final time point at day 30 post-transduction. Screens were deconvoluted using next-generation sequencing (see Methods for more details).

(b) Heatmap for correlation coefficients between samples. See also Supplementary Fig. 15.

(c) Waterfall plots comparing sgRNA log fold changes with mean percentage of C>T editing in the sensor target site between days 5 and 30 post-transduction. Top plot denotes all sgRNAs (and corresponding sensor target sites). Bottom plot denotes Trp53 sgRNAs (and corresponding sensor target sites).

(d) Bubble plot comparing sgRNA log fold changes with mean frequency of C>T editing in the sensor target site between days 5 and 30 post-transduction. Blue bubbles denote Trp53 sgRNAs (and corresponding sensor target sites). Yellow bubbles denote all other sgRNAs (and corresponding sensor target sites). Inset shows the MAGeCK (refs. 27, 28) score (see Supplementary Table 7). Note the use of human gene-based nomenclature for protein residues (e.g., p53_Q100 corresponds to Q97 in mouse Trp53).

(e) Schematic of the mouse Trp53-R210 locus (TP53-R213 in humans). The number to the right of the sgRNA is the sgRNA identifier (Trp53_4315; based on MBES whitelists). Target cytosines are labeled in red. As denoted by the black arrows in the diagram, C>T base editing of C6 and C11 is predicted to produce the T208I and R210C mutations, respectively. See also Supplementary Fig. 13.

(f) In vivo validation of T208I mutation via orthotopic transplantation of F2X-expressing PDECs transduced with the Trp53_4315 sgRNA. Median survival for mice harboring tumors initiated by the Trp53_4315 sgRNA was 53 days. N=5 mice per condition. Data are presented as mean values +/− SD. * p ≤ 0.01, Log-rank test. See also Supplementary Fig. 13.

(g) Frequency of target C>T editing in tumors from mice transplanted with F2X-expressing PDECs transduced with the Trp53_4315 sgRNA. Each dot corresponds to a single tumor or tumor fragment (total n=8). Data are presented as mean values +/− SD. Target C>T editing was measured by next-generation sequencing of amplified target loci, and data were analyzed using CRISPResso2 (ref. 48). See also Supplementary Fig. 13.

Pairwise correlation analyses at the first time point (day 5) demonstrated excellent technical reproducibility, whereas replicates diverged at later time points (Fig. 6b). To quantify sensor editing and sgRNA enrichment, we used our analytical pipeline to calculate target editing efficiency, followed by MAGeCK (refs. 27, 28) to determine changes in sgRNA abundance (Supplementary Table 7). Focusing on day 30 vs day 5 comparisons, our analysis identified 150 sgRNAs that appeared to promote (n=125; LFC ≥ 1.5) or inhibit (n=25; LFC ≤ −1.5) PDEC proliferation (FDR ≤ 0.01) (Fig. 6c,d and Supplementary Table 7b). Significantly enriched sgRNAs were predicted to install mutations in genes with known oncogenic activity, including Jak3, Fgfr2, and Egfr (Supplementary Table 7d). Mutations in genes with known tumor suppressive function were also represented, including Trp53, Apc, Fbxw7, Nf2, and Chek2 (Supplementary Table 7d). After filtering for sgRNAs with >20% editing activity, 72% of enriched sgRNAs (26/36) targeted known or likely oncogenic mutations, compared to 38% of non-enriched sgRNAs (p=0.0003; Fisher’s exact test) (Extended Fig. 9a,b). Notably, more than half (19/36) of the enriched sgRNAs targeted Trp53, consistent with our proof-of-concept experiments (Fig. 6) and the role of p53 in suppressing mutant Kras-driven proliferation (ref. 29) (Fig. 6c,d). Indeed, collapsing the data to ‘gene-level’ scores identified Trp53 as the only significantly scoring gene in this screen (FDR < 0.01) (Supplementary Table 7c).
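The hit-calling thresholds and the enrichment test described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-sgRNA log fold changes (LFC) and FDR values have already been computed (for example by MAGeCK), and the helper names `call_hits` and `fisher_exact_two_sided` are hypothetical. The Fisher's exact test is written out in pure Python so the hypergeometric logic is visible; `scipy.stats.fisher_exact` should give the same two-sided result in ordinary cases.

```python
from math import comb


def call_hits(lfc, fdr, lfc_cut=1.5, fdr_cut=0.01):
    """Label each sgRNA as 'enriched', 'depleted', or not significant ('ns')."""
    labels = {}
    for sg, value in lfc.items():
        significant = fdr[sg] <= fdr_cut
        if significant and value >= lfc_cut:
            labels[sg] = "enriched"
        elif significant and value <= -lfc_cut:
            labels[sg] = "depleted"
        else:
            labels[sg] = "ns"
    return labels


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) given fixed margins (hypergeometric)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so tables tied with p_obs are not lost to rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

In the comparison reported above, the rows would be enriched and non-enriched sgRNAs and the columns known/likely oncogenic versus other mutations; only the enriched row (26 of 36) is given in this section, so no non-enriched counts are filled in here.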

Using Trp53 as a case study, we next compared fitness scores with sensor editing data from the same screen. Most sgRNAs enriched in the proliferation screen showed high editing activity, including two potent sgRNAs we previously validated in vivo (C135Y and M237I) (Fig. 6e and Extended Fig. 8c). We identified several Trp53 missense and nonsense mutations that were enriched exclusively in vitro or in vivo (E271K, R337C, G199E), highlighting the importance of context when measuring the fitness advantage conferred by p53 variants. Most enriched sgRNAs demonstrated relatively high sensor editing; notably, however, the sgRNA designed to install Trp53 R210C (R213C in human TP53 nomenclature; Trp53_4315) showed >10-fold enrichment in the screen despite <3% sensor target editing. Inspection of the cognate sensor cassette revealed that this sgRNA showed a high level of editing at an adjacent cytosine, creating a T208I mutation (T211I in human TP53) (Fig. 6e), a variant also observed in human cancers (refs. 16, 30). Mice transplanted with Trp53T208I cells (corresponding to human TP53T211I) succumbed to fully penetrant disease (median survival of 53 days) (Fig. 6f). Sequencing analysis of bulk tumor gDNA confirmed the C>T (Trp53T208I) mutation, with <15% C>T editing at the cytosine within R210 (corresponding to human TP53R213) (Fig. 6g), implying that T208I is the oncogenic driver in this case. These data identify multiple TP53 missense mutations as drivers of proliferation in non-transformed pancreatic epithelium and establish Trp53T208I/TP53T211I as a bona fide driver mutation in this mouse model of pancreatic cancer.
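The reasoning that links the position of an edited cytosine to the resulting amino acid change can be illustrated with a toy example. The sketch below assumes the sgRNA lies on the coding strand and in frame, so that codon 208 spans protospacer positions 5–7 and codon 210 spans positions 11–13; this places C6 at the second base of a Thr codon (ACN>ATN gives Ile) and C11 at the first base of an Arg codon (CGY>TGY gives Cys), matching the outcomes in the Fig. 6e schematic. The 20-nt sequence `TOY` is hypothetical and is not the real Trp53 sequence; the helper names are likewise illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed in TCAG order (first, second, third base)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def c_to_t(seq, positions):
    """Apply C>T edits at 1-based protospacer positions."""
    edited = list(seq)
    for pos in positions:
        if edited[pos - 1] != "C":
            raise ValueError(f"position {pos} is {edited[pos - 1]}, not C")
        edited[pos - 1] = "T"
    return "".join(edited)


def protein_changes(ref, alt, frame_start, first_codon):
    """List amino acid changes between ref and alt.

    frame_start is the 1-based position of the first complete codon,
    which is numbered first_codon.
    """
    changes = []
    for i in range((len(ref) - frame_start + 1) // 3):
        start = frame_start - 1 + 3 * i
        ref_aa = CODON_TABLE[ref[start:start + 3]]
        alt_aa = CODON_TABLE[alt[start:start + 3]]
        if ref_aa != alt_aa:
            changes.append(f"{ref_aa}{first_codon + i}{alt_aa}")
    return changes


# Hypothetical protospacer (NOT the real Trp53 sequence): codon 207 starts at
# position 2; codon 208 = positions 5-7 (ACA, Thr); codon 210 = 11-13 (CGT, Arg).
TOY = "GAAGACAAAGCGTAAGGAGG"
```

For example, `protein_changes(TOY, c_to_t(TOY, [6]), 2, 207)` reports only the Thr-to-Ile change at codon 208, which is the kind of off-target-position outcome the sensor cassette revealed for this sgRNA.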

A flexible platform of BE sensor predictions

Mutation databases are expanding, and new Cas and base editor variants are being identified at a rapid pace. Motivated by this reality, we expanded the capabilities of AMINEsearch (see Methods) and applied it to a more recent release of MSK-IMPACT containing sequencing data from 47,550 tumors (Extended Fig. 10, Supplementary Table 8g–j). The characteristics of an expanded set of Cas variants and base editors (including adenine base editors, ABEs; ref. 8) were included as input and can be leveraged to select tools tailored to experiments that require maximum coverage or specificity (Supplementary Fig. 16).

DISCUSSION

BE is an efficient strategy to engineer and study SNVs, yet the identification of effective sgRNA-base editor combinations remains challenging. Here, we describe a versatile sensor-based BE platform that enables identification of efficient sgRNAs from large, pooled libraries across multiple species, cell lines, and base editor configurations. We show that sensor predictions can accelerate the characterization of cancer-associated SNVs in vivo and demonstrate that integrating a BE sensor can support the interpretation of BE genetic screens.

All-in-one library strategies have been described for measuring Cas9 and BE outcomes (refs. 21, 31–34). Such libraries have been used to develop machine learning tools to predict the activity and purity of BE tools (refs. 21, 32, 33, 35). These tools are useful for their scope of prediction, but predictions can diverge significantly from experimentally observed editing at endogenous sites (Fig. 4 and Supplementary Figs. 6, 10–12). Our work shows that sensor-based activity estimates closely reflect editing outcomes at endogenous loci. Moreover, while cell-specific differences in DNA accessibility or base editor expression levels could affect editing efficiency, our results show relative consistency across cell lines, suggesting that sequence context is an important determinant of editing.

In addition to the expected target C>T editing, we observed frequent ‘non-canonical’ C>R (C>A or C>G) transversion editing. Transversion frequencies were influenced by the cell line and by local sequence features surrounding target cytosines (Fig. 4 and Extended Fig. 7b). Analysis of multiple cell lines revealed that transversion editing is not universal and cannot be easily predicted by association with Signature 13 (APOBEC-driven C>G mutations; ref. 22). Our data showed that C>R editing occurs most frequently at DCT motifs (ACT>TCT>GCT; D = A, G, or T) and is strongly disfavored when the target C is flanked by another cytosine. This observation is similar, but not identical, to that of Arbab et al. (ref. 21), who reported RCT motifs (R = A or G) as the most prone to transversion editing. This distinction may reflect the different cell types used, or the enrichment of specific genomic sites from cancer-associated mutations in our dataset compared to the rationally designed sequences used in their study. Recently, multiple groups have described new base editors (CGBEs) that enable efficient transversion editing (refs. 9, 36–38). The ability to engineer transversions will substantially expand the repertoire of mutations that can be engineered using BE. It remains unclear whether CGBEs can overcome the cell line-dependent effects that limit transversion editing.
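Motif statements such as DCT and RCT use IUPAC nucleotide codes. A minimal sketch for checking a target cytosine's trinucleotide context against such motifs, and for flagging a neighboring cytosine, is shown below; the function names are hypothetical and the example sequences are illustrative only.

```python
# IUPAC nucleotide codes mapped to the bases they allow
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_motif(motif, kmer):
    """True if every base of kmer is allowed by the IUPAC motif at that position."""
    return len(motif) == len(kmer) and all(
        base in IUPAC[code] for code, base in zip(motif, kmer)
    )


def target_context(seq, pos):
    """Trinucleotide (5' base, target C, 3' base) around a 1-based position."""
    if pos < 2 or pos >= len(seq):
        raise ValueError("target needs one flanking base on each side")
    if seq[pos - 1] != "C":
        raise ValueError("target position is not a C")
    return seq[pos - 2:pos + 1]


def cytosine_flanked(context):
    """True if the target C has a neighboring C on either side."""
    return context[0] == "C" or context[2] == "C"
```

For instance, `matches_motif("DCT", "ACT")` is true, whereas `CCT` fails the DCT motif and would also be flagged by `cytosine_flanked`.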

Gene function is complex, as reflected by the diversity of phenotypes, including therapeutic responses, that can be driven by distinct variant alleles (ref. 39). Building defined genetic models of specific oncogenic alterations is critical to define their direct impact and to reveal new treatment strategies. Similarly, BE screens offer a new approach to interrogate gene variant function en masse (refs. 40, 41). Unlike traditional CRISPR screens that provide ‘gene-level’ information, BE screens reveal ‘amino acid-level’ information and, as such, cannot always rely on the activity of multiple sgRNAs to define ‘scoring’ hits. We argue that incorporating a BE sensor cassette in BE libraries will enhance the interpretation of BE screens by providing preliminary validation of sgRNA activity, flagging possible false positives, and improving the classification of phenotype-neutral mutational events. Indeed, our proof-of-concept fitness screen in non-transformed pancreatic epithelial cells identified multiple candidate oncogenic variants across a collection of genes. These included Trp53 mutations that drive increased proliferation, but also sgRNAs that induce target mutations without increasing cellular fitness (at least in vitro) or that potentially exhibit context-specific phenotypes. Future iterations of this approach could employ unique molecular identifiers embedded within sensor backbones or sgRNAs (ref. 42) to account for clonal phenotypes, or could elucidate variant-specific transcriptional effects using single-cell RNA sequencing (refs. 43–45). Furthermore, the sensor framework should be compatible with emerging genome editing technologies such as prime editing (refs. 46, 47).

For those who wish to use individual validated BE sgRNAs or design alternate BE sensor libraries, we developed a web application, BE-SCAN (BE sensor-validated cancer-associated mutations; https://dowlab.shinyapps.io/BEscan/), that allows browsing and selection of guides by species, gene, target mutation, and/or base editor. Further, the expanded AMINEsearch-defined (non-sensor validated) collection of somatic cancer mutations is also available as an interactive portal within BE-SCAN. To enable the creation of sensor-based BE libraries beyond those described in this study, the full AMINEsearch pipeline is available (https://github.com/Kastenhuber/AMINEsearch) and can be run on any set of mutations and base editor characteristics. Beyond cancer-associated somatic mutations, we envision this approach could be employed to functionally annotate GWAS variants and mutations associated with heritable genetic disease. While we performed proliferation screens in immortalized cells, screening of genetic variants could just as easily be conducted using any sortable cellular feature or biosensor. We expect that the compendium of experimentally vetted BE tools described here will accelerate the development of next generation allele-focused in vitro and in vivo cancer models.

MATERIALS AND METHODS

DATA AVAILABILITY

All source data (including p-values) are available in Supplementary Table 10. Processed screening data is available in Supplementary Tables 1, 4 & 5 and primary sequencing data is available at the Sequence Read Archive (SRA) under accession PRJNA746395.

CODE AVAILABILITY

Code for analysis and data visualization is available at: https://github.com/schmidt73/base-editing-analysis, https://github.com/Kastenhuber/AMINEsearch, and https://github.com/lukedow/BEsensor

Plasmids and sgRNA cloning

Base editor plasmids

The following lentiviral base editing plasmids were used in this manuscript: FNLS (Addgene, #110841), AncBE4max (this manuscript), FNLS-2X (F2X) (Addgene, #110840), FNLS-HF1 (HF1) (Addgene, #110866), FNLS-HiFi (HiFi) (Addgene, #136902), FNLS-NG (NG) (Addgene, #136900), FNLS-HiFi-NG (HiFi-NG) (Addgene, #136903), FNLS-VQR (VQR) (this manuscript), and xFNLS (Addgene, #110872). All new plasmids and libraries will be available from Addgene.

CRISPR nuclease plasmids

The following CRISPR nuclease plasmids were used in this manuscript: lentiCas9-Blast (Addgene, #52962), Cas9-NG (Addgene, #117919), and Cas9-Puro (Addgene, #110837).

Single guide RNA plasmids

The following sgRNA plasmids were used in this manuscript: LRT2B (Addgene, #110854) (ref. 1), pUSEPR (U6-sgRNA-EFS-Puro-P2A-TurboRFP) (ref. 2), and pUSEBR (pUSE-Blast-P2A-TurboRFP) (this manuscript). We cloned Esp3I/BsmBI-compatible annealed and phosphorylated oligos encoding sgRNAs into Esp3I/BsmBI-linearized pLRT2B, pUSEPR, or pUSEBR using high-concentration T4 DNA ligase (NEB). A 5’ G (to boost U6 transcriptional initiation) was added to sgRNAs that lacked it, either by appending it to the 5’ end or by substituting the first nucleotide at the 5’ position with a G. All sgRNA sequences used are listed in Supplementary Table 2.
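The 5′ G rule and the annealed-oligo cloning step can be sketched as follows. This is an illustration under stated assumptions: the text does not specify when a G is appended versus substituted, so the choice is left to a hypothetical `mode` argument, and the sticky-end overhangs must match the specific Esp3I/BsmBI-digested vector. The `CACC`/`AAAC` overhangs in the example are a common convention for U6-sgRNA vectors but are an assumption here, not a documented property of pLRT2B, pUSEPR, or pUSEBR.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_5prime_g(spacer, mode="append"):
    """Ensure the spacer starts with G for U6 transcription initiation.

    mode='append' adds a G (spacer grows by 1 nt);
    mode='substitute' replaces the first base (length unchanged).
    """
    if spacer.startswith("G"):
        return spacer
    if mode == "append":
        return "G" + spacer
    if mode == "substitute":
        return "G" + spacer[1:]
    raise ValueError("mode must be 'append' or 'substitute'")


def annealing_oligos(spacer, top_overhang, bottom_overhang):
    """Top and bottom oligos (5'->3') that anneal to leave sticky ends.

    The overhangs are vector-specific and must be supplied by the user.
    """
    top = top_overhang + spacer
    bottom = bottom_overhang + reverse_complement(spacer)
    return top, bottom
```

For example, `annealing_oligos(add_5prime_g("ACGT..."), "CACC", "AAAC")` would yield a top oligo starting `CACCG` and a bottom oligo starting `AAAC` whose 3′ end is the complement of the added G.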

Other plasmids

The GO (C>G) reporter was cloned by modifying the GO reporter system described in ref. 3. Briefly, a custom GFP(ATC) gBlock cassette was inserted into EcoRI- and BsrGI-digested mUGISGO using a standard InFusion assembly protocol. To insert the GO3 sgRNA (C>G-targeting guide), mU6-GO3-scaffold was amplified. Both inserts were digested with XhoI and NsiI and ligated using T4 DNA ligase.

Cell culture

HEK293T (ATCC CRL-3216), A549 (ATCC CCL-185), MDA-MB-231 (ATCC HTB-26), and KPT1 cells were cultured in DMEM supplemented with 10% fetal bovine serum (FBS) and 100 IU/mL penicillin/streptomycin. KP cells (ref. 4) were a kind gift from Dr. Tyler Jacks (MIT). PC9 cells were a kind gift from Dr. Harold Varmus (Weill Cornell) and were cultured in RPMI supplemented with 10% FBS and 100 IU/mL penicillin/streptomycin. NIH3T3 cells (ATCC CRL-1658) were cultured in DMEM supplemented with 10% fetal calf serum (FCS) and 100 IU/mL penicillin/streptomycin. Pancreatic ductal epithelial cells (PDECs; ref. 5) were a kind gift from Dr. Dafna Bar-Sagi (New York University) and were cultured on collagen-coated plates (100 μg/mL PureCol 5005, Advanced Biomatrix) in Advanced DMEM/F12 supplemented with 10% FBS (Gibco), 100 IU/mL penicillin/streptomycin (Gibco), 100 mM Glutamax (Gibco), ITS Supplement (Sigma), 0.1 mg/mL soy trypsin inhibitor (Gibco), Bovine Pituitary Extract (Gibco), 5 nM T3 (Sigma), 100 μg/mL Cholera toxin (Sigma), 4 μg/mL Dexamethasone (Sigma), and 10 ng/mL human EGF (PeproTech).

Virus production

Lentiviruses were produced by co-transfecting HEK293T cells with the relevant lentiviral transfer vector and the packaging vectors psPAX2 (Addgene, #12260) and pMD2.G (Addgene, #12259) using Lipofectamine 2000 (Invitrogen). Viral supernatants were collected at 48 and 72 hours post-transfection and stored at −80°C.

Drug treatments

Nutlin-3 (Selleck Chemicals, S1061) was dissolved in DMSO at a stock concentration of 10 mM and used at a final concentration of 10 μM.

Flow cytometric analyses

GO reporter validation experiments were analyzed on either a Thermo Fisher Attune NxT flow cytometer (2018) or a Guava easyCyte (Millipore). Fluorescence-activated cell sorting was performed on either a BD FACSAria II or a Sony MA900 cell sorter.

Protein analysis

Screen pellets from MDA-MB-231, PC9, and NIH3T3 cells were resuspended in 500 μL RIPA buffer and centrifuged at 13,000 rpm at 4°C to collect protein lysates. Antibodies used for western blot analyses were Cas9 (Cell Signaling, #19526S) and Actin (Abcam, #ab49900).

Animal work

Animals

All mouse experiments were approved by the Memorial Sloan-Kettering Cancer Center (MSKCC) Institutional Animal Care and Use Committee under MSKCC IACUC protocol 11–06-018. Mice were maintained under specific pathogen-free conditions, and food and water were provided ad libitum. Foxn1nu (Swiss nude) mice were purchased from Envigo. All mice used were 6- to 8-week-old females.

Pancreatic orthotopic transplants

For transplantation of PDEC cells into the pancreas of adult mice, animals were anesthetized and a survival surgery was performed to expose the pancreas. Independent of genotype, a total of 1×10⁵ PDEC cells resuspended in 25 μL of growth factor reduced Matrigel (354230; Corning) diluted 1:1 with cold OptiMEM (Gibco) were injected into the tail region of the pancreas of each mouse. Mice were monitored for tumor development over time by abdominal palpation and were euthanized upon developing overt disease and becoming moribund, following the disease monitoring guidelines of the IACUC and the MSKCC Animal Facility.

Genomic DNA isolation

Isolation of gDNA from cells

Genomic DNA was extracted from cells using the DNeasy Blood and Tissue Kit (Qiagen) following manufacturer’s instructions. Cell pellets were processed in parallel and resulting gDNA was resuspended in 100–200 μL of 10 mM Tris-Cl; 0.5 mM EDTA; pH 9.0. Samples from corresponding replicates from MBES and HBES screens were pooled at the gDNA level, measured using a NanoDrop 2000 (ThermoFisher), and normalized before performing sequencing deconvolution.

Isolation of gDNA from tumor tissues

Genomic DNA was extracted from tissues using the DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer’s instructions. Multiple tumor fragments or nodules were microdissected and either processed immediately (by finely mincing the tissue and incubating it overnight in a lysis buffer containing proteinase K, following the manufacturer’s protocol) or snap-frozen in liquid nitrogen and stored at −80°C until processing. The resulting gDNA was resuspended in 100–200 μL of 10 mM Tris-Cl; 0.5 mM EDTA; pH 9.0, measured using a NanoDrop 2000 (ThermoFisher), and normalized before assessing genome editing at the relevant locus of interest by deep sequencing.

AMINEsearch bioinformatic pipeline

We developed a genome editing design tool, AMINEsearch (Annotated Mutation-Informed Nucleotide Editing sgRNA search), implemented in R, to comprehensively build libraries of annotated gene editing reagents for modeling a user-defined set of mutations. This algorithm can be applied to any sequencing dataset that uses standard Mutation Annotation Format (MAF) files. For further description of the algorithm and analysis, including library design, off-target analysis, and conservation of variant protein sequence between human and mouse, please see Supplementary Note 1.

HBES and MBES library design

MSK-IMPACT sequencing data (n=21,694 tumors) were used to design sgRNAs and sensors compatible with commonly used BE configurations, incorporating Cas variants (SpCas9, Cas9-NG, xCas9, and ScCas9) combined within the FNLS or F2X (expanded window) BE vector variants (Supplementary Table 2a,b). IMPACT-derived outputs of AMINEsearch (Supplementary Table 2c,e) were used to compile unique sensor constructs for the HBES (Supplementary Table 2d) and MBES (Supplementary Table 2f) libraries, which target the human and mouse genome, respectively. These libraries served as the basis for experimental validation and screening of base editing sensors, which are available under the “Sensor validated” tab of the BE-SCAN web portal (https://dowlab.shinyapps.io/BEscan/).

AMINEsearch v2

We modified the algorithm to increase the functionality of AMINEsearch. Specifically, we accommodated BE variants that edit outside the region complementary to the sgRNA (CDA-BE4). As the demands of running larger datasets grew, we added the capacity for parallel execution on multiple cores or processors. We also added an option to revert, rather than model, mutations given a list of pathogenic mutations as input. A known issue was fixed so that multiple genotypes that converge on the same protein sequence substitution are handled as independent mutations. Version 2 also tracks the expected variant protein sequence when modeling human mutations in the mouse genome (see Supplementary Note 1). Versions 1 and 2 of the algorithm can be accessed at https://github.com/Kastenhuber/AMINEsearch/tree/AMINEsearch_v1.0 and https://github.com/Kastenhuber/AMINEsearch/tree/AMINEsearch_v2.0, respectively, and are generalizable to new mutation datasets and/or new base editor configurations.

Exploratory set of BE sensor predictions

We applied the algorithm to the recently updated mutation dataset from the MSK-IMPACT platform (ref. 6), containing 341,736 somatic cancer SNVs derived from targeted sequencing of 47,550 tumor samples (Supplementary Fig. 17). This targeted sequencing panel captures the coding regions of up to 580 genes. Candidates for base editing included 5,542 unique SNVs, classified as missense, nonsense, splice site, or nonstop mutations, that were observed six or more times (>0.01% frequency). We considered all combinations of 13 Cas9 orthologs and 11 deaminases, yielding 143 possible base editor configurations (Supplementary Table 8). These include configurations that have been extensively characterized as well as combinations of Cas9 orthologs and deaminases that have not yet been assembled and used experimentally. Collectively, this exploratory set of sgRNA predictions provides a broad set of options to generate mutations in human and mouse (Supplementary Table 8). A searchable, filterable interface for the exploratory set of predicted sgRNAs is available under the “Sensor Design - Human” and “Sensor Design - Mouse” tabs of the BE-SCAN Shiny web portal, alongside sensor-validated sgRNAs (https://dowlab.shinyapps.io/BEscan/).
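The size of the configuration space and the recurrence filter reduce to simple arithmetic: 13 Cas9 orthologs × 11 deaminases = 143 configurations, and six observations out of 47,550 tumors is about 0.0126%, just above the 0.01% threshold (assuming each observation corresponds to one tumor). The sketch below uses hypothetical placeholder names rather than the actual ortholog and deaminase lists, which are given in Supplementary Table 8.

```python
from itertools import product


def base_editor_configs(cas_variants, deaminases):
    """All (Cas variant, deaminase) pairings, whether or not built experimentally."""
    return list(product(cas_variants, deaminases))


def recurrent_snvs(snv_counts, min_count=6):
    """Keep SNVs observed at least min_count times."""
    return {snv: n for snv, n in snv_counts.items() if n >= min_count}


def percent_of_tumors(count, n_tumors):
    """Observation frequency as a percentage (assumes one observation per tumor)."""
    return 100 * count / n_tumors


# Hypothetical placeholder names, not the real ortholog/deaminase lists
cas = [f"cas_{i}" for i in range(13)]
deam = [f"deaminase_{j}" for j in range(11)]
```

Enumerating the full Cartesian product, rather than only published editors, is what allows the exploratory set to include configurations that have not yet been assembled.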

Design and construction of mouse and human base editing sensor libraries

Base editing sensor module design

Each sensor module is composed of the following parts: 1) a 22nt 5’ adapter/priming site containing an Esp3I restriction site; 2) a 20nt 5’ G-containing sgRNA; 3) a 93nt improved SpCas9 sgRNA scaffold, partially based on ref. 7; 4) an 11nt sequence corresponding to the 5’ flanking sequence of the endogenous target site; 5) the 23nt cognate target site; 6) a 7nt sequence corresponding to the 3’ flanking sequence of the endogenous target site; and 7) a 28nt 3’ adapter/priming site containing an EcoRI restriction site. Thus, oligos encoding individual sensor modules are 204nt long.
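Because the module is a fixed concatenation of seven parts, its design can be checked programmatically. The sketch below, with hypothetical function and key names, assembles a module in the stated 5′→3′ order and verifies each part length; the lengths sum to 22 + 20 + 93 + 11 + 23 + 7 + 28 = 204 nt. Placeholder `N` bases stand in for the real adapter, scaffold, and target sequences, which are not reproduced here.

```python
# Part lengths (nt) in 5'->3' order, as listed in the module description
SENSOR_PARTS = {
    "adapter_5": 22,  # 5' adapter/priming site with Esp3I site
    "spacer": 20,     # 5' G-containing sgRNA spacer
    "scaffold": 93,   # improved SpCas9 sgRNA scaffold
    "flank_5": 11,    # endogenous 5' flanking sequence
    "target": 23,     # cognate target site
    "flank_3": 7,     # endogenous 3' flanking sequence
    "adapter_3": 28,  # 3' adapter/priming site with EcoRI site
}


def assemble_sensor(parts):
    """Concatenate sensor parts in the fixed order, checking each length."""
    for name, expected in SENSOR_PARTS.items():
        actual = len(parts[name])
        if actual != expected:
            raise ValueError(f"{name}: expected {expected} nt, got {actual}")
    # dicts preserve insertion order, so this joins parts 5'->3'
    return "".join(parts[name] for name in SENSOR_PARTS)
```

A length check of this kind would also catch the truncated or chimeric oligos described in the cloning section before a pool is ordered.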

Cloning of mouse and human base editing sensor libraries

Due to longer-than-average oligo length, early attempts at design and construction of sensor libraries showed unacceptable synthesis and assembly error rates where, in some instances, over half of the sensors before or after assembly into the backbone were found to harbor insertions, deletions, single nucleotide mutations, and incorrect chimeric sgRNA-target site molecules (data not shown). Through extensive trial and error, we found that assembling sensor libraries using Agilent’s High Fidelity oligo synthesis platform significantly mitigated these issues.

All PAM sensor (APS; 1,152 whitelisted oligos), MBES (4,686 oligos), and HBES (5,855 oligos) libraries were cloned into the pLRT2B backbone (ref. 1) as follows (all library oligos are listed in Supplementary Table 2). Briefly, each oligo pool was amplified using forward and reverse primers that append Esp3I and EcoRI sites to the 5’ and 3’ ends of the sensor insert, purified using the QIAquick PCR Purification Kit (Qiagen), and ligated into Esp3I-digested and dephosphorylated pLRT2B vector using high-concentration T4 DNA ligase (NEB) (all cloning and sequencing oligos are listed in Supplementary Table 9). To maximize library recovery, we set up 24 parallel PCR reactions per pool. A minimum of 2.4 μg of ligated pLRT2B plasmid DNA per pool (corresponding to 8 ligations) was electroporated into Endura electrocompetent cells (Lucigen), recovered for 1 hour at 37°C, plated across four 15 cm LB-Carbenicillin plates (Teknova), and incubated at 37°C for 16 hours. The total number of bacterial colonies per pool was quantified using serial dilution plates to ensure a library representation of >10,000X. The next morning, bacterial colonies were scraped and briefly expanded for 4 hours at 37°C in 500 mL of LB-Carbenicillin. Plasmid DNA was isolated using the Plasmid Plus Maxi Kit (Qiagen). To assess the sensor distribution and fidelity of assembly for each pool, we amplified the sensor region using primers that append Illumina sequencing adapters to the 5’ and 3’ ends of the amplicon, as well as a random nucleotide stagger and a unique demultiplexing barcode at the 5’ end (Supplementary Table 9). Library amplicons were size-selected on a 2.5% agarose gel, purified using the QIAquick Gel Extraction Kit (Qiagen), and sequenced on an Illumina MiSeq instrument.
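The >10,000X representation target translates into a minimum colony count of about 10,000 times the number of oligos (for example, 5,855 × 10,000 ≈ 58.6 million colonies for HBES). A sketch of that bookkeeping follows; the serial-dilution formula assumes a known fold dilution and plated fraction, and the helper names and colony counts in the example are hypothetical.

```python
def required_colonies(n_oligos, coverage=10_000):
    """Minimum number of transformants for the desired per-oligo coverage."""
    return n_oligos * coverage


def colonies_from_dilution(colonies_on_plate, dilution_factor, plated_fraction):
    """Estimate total transformants from a counted serial-dilution plate.

    dilution_factor: fold dilution of the recovered culture before plating
    plated_fraction: fraction of that diluted culture spread on the plate
    """
    return colonies_on_plate * dilution_factor / plated_fraction


def coverage(total_colonies, n_oligos):
    """Average number of colonies per library oligo."""
    return total_colonies / n_oligos
```

For example, 300 colonies on a plate that received half of a 10⁵-fold dilution would imply about 6×10⁷ transformants, or roughly 10,200 colonies per HBES oligo.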

Analysis of base editing activity using GO reporter system

Base editor-expressing cells were plated at a density of 5,000 cells/well in 12-well plates and transduced 24 hours later with a defined amount of GO reporter virus to achieve 20–50% transduction efficiency. Virus-containing media was replaced with complete media 24 hours post-transduction, and cells were harvested for flow cytometry at 96 hours post-transduction using an Attune NxT flow cytometer (Thermo Fisher). Cells were trypsinized with 100 μL of 0.25% Trypsin-EDTA and resuspended in 300 μL of complete medium in a 96-well U-bottom plate. Data were acquired at a flow rate of 500 μL/min, and at least 10,000 events from the single-cell gate were recorded.

Screening and deconvolution of mouse and human base editing sensor libraries

Screening of mouse and human base editing sensor libraries

We first screened the APS library in MDA-MB-231 cells expressing one of nine different base editors, or either the Cas9 or Cas9-NG nuclease as cutting controls. APS screens were performed essentially as described in detail below. We then screened a total of five mouse and human base editor-expressing cell lines with either the MBES or HBES library using the following approach. Human cell lines (MDA-MB-231 and PC9) were screened with MBES to minimize fitness differences between sensor modules caused by endogenous targeting of genes that suppress cellular proliferation. Following the same rationale, mouse cell lines (KPT1, NIH3T3, and PDECs) were screened with HBES. Each screen (including the APS set) was performed as follows. To ensure that most cells harbor a single sgRNA integration event, we determined the volume of viral supernatant that would achieve an MOI of ~0.3–0.5 upon standard transduction of a population of base editor-expressing cells. All screens were performed in technical duplicate, and each step of the screen, from infection to sequencing, was optimized to achieve a minimum representation of 1000X. For instance, to ensure a representation of >1000X for HBES libraries at the transduction step, we spinfected a total of 24 million cells across two 12-well plates per technical replicate using the volume of viral supernatant that would achieve a 30% infection rate (~7.2 million transduced cells per technical replicate). Twenty-four hours after infection, cells from each corresponding replicate were pooled into a minimum of two 150 mm tissue culture dishes (Corning) and selected with Blasticidin S (Gibco) at an empirically determined final concentration of 5–30 μg/mL, depending on the cell line. Cells were cultured under Blasticidin selection for seven days post-transduction. When needed, cells were trypsinized and re-plated at a minimum of 6 million cells per replicate to ensure a minimum representation of 1000X.
For PDEC screens, cell representation per replicate was maintained at >600X at all points. Subsequently, at least 6 million cells were pelleted and stored at −20°C. Genomic DNA (gDNA) from cells was isolated using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer’s guidelines. Genomic DNA was harvested from all timepoints and both sensor BE activity and sgRNA abundance were assessed via NGS.
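The MOI target of ~0.3–0.5 and the 1000X representation goal can be related using a standard Poisson model of lentiviral transduction, which is an assumption rather than something the protocol states. Under that model, an MOI of 0.3 transduces about 26% of cells, and about 86% of transduced cells carry a single integration. The representation arithmetic uses the HBES numbers given above: 24 million cells × 30% ≈ 7.2 million transduced cells, or about 1,230 cells per construct for 5,855 constructs. Function names are hypothetical.

```python
import math


def poisson_transduction(moi):
    """Return (fraction of cells transduced, fraction of transduced cells
    with exactly one integration) under a Poisson infection model."""
    p0 = math.exp(-moi)   # probability of zero integrations
    transduced = 1 - p0   # at least one integration
    single = moi * p0 / transduced  # P(exactly one) / P(at least one)
    return transduced, single


def representation(cells_infected, infection_rate, n_constructs):
    """Average number of transduced cells per library construct."""
    return cells_infected * infection_rate / n_constructs
```

Lowering the MOI increases the single-integration fraction but requires infecting more cells to keep representation above 1000X, which is the trade-off the 24-million-cell spinfection is sized for.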

Deconvolution of mouse and human base editing sensor screens

We assumed that each cell contains approximately 6.6 pg of gDNA. Therefore, screen deconvolution at 1000X required sampling ~6 million × 6.6 pg of gDNA, or ~39.6 μg. We employed a modified two-step PCR version of the protocol published by Doench et al. (ref. 8), adapted to our library design. Briefly, we performed an initial PCR to amplify the integrated sensor cassettes from gDNA, followed by a second PCR to append Illumina sequencing adapters to the 5’ and 3’ ends of the amplicon, as well as a random hexamer and a unique demultiplexing barcode at the 5’ end. Each “PCR1” reaction contained either 25 μL of Q5 High-Fidelity 2X Master Mix (NEB), 2.5 μL of Sensor_v6_Fwd Primer (10 μM), 2.5 μL of Sensor_v6_Rev Primer (10 μM), and 5 μg of gDNA in 20 μL of water (total volume 50 μL per reaction), or 10 μL of Herculase II 5X Master Mix (Agilent), 0.5 μL dNTPs, 2.5 μL of Sensor_v6_Fwd Primer (10 μM), 2.5 μL of Sensor_v6_Rev Primer (10 μM), 1 μL of Herculase II polymerase, and 5 μg of gDNA in 33.5 μL of water (total volume 50 μL per reaction). The number of PCR1 reactions was scaled to the required gDNA input; we therefore performed eight PCR1 reactions (8 × 5 μg = 40 μg of gDNA) per technical replicate and time point for all screens. PCR1 amplicons were purified using the QIAquick PCR Purification Kit (Qiagen) and used as template for “PCR2” reactions. Each PCR2 reaction contained 25 μL of NEBNext 2X Master Mix (NEB), 2.5 μL of a uniquely barcoded PCR2_Fwd Primer (10 μM), 2.5 μL of a common PCR2_Rev Primer (10 μM), and 300 ng of PCR1 product in 20 μL of water (total volume 50 μL per reaction). We performed two PCR2 reactions per PCR1 product. Library amplicons were size-selected either on a 2.5% agarose gel followed by purification with the QIAquick Gel Extraction Kit (Qiagen) or using AMPure XP beads (Beckman Coulter), and were then normalized, pooled, and sequenced on an Illumina NextSeq 500 instrument (150 nt paired-end reads).
All primer sequences are available in Supplementary Table 6. PCR program for PCR1 using Q5 High-Fidelity 2X Master Mix (NEB) was: 1) 98°C × 30s; 2) 98°C × 10s; 3) 55°C × 30s; 4) 72°C × 30s; 5) Go to step 2 × 24 cycles; 6) 72°C × 2 min; 7) 4°C forever. When using Herculase II, denaturation steps were done at 95°C and the initial denaturation lasted for 2 minutes. PCR program for PCR2 using NEBNext 2X Master Mix (NEB) was: 1) 98°C × 30s; 2) 98°C × 10s; 3) 65°C × 30s; 4) 72°C × 30s; 5) Go to step 2 × 17 cycles; 6) 72°C × 2 min; 7) 4°C forever.
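The gDNA input and PCR1 scaling follow directly from the 6.6 pg-per-cell assumption stated above: sampling ~6 million cells requires ~39.6 μg of gDNA, and at 5 μg per PCR1 reaction this rounds up to eight reactions. A minimal sketch with hypothetical helper names:

```python
import math

PG_GDNA_PER_CELL = 6.6  # per-cell gDNA assumption stated in the protocol


def gdna_ug_for_cells(cells, pg_per_cell=PG_GDNA_PER_CELL):
    """Micrograms of gDNA corresponding to a given number of cells."""
    return cells * pg_per_cell / 1e6  # pg -> ug


def pcr1_reactions(gdna_ug, ug_per_reaction=5.0):
    """Number of PCR1 reactions needed to carry all of the gDNA."""
    return math.ceil(gdna_ug / ug_per_reaction)
```

Rounding up rather than down keeps the sampled gDNA (8 × 5 μg = 40 μg) at or above the amount needed for 1000X coverage.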

Analysis of mouse and human base editing sensor screening data

To quantify base editing outcomes, raw paired-end FASTQ reads were merged using PANDAseq, and the merged FASTQ files were used as input for downstream analysis. We first removed reads with mutated sgRNAs or scaffolds, as well as reads with non-matching sgRNA and target sequences (arising from template switching during PCR amplification). Next, the 5’ scaffold and linker were used to associate each read with its sgRNA; reads whose sgRNAs did not match the whitelist were discarded. All remaining reads were aligned to their cognate target in the whitelist, and aligned reads without indels were considered for base editing analysis. Sensors that deviated from the expected length were flagged as indels, and their frequency was calculated as a specific insertion or deletion. Editing events were classified over all cytosines within a −5 to 20 position window of the target, where position 1 is the first position of the protospacer. Target cytosine editing (tCTN, tCGN, and tCAN for C>T, C>G, and C>A, respectively) quantified the frequency of editing at the target cytosine regardless of editing at other adjacent cytosines. Target cytosine editing without collateral editing (tCT) was measured as specific C>T editing at the target cytosine without associated mutations at adjacent sites. Purity was calculated as the ratio tCT/tCTN. Custom code to perform the BE analysis is available at: https://github.com/schmidt73/base-editing-analysis. All data for APS, HBES, and MBES screens can be found in Supplementary Tables 1, 4, and 5, respectively.
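The target-cytosine metrics can be illustrated with a simplified per-read tally. This sketch is not the published analysis code (available in the base-editing-analysis repository); it assumes reads have already been matched to their sgRNA, aligned to the cognate target, and trimmed to the analysis window, treats any length difference as an indel excluded from the denominator, and counts any other mismatch in the window as collateral editing. Function names are hypothetical.

```python
from collections import Counter


def classify_reads(ref, reads, target_idx):
    """Tally target-cytosine outcomes for reads aligned to ref.

    target_idx is the 0-based index of the target C within ref.
    """
    if ref[target_idx] != "C":
        raise ValueError("reference base at target_idx must be C")
    counts = Counter()
    for read in reads:
        if len(read) != len(ref):
            counts["indel"] += 1  # length change: tallied separately
            continue
        counts["total"] += 1
        base = read[target_idx]
        if base == "T":
            counts["tCTN"] += 1  # C>T at target, any other edits allowed
            collateral = any(
                r != q for i, (r, q) in enumerate(zip(ref, read)) if i != target_idx
            )
            if not collateral:
                counts["tCT"] += 1  # clean C>T with no other change
        elif base == "G":
            counts["tCGN"] += 1
        elif base == "A":
            counts["tCAN"] += 1
    return counts


def summarize(counts):
    """Frequencies over indel-free reads, plus purity = tCT / tCTN."""
    total = counts["total"]
    freqs = {k: counts[k] / total for k in ("tCTN", "tCT", "tCGN", "tCAN")}
    purity = counts["tCT"] / counts["tCTN"] if counts["tCTN"] else float("nan")
    return freqs, purity
```

With this definition, a read carrying both the target C>T and a bystander C>T contributes to tCTN but not to tCT, which is why purity falls below 1 when bystander editing is frequent.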

Comparison of base editing sensor screening outcomes with BE-Hive

To evaluate the performance of the BE-Hive base editing outcome prediction model against our data, we downloaded the fully trained network from the GitHub repository linked by Arbab et al. (ref. 9). This model has two parts: one that predicts the probability of any edit occurring and another that predicts the probability of a specific base editing outcome, conditioned on editing having occurred. These are referred to as the editing efficiency and bystander models, respectively. Following the repository instructions, we fed our input spacer and its 50-mer context to both models and computed the probability of each observed outcome using the chain rule. One caveat is that the editing efficiency model directly predicts an untransformed score, not a probability. Following the recommendations in the README file, we first linearly rescaled the score using the mean and standard deviation of reads, to account for variance in base editor expression across experimental conditions and cell types, and then applied a sigmoid transformation to map it into the unit interval. Because both operations are monotonic, they should not affect the Spearman correlations used to analyze performance.
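Because the rescaling and sigmoid steps are both monotonic (the rescaling is increasing when the standard deviation is positive), they leave rank order, and hence Spearman correlation, unchanged. The sketch below demonstrates this in pure Python with made-up scores; `mean` and `sd` are arbitrary stand-ins for the rescaling parameters, and the helper names are hypothetical. In floating point, extremely large scores can saturate the sigmoid to exactly 1.0 and create artificial ties, so the invariance holds only for scores in a non-saturating range.

```python
import math


def rescale_sigmoid(scores, mean, sd):
    """Linear z-scaling followed by a logistic sigmoid (monotonic if sd > 0)."""
    return [1.0 / (1.0 + math.exp(-(s - mean) / sd)) for s in scores]


def average_ranks(values):
    """1-based ranks, averaging over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In practice `scipy.stats.spearmanr` computes the same statistic; it is written out here only to keep the example self-contained.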

BE-Hive does not consider the PAM sequence as a feature in its prediction model. Given that the PAM is an important determinant of the activity of standard base editors, we considered two different sets of sgRNAs in our comparisons: first, all sgRNAs used in our screen; second, only the sgRNAs with canonical NGG PAMs.
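Splitting the sgRNAs into the two comparison sets is a simple PAM filter. The sketch below assumes each sgRNA record carries its 3-nt PAM as a string; the record layout and field names are hypothetical.

```python
import re

# Canonical SpCas9 PAM: any nucleotide followed by two guanines.
NGG_PAM = re.compile(r"[ACGT]GG")


def has_ngg_pam(pam):
    """True if a 3-nt PAM string is canonical NGG (case-insensitive)."""
    return NGG_PAM.fullmatch(pam.upper()) is not None


# Hypothetical sgRNA records; field names are illustrative only.
sgrnas = [
    {"id": "sg1", "pam": "TGG"},
    {"id": "sg2", "pam": "AGA"},
    {"id": "sg3", "pam": "cgg"},
]
all_set = sgrnas                                         # comparison 1
ngg_set = [g for g in sgrnas if has_ngg_pam(g["pam"])]  # comparison 2
```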

Phenotypic screens using base editing sensor libraries

Stable base editor-expressing KrasG12D/+; Trp53WT/WT pancreatic ductal epithelial cells (PDECs; ref. 5) were generated by lentiviral transduction with FNLS (Addgene #110841) and validated using GO3 (Addgene #136896). Phenotypic MBES screens in FNLS-PDECs were set up essentially as described above for the HBES/MBES sensor screens, with a few modifications. MBES FNLS-PDEC screens were performed as six independent transduction replicates in parallel. Each replicate was maintained at a minimum of 500X representation at every step of the screen by replating 3 million cells per time point and pelleting the remaining cells for gDNA isolation and screen deconvolution. Screens ran for approximately 36 cumulative population doublings over 34 days, after which gDNA was isolated and screens were deconvolved essentially as described above for the MBES/HBES screens.
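The representation and population-doubling bookkeeping above can be sketched as simple arithmetic. The function names are hypothetical, the per-passage doublings are assumed to be log2(cells harvested / cells plated), and the passage cell counts in the example are invented for illustration; only the 3 million cells and 500X figures come from the text.

```python
import math


def population_doublings(cells_plated, cells_harvested):
    """Doublings in one passage: log2(harvested / plated)."""
    return math.log2(cells_harvested / cells_plated)


def cumulative_doublings(passages):
    """Sum doublings over (plated, harvested) pairs for one replicate."""
    return sum(population_doublings(p, h) for p, h in passages)


def max_library_size(cells_replated, coverage):
    """Largest library (in sgRNAs) that a replating keeps at `coverage`."""
    return cells_replated // coverage


# 3 million cells replated at 500X covers at most 6,000 sgRNAs.
print(max_library_size(3_000_000, 500))  # 6000
# Hypothetical cell counts for two passages of one replicate.
print(cumulative_doublings([(3e6, 12e6), (3e6, 24e6)]))  # 5.0
```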

Proliferation screen analysis

Paired-end reads were merged using PANDAseq, and merged reads were processed as described for the BE analysis above. Total read counts for each replicate were used as input for MAGeCK (refs. 10,11) analysis. sgRNAs with fewer than 100 reads were removed from the analysis. For each replicate, T0 (day 5 post-transduction) was compared with T1 (day 14) and T2 (day 30 post-transduction) using MAGeCK to determine log fold changes. Base editing outcomes at the sensor target site were measured concurrently using the sensor screen pipeline described above.
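The read-count filter and log fold change step can be approximated as below. This is explicitly not MAGeCK, which uses median normalization and a negative binomial model; it is a simplified stand-in that assumes the 100-read filter is applied to T0 counts, scales each sample to read fractions, and adds a pseudocount. The function name and parameters are hypothetical.

```python
import math


def log_fold_changes(t0_counts, t_counts, min_reads=100, pseudocount=1.0):
    """Simplified per-sgRNA log2 fold change between T0 and a later time point.

    Not MAGeCK: sgRNAs below `min_reads` in T0 (assumed filtering sample)
    are dropped, each sample is scaled to read fractions over the kept
    sgRNAs, and a pseudocount guards against zero counts.
    """
    kept = [g for g, c in t0_counts.items() if c >= min_reads]
    total0 = sum(t0_counts[g] + pseudocount for g in kept)
    total1 = sum(t_counts.get(g, 0) + pseudocount for g in kept)
    lfc = {}
    for g in kept:
        frac0 = (t0_counts[g] + pseudocount) / total0
        frac1 = (t_counts.get(g, 0) + pseudocount) / total1
        lfc[g] = math.log2(frac1 / frac0)
    return lfc
```

For example, with hypothetical counts `{"a": 1000, "b": 1000, "c": 50}` at T0 and `{"a": 3000, "b": 1000, "c": 10}` at T2, sgRNA "c" is filtered out, "a" is enriched and "b" is depleted relative to the kept pool.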

Validation experiments

For validation of individual targets, sgRNAs were cloned into the lentiviral guide expression vector LRT2B (Addgene #110854), and lentiviral particles were produced as described above. Base editor-expressing cells were plated at a density of 25,000 cells/well in 12-well plates and infected 24 hours later with enough virus to achieve 50% transduction efficiency. Virus-containing medium was replaced with complete medium 24 hours post-transduction, and cells were replated into selection medium containing 3 μg/mL Blasticidin S (Gibco). Experimental cells remained in selection medium until the final collection time point at 7 days post-transduction. Final LRT2B infection efficiency was determined by measuring tdTomato levels by flow cytometry in 10% of the cells remaining at day 7. Genomic DNA was isolated using the protocol found at dowlab.org/protocols, and targets were amplified in 100 μl reactions following the standard NEB Taq 2X Master Mix protocol with the primers listed in Supplementary Table 9. Each PCR was performed in triplicate per target, and the replicates were pooled. Amplicons were confirmed on a 2% agarose gel and purified using the QIAGEN QIAquick PCR purification kit. DNA concentration was measured using a NanoDrop, and samples were normalized to 20 ng/μl and sequenced by EZ-amplicon sequencing (MiSeq; 2 × 250 bp) at GENEWIZ, Inc. (South Plainfield, NJ, USA).
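Targeting 50% transduction efficiency can be related to viral dose if one assumes, as is common for lentiviral titration, that integrations per cell follow a Poisson distribution. Under that assumption (not stated in the protocol), the sketch below computes the multiplicity of infection (MOI) that yields 50% transduced cells and the share of transduced cells expected to carry more than one integration. The function names are hypothetical.

```python
import math


def moi_for_fraction(fraction_transduced):
    """MOI giving the requested fraction of transduced cells.

    Assumes integrations per cell follow a Poisson distribution, so
    P(at least one integration) = 1 - exp(-MOI).
    """
    if not 0.0 < fraction_transduced < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -math.log(1.0 - fraction_transduced)


def multiply_transduced_share(moi):
    """Share of transduced cells carrying two or more integrations."""
    p0 = math.exp(-moi)  # probability of no integration
    p1 = moi * p0        # probability of exactly one integration
    return (1.0 - p0 - p1) / (1.0 - p0)


moi = moi_for_fraction(0.5)
print(round(moi, 3))                             # 0.693
print(round(multiply_transduced_share(moi), 3))  # 0.307
```

Under this model, 50% transduction corresponds to an MOI of ln 2, and roughly 31% of transduced cells would carry two or more integrations.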

Analysis of deep sequencing data from validation experiments

CRISPResso2 (ref. 12) was used to process sequencing reads from the validation experiments and the corresponding sensor sequencing results for each individual target. Data were analyzed in the default CRISPResso2 base editor mode, with the exception of the following parameters: for endogenous locus results, --quantification_window_center -15; for sensor results, --quantification_window_size 10 --base_editor_output --quantification_window_center -15 --exclude_bp_from_right 1 --plot_window_size 18. To calculate target C>T editing and non-canonical editing, we used the read counts for each specific allele reported in the “Alleles_frequency_table_around_sgRNA.txt” file.
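The allele-level counting described above can be sketched in Python. This is a hypothetical helper (`target_base_fractions`), not the authors' script; it assumes the CRISPResso2 allele table is tab-separated with `Aligned_Sequence`, `Reference_Sequence` and `#Reads` columns (verify against the CRISPResso2 version in use) and that the caller knows the 0-based index of the target cytosine within the displayed window.

```python
import csv
import io


def target_base_fractions(table_text, target_idx):
    """Fraction of reads carrying each base at the target position.

    `table_text` is the tab-separated content of a CRISPResso2
    Alleles_frequency_table_around_sgRNA file. The column names used
    here are assumptions about that file's layout.
    """
    counts = {"C": 0, "T": 0, "G": 0, "A": 0, "other": 0}
    total = 0
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        n = int(row["#Reads"])
        total += n
        aligned, ref = row["Aligned_Sequence"], row["Reference_Sequence"]
        # Alleles whose alignment no longer places a C at target_idx
        # (e.g. insertions upstream) are lumped into "other".
        if ref[target_idx] != "C":
            counts["other"] += n
            continue
        base = aligned[target_idx]  # "-" (deletion) falls into "other"
        counts[base if base in counts else "other"] += n
    if total == 0:
        raise ValueError("allele table contains no reads")
    return {b: c / total for b, c in counts.items()}
```

Target C>T editing is then the `"T"` fraction, and non-canonical editing at the target is the sum of the `"G"` and `"A"` fractions.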

Statistical analyses and Data Visualization

Analysis and data visualization in R

Heatmaps, dot plots, and correlation analyses (including correlation graphs) were performed in R version 3.6.3, and plots were produced using the ggplot2 and ggpubr packages. Statistical considerations are reported in each figure legend.

Analysis and data visualization in GraphPad PRISM

Additional bar plots, survival curves, and associated statistical analyses were generated using Prism 8 (GraphPad) and are indicated in figure legends. Error bars represent standard deviation, unless otherwise noted. We used Student’s t-test (unpaired, two-tailed) to assess significance between experimental and control groups, and to calculate P values. P<0.05 was considered statistically significant. Schematics were created using BioRender.com.

Source data availability

All source data (including p-values) are available in Supplementary Table 10. Processed screening data are available in Supplementary Table 7, and primary data have been deposited in the Sequence Read Archive (SRA) under accession PRJNA746395.

Extended Data

Extended Data Fig. 1. BE efficiency for mouse sgRNAs in the APS library.

C>T editing efficiency (%) at each APS library mouse target site across base editor enzymes, as indicated. Cas9 and Cas9-NG serve as nuclease controls. Rows denote sgRNAs; columns denote PAM subclass.

Extended Data Fig. 2. Cancer somatic mutation-derived base editing sensor libraries.

(a) Number of unique recurrent SNVs per gene, with genes ordered by mutation frequency. Bars are split to indicate the proportion of SNVs targeted (red) or not targeted (black) in the HBES library. (b) Focality of mutations by cancer gene classification: cumulative number of mutations observed at recurrent sites relative to the number of unique SNVs observed per gene. Oncogenes are indicated by red dots and tumor suppressor genes by blue dots. Mutations in oncogenes tend to be more focal, concentrated at distinct hotspot sites, with a greater number of recurrent mutations per unique SNV allele (11.1 vs 6.2 mutations per unique recurrent SNV, p=0.011, two-tailed t-test). (c) Venn diagram of sgRNAs in the HBES library compatible with each base editor configuration. (d) Venn diagram of sgRNAs in the MBES library compatible with each base editor configuration. (e) SNV-level annotation, with each color bar sorted by observed mutation frequency (top). SNV characteristics are indicated, including oncogenic function (OncoKB assertion of oncogenic/likely oncogenic/VUS) and therapeutic implications (OncoKB highest level of evidence for drug sensitivity or resistance) (refs. 17,18).

Extended Data Fig. 3. Off-target editing predictions for base editing sensor libraries.

(a) For sgRNAs in the HBES library, distribution of potential off-target (OT) sites identified by PAM specificity and extent of mismatch. (b) Number of sgRNAs in the HBES library targeting the human genome with 0 (white) or 1 or more (black) predicted OT sites, depending on SpCas9 or Cas9-NG PAM specificity. More sgRNAs have no predicted OT sites when used with SpCas9 than with Cas9-NG (p<2.2e-16, two-sided Fisher’s exact test). (c) For sgRNAs in the MBES library, distribution of potential OT sites identified by PAM specificity and extent of mismatch. (d) Number of sgRNAs in the MBES library targeting the mouse genome with 0 (white) or 1 or more (black) predicted OT sites, depending on SpCas9 or Cas9-NG PAM specificity. More sgRNAs have no predicted OT sites when used with SpCas9 than with Cas9-NG (p<2.2e-16, two-sided Fisher’s exact test). (e) Distribution of non-target editable bases (C for CBE) within the editing window for the HBES library targeting the human genome. (f) Distribution of non-target editable bases (C for CBE) within the editing window for the MBES library targeting the mouse genome.

Extended Data Fig. 4. Comparison of editing range (editing window) across FNLS, F2X, and FNLS-NG base editors as a function of dinucleotide context.

Plots represent the mean normalized BE editing efficiency for each base editor (FNLS = yellow, F2X = blue, FNLS-NG = grey) across 5 cell lines (rows) and 4 dinucleotide contexts (columns). Area shaded in grey denotes maximum editing range in each condition where normalized BE is above 30% (dotted line).

Extended Data Fig. 5. Correlation of sgRNA efficiency ranking.

Plots show the correlation of individual sgRNA efficiency rankings between MDA-MB-231 cells and NIH3T3, KPT1, and PDEC cells, as indicated. To reduce noise from low-efficiency sgRNAs, only HBES sgRNAs with >1% activity in the sensor were included. Pearson correlation coefficients are shown; for all comparisons, p<2.22e-16.

Extended Data Fig. 6. Indel and BE correlation across cell lines.

Correlation of indel and C>T editing frequencies for all sgRNAs in the HBES library across 5 screen cell lines. Pearson correlation coefficients were calculated using the ggpubr package (v0.4.0) in R; p values are from two-sided t-tests.

Extended Data Fig. 7. Non-canonical cytosine editing identified by BE Sensor.

(a) Plots of C>T and C>G editing of sensor constructs in MDA-MB-231 and PC9 cells, as indicated. The dotted line indicates a 1:1 ratio of C>T to C>G editing for a given target. R represents the Spearman correlation coefficient. (b) Ratio of C>G to C>T editing in FNLS-MDA-MB-231 cells transduced with the HBES library, classified by dinucleotide context (fill) and trinucleotide context (column). Data include all base editors (FNLS, F2X, and FNLS-NG) and are filtered for sgRNAs that show more than 5% C>T editing in the sensor assay. Boxplots show the median and interquartile range (IQR), and whiskers represent 1.5 × IQR. Outliers are shown as individual points. ns indicates p>0.05; p values were determined with the two-sided Wilcoxon signed-rank test. A complete list of all comparisons is available in Supplementary Table 10g. (c) Schematic of the C>G reporter developed by modifying the GO (C>T) reporter.

Extended Data Fig. 8. In vivo validation of cancer-associated TP53 missense mutations using BE.

(a) Survival analysis of mice transplanted with F2X-expressing PDECs transduced with specific Trp53-targeting base editing sgRNAs. N=5 mice per sgRNA per mutation. (b) Frequency of target C>T editing in tumors from transplanted mice. Each point represents a single isolated tumor (n≥3 per sgRNA). Target C>T editing was measured by next-generation sequencing of amplified target loci, and data were analyzed using CRISPResso2. Data are presented as mean ± SD. (c) In vivo validation of M237I and C135Y mutations via orthotopic transplantation of FNLS-expressing PDECs transduced with sgRNAs designed to introduce the corresponding mutations in the mouse Trp53 gene (M234I and C132Y, respectively). N=5 mice per mutation. (d) Representative macroscopic (left) and microscopic (right; H&E) images of pancreatic tumors isolated from mice transplanted with FNLS-expressing PDECs transduced with specific Trp53-targeting base editing sgRNAs. (e) Representative Sanger sequencing traces from tumors in (d). Red arrows denote target cytosines that, when mutated to thymine, give rise to the corresponding amino acid changes in the p53 protein. Nucleotide triplets on the right denote the precise mutational events that give rise to mutant p53 proteins. * p ≤ 0.05, ** p ≤ 0.01. P-values were calculated using the log-rank test.

Extended Data Fig. 9. Classification of screen hits by OncoKB.

(a) sgRNAs from the MBES proliferation screen were binned into four categories: i) all sgRNAs; ii) sgRNAs depleted by <1.5 LFC and exhibiting 20% editing at the sensor; iii) sgRNAs enriched by >1.5 LFC; and iv) sgRNAs enriched by >1.5 LFC and exhibiting 20% editing at the sensor. The percentage of each OncoKB classification was then calculated for each bin. P-values indicate two-sided Fisher’s exact test comparisons of the frequency of known or likely oncogenic mutations in each subset. (b) Bubble plot comparing sgRNA log fold changes with the mean frequency of C>T editing at the sensor target site between days 5 and 30 post-transduction. Bubbles are colored by OncoKB classification, and bubble size denotes MAGeCK score (see Supplementary Table 6d).

Extended Data Fig. 10. Expanded base editing predictions.

(a) We used the MSK-IMPACT clinical tumor sequencing dataset and the characteristics of commonly used base editors to inform the design of the base editing sensor libraries used in the experiments in Figs. 3-6. These results are available in the Shiny web portal (https://dowlab.shinyapps.io/BEscan/). Using updated and expanded versions of the MSK-IMPACT sequencing data, base editing configurations, and AMINEsearch v2, we generated an exploratory set of sgRNA and sensor predictions, which are also available in the Shiny web portal. The more recent version of MSK-IMPACT contains increased numbers of (b) tumors sequenced, (c) total SNVs observed, and (d) candidate unique recurrent SNVs. Compared with the HBES and MBES libraries (v1), the exploratory set (v2) also includes more (e) Cas variants (determining PAM recognition) and base editor variants (determining editing window), collectively yielding more base editor configurations with distinct properties (f). Together, these changes increased (g) the number of sgRNAs designed and (h) the number of unique SNVs targeted by one or more sgRNAs in the database.

Supplementary Material

Supplementary Table 1. APS library screening results. Each tab denotes a different base editor APS screen.

Supplementary Table 2. AMINEsearch parameters, outputs, and HBES/MBES libraries.

Supplementary Table 3. Off-target analyses using Cas-OFFinder. Each tab denotes a different enzyme.

Supplementary Table 4. HBES library screening results. Each tab denotes a different base editor HBES screen.

Supplementary Table 5. MBES library screening results. Each tab denotes a different base editor MBES screen.

Supplementary Table 6. Sensor editing comparison to BE-Hive predictions.

Supplementary Table 7. Results from MBES library proliferation screen in PDECs.

Supplementary Table 8. Expanded AMINEsearch parameters, outputs, and HBESv2/MBESv2 libraries.

Supplementary Table 9. Primers and oligos used in this study.

Supplementary Table 10. Source data and p-values.

ACKNOWLEDGEMENTS

We thank David Solit, Nikolaus Schultz, Michael Berger, and Benjamin Gross for access to MSK-IMPACT data, Tyler Jacks for sharing KP cells, Direna Alonso Curbelo and Dafna Bar-Sagi for sharing PDEC cells, Maria Paz Zafra for sharing primers to assess tumor purity, Thomas M. Norman for conceptual advice, and Lewis Cantley for support and mentorship. We gratefully acknowledge the members of the Molecular Diagnostics Service in the Department of Pathology and the Marie-Josée and Henry R. Kravis Center for Molecular Oncology and the members of the Integrated Genomics Operation and Bioinformatics Core (P30 CA008748). This work was supported by a project grant from the NIH/NCI (R01CA229773-01A1), P01 CA087497 (SWL), an MSKCC Functional Genomics Initiative (FGI) grant (SWL), and an Agilent Technologies Thought Leader Award (SWL). FJSR was supported by the MSKCC TROT program (5T32CA160001) and a GMTEC Postdoctoral Researcher Innovation Grant, and is an HHMI Hanna Gray Fellow. BJD was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA261061-01). ERK was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA192835) and is currently supported by NCI R35CA197588, awarded to Lewis Cantley. AK was supported by an F31 Ruth L. Kirschstein Predoctoral Individual National Research Service Award (F31-CA247351-02). JL was supported by the German Research Foundation (DFG) and the Shulamit Katzman Endowed Postdoctoral Research Fellowship. SVP was supported by the German Academic Scholarship Foundation. FMB was supported by a GMTEC Postdoctoral Fellowship, an MSKCC Translational Research Oncology Training Fellowship (5T32CA160001-08), and a Young Investigator Award from the Edward P. Evans Foundation. KMT is supported by the Jane Coffin Childs Memorial Fund for Medical Research. DC and HZ acknowledge funding from the MSKCC Marie-Josée and Henry R. Kravis Center for Molecular Oncology for supporting OncoKB. SWL is the Geoffrey Beene Chair of Cancer Biology and an Investigator of the Howard Hughes Medical Institute. LED is the Burt Gwirtzman Research Scholar in Lung Cancer at Weill Cornell Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Footnotes

COMPETING INTERESTS

LED is a scientific advisor and holds equity in Mirimus Inc. SWL is an advisor for and has equity in the following biotechnology companies: ORIC Pharmaceuticals, Faeth Therapeutics, Blueprint Medicines, Geras Bio, Mirimus Inc., and PMV Pharmaceuticals. SWL acknowledges receiving funding and research support from Agilent Technologies for the purposes of massively parallel oligo synthesis. The remaining authors declare no competing interests.

REFERENCES

  • 1. Gorelick AN et al. Phase and context shape the function of composite oncogenic mutations. Nature 582, 100–103 (2020).
  • 2. Hyman DM et al. AKT Inhibition in Solid Tumors With AKT1 Mutations. J Clin Oncol 35, 2251–2259 (2017).
  • 3. Vasan N et al. Double PIK3CA mutations in cis increase oncogenicity and sensitivity to PI3Kalpha inhibitors. Science 366, 714–723 (2019).
  • 4. Zafra MP et al. An In Vivo Kras Allelic Series Reveals Distinct Phenotypes of Common Oncogenic Variants. Cancer Discov 10, 1654–1671 (2020).
  • 5. Findlay GM et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
  • 6. Vivanco I et al. Differential sensitivity of glioma- versus lung cancer-specific EGFR mutations to EGFR kinase inhibitors. Cancer Discov 2, 458–471 (2012).
  • 7. Komor AC, Kim YB, Packer MS, Zuris JA & Liu DR Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
  • 8. Gaudelli NM et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
  • 9. Koblan LW et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat Biotechnol 36, 843–846 (2018).
  • 10. Zafra MP et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat Biotechnol 36, 888–893 (2018).
  • 11. Kleinstiver BP et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
  • 12. Katti A et al. GO: a functional reporter system to identify and enrich base editing activity. Nucleic Acids Res 48, 2841–2852 (2020).
  • 13. Vakulskas CA et al. A high-fidelity Cas9 mutant delivered as a ribonucleoprotein complex enables efficient gene editing in human hematopoietic stem and progenitor cells. Nat Med 24, 1216–1224 (2018).
  • 14. Nishimasu H et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
  • 15. Hu JH et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
  • 16. Zehir A et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 23, 703–713 (2017).
  • 17. Chakravarty D et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol, 1–16 (2017).
  • 18. Chakravarty D & Solit DB Clinical cancer genomic profiling. Nat Rev Genet 22, 483–501 (2021).
  • 19. Dimitrova N et al. Stromal Expression of miR-143/145 Promotes Neoangiogenesis in Lung Cancer Development. Cancer Discov 6, 188–201 (2016).
  • 20. Lee KE & Bar-Sagi D Oncogenic KRas suppresses inflammation-associated senescence of pancreatic ductal cells. Cancer Cell 18, 448–458 (2010).
  • 21. Arbab M et al. Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning. Cell 182, 463–480 e430 (2020).
  • 22. Alexandrov LB et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  • 23. Komor AC et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci Adv 3, eaao4774 (2017).
  • 24. Kastenhuber ER & Lowe SW Putting p53 in Context. Cell 170, 1062–1078 (2017).
  • 25. Muller PA & Vousden KH Mutant p53 in cancer: new functions and therapeutic opportunities. Cancer Cell 25, 304–317 (2014).
  • 26. Vassilev LT et al. In Vivo Activation of the p53 Pathway by Small-Molecule Antagonists of MDM2. Science 303, 844–848 (2004).
  • 27. Li W et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol 16, 281 (2015).
  • 28. Li W et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554 (2014).
  • 29. Morris J.P.t. et al. alpha-Ketoglutarate links p53 to cell fate during tumour suppression. Nature 573, 595–599 (2019).
  • 30. Kanda M et al. Mutant TP53 in duodenal samples of pancreatic juice from patients with pancreatic cancer or high-grade dysplasia. Clin Gastroenterol Hepatol 11, 719–730 e715 (2013).
  • 31. Koblan LW et al. Efficient C*G-to-G*C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol (2021).
  • 32. Shen MW et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).
  • 33. Song M et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat Biotechnol 38, 1037–1043 (2020).
  • 34. Tycko J et al. Pairwise library screen systematically interrogates Staphylococcus aureus Cas9 specificity in human cells. Nat Commun 9, 2962 (2018).
  • 35. Marquart KF et al. Predicting base editing outcomes with an attention-based deep learning algorithm trained on high-throughput target library screens. Nat Commun 12, 5114 (2021).
  • 36. Chen L et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat Commun 12, 1384 (2021).
  • 37. Kurt IC et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat Biotechnol 39, 41–46 (2021).
  • 38. Zhao D et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat Biotechnol 39, 35–40 (2021).
  • 39. Hyman DM, Taylor BS & Baselga J Implementing Genome-Driven Oncology. Cell 168, 584–599 (2017).
  • 40. Cuella-Martin R et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097 e1019 (2021).
  • 41. Hanna RE et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080 e1020 (2021).
  • 42. Xu P et al. Genome-wide interrogation of gene functions through base editor screens empowered by barcoded sgRNAs. Nat Biotechnol (2021).
  • 43. Adamson B et al. A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 167, 1867–1882.e1821 (2016).
  • 44. Datlinger P et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods 14, 297–301 (2017).
  • 45. Dixit A et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853–1866.e1817 (2016).
  • 46. Anzalone AV et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
  • 47. Kim HK et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat Biotechnol 39, 198–206 (2021).
  • 48. Clement K et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 37, 224–226 (2019).
  • 49. Pagès H, Carlson M, Falcon S & Li N AnnotationDbi: Annotation Database Interface. R package version 1.44.0 (2018).
  • 50. Pagès H BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs. R package version 1.50.0 (2018).
  • 51. Pagès H, Aboyoun P, Gentleman R & DebRoy S Biostrings: Efficient manipulation of biological strings. R package version 2.50.2 (2019).
  • 52. Team TBD BSgenome.Hsapiens.UCSC.hg19: Full genome sequences for Homo sapiens (UCSC version hg19). R package version 1.4.0 (2014).
  • 53. Zhang Y et al. Comparison of non-canonical PAMs for CRISPR/Cas9-mediated DNA cleavage in human cells. Sci Rep 4, 5405 (2014).
  • 54. Durinck S, Spellman PT, Birney E & Huber W Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4, 1184–1191 (2009).
  • 55. Bae S, Park J & Kim JS Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
  • 56. Rodriguez JM et al. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res 46, D213–D217 (2018).
  • 57. Pujar S et al. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 46, D221–D228 (2018).

REFERENCES (METHODS SECTION)

Data Availability Statement

All source data (including p-values) are available in Supplementary Table 10. Processed screening data is available in Supplementary Tables 1, 4 & 5 and primary sequencing data is available at the Sequence Read Archive (SRA) under accession PRJNA746395.

RESOURCES