Author manuscript; available in PMC: 2025 Jul 11.
Published in final edited form as: Mol Cell. 2024 Jun 24;84(13):2553–2572.e19. doi: 10.1016/j.molcel.2024.05.024

Genome-Scale Exon Perturbation Screens Uncover Exons Critical for Cell Fitness

Mei-Sheng Xiao 1,10, Arun Prasath Damodaran 1,10,*, Bandana Kumari 1, Ethan Dickson 1, Kun Xing 1, Tyler On 2, Nikhil Parab 1, Helen E King 3, Alexendar R Perez 4,5, Wilfried M Guiblet 1, Gerard Duncan 6, Anney Che 7, Raj Chari 8, Thorkell Andresson 6, Joana A Vidigal 4, Robert J Weatheritt 3,9, Michael Aregger 2,*, Thomas Gonatopoulos-Pournatzis 1,11,*
PMCID: PMC11246229  NIHMSID: NIHMS2005373  PMID: 38917794

SUMMARY

CRISPR-Cas technology has transformed functional genomics, yet understanding how individual exons differentially shape cellular phenotypes remains limited. Here, we optimized and conducted massively parallel exon deletion and splice site mutation screens in human cell lines to identify exons that regulate cellular fitness. Fitness-promoting exons are prevalent in essential and highly expressed genes, and commonly overlap with protein domains and interaction interfaces. Conversely, fitness-suppressing exons are enriched in non-essential genes, exhibiting lower inclusion levels, and overlap with intrinsically disordered regions and disease-associated mutations. In-depth mechanistic investigation of the screen hit TAF5 alternative exon-8 revealed that its inclusion is required for assembly of the TFIID general transcription initiation complex, thereby regulating global gene expression output. Collectively, our orthogonal exon perturbation screens established a comprehensive repository of phenotypically important exons and uncovered regulatory mechanisms governing cellular fitness and gene expression.


eTOC Blurb

Xiao, Damodaran, et al. optimized and implemented orthogonal large-scale exon-deletion and splice site mutation screens, uncovering numerous exons in human cells that either promote or suppress fitness, each exhibiting unique features. The data generated from this study guided mechanistic inquiries, elucidating how alternative splicing of TAF5 exon-8 controls transcription outputs.

INTRODUCTION

The majority of eukaryotic genes are organized as non-contiguous units of exons, which contain regulatory and protein-coding regions, and introns, which are spliced out from the pre-mRNA as exons are joined together. Throughout evolution, the frequency of alternative pre-mRNA processing increased, with almost all human multi-exon genes undergoing alternative splicing1–5. Alternative cassette exons, which are either included or skipped from mature mRNAs, are the most common form of alternative splicing in metazoans6, yet, at genome scale, their regulatory impacts on cellular functions are not well understood5,7. Generally, alternatively spliced exons that alter the open reading frame affect transcript stability, thus reducing overall protein levels, whereas frame-preserving exons that overlap coding sequences are translated and contribute to proteomic diversity8,9. The functional consequences of such splice isoforms are difficult to predict, and whether they represent stochastic differences with few biological repercussions has been debated10,11. Accordingly, identification of the full repertoire of alternative splicing events contributing to phenotypic outcomes represents a major goal5,12–14. Such knowledge could help prioritize exons for focused studies and yield insights into diseases characterized by widespread splicing alterations15–18.

CRISPR-Cas and corresponding advances in genome engineering technologies19–22 have been applied in high-throughput screening, identifying genes that function in various biological processes and impact phenotypes of interest at a genome-scale23–26. Genome-wide loss-of-function screens across various human cell lines have defined core and context-dependent essential genes required for cell viability and proliferation, highlighting molecular processes and pathways underlying cell fitness27–31. Although CRISPR-Cas technology has revolutionized genetics, almost all studies have focused on the gene level, whereas the functional complexity and variability introduced through alternative pre-mRNA processing have been largely overlooked. Consequently, elucidating the biological role of individual exons stands to elevate our understanding of genome regulation and may inform development of therapeutic interventions.

In past work, we optimized tools for conducting high-throughput gene segment deletions32. Specifically, we developed a genetic perturbation platform, dubbed CHyMErA, in which Cas9 and Cas12a nucleases are co-expressed alongside libraries of hybrid guide (hg)RNAs, engineered by the fusion of Cas9 and Cas12a single guide (sg)RNAs, transcribed from a single promoter32–34. The hgRNAs can be processed into individual sgRNAs through the intrinsic RNA-processing activity of Cas12a35,36, enabling both nucleases to conduct combinatorial genome editing. We have successfully applied CHyMErA to conduct exon deletions by strategically targeting flanking intronic sites with programmable DNA cuts (Figure 1A). In comparative analyses, CHyMErA outperforms combinatorial systems relying on two gRNA promoters32, which have also been utilized for gene segment deletions37,38. However, our previous CHyMErA screens were associated with high false negative rates, highlighting the need for optimization to enable their broader application in large-scale exon deletion screens.

Figure 1: Generation of an enhanced exon deletion screening platform.

(A) Schematic of the pooled cell fitness genetic screens. The DNA oligo library was cloned into pLCHKOv3, a modified version of the lentiviral pLCHKO vector, to generate exon-deletion screening libraries containing the direct repeat (DR) sequence compatible with either Lb or AsCas12a nucleases from a single oligo library pool. High-titer lentiviral stocks were transduced into Cas9/Cas12a expressing HAP1 and RPE1 cells at a low multiplicity of infection (MOI). Uninfected cells were removed through puromycin selection, and the remaining population was cultured for ~ 20 cell doublings, after which genomic DNA was extracted and the PCR-retrieved hgRNA cassette abundance was quantitated with Illumina paired-end sequencing.

(B) Schematic of Cas9 and Cas12a lentiviral constructs engineered in this study. Specific point mutations are indicated. NLS: Nuclear Localization Signal; eIFα: human elongation factor-1 alpha promoter; NP: Nucleoplasmin NLS; SV40: SV40 NLS; T2A: 2A self-cleaving peptide; NeoR: neomycin/geneticin resistance gene; BlastR: blasticidin resistance gene.

(C) Western blot analysis of Cas12a (using anti-Myc tag antibody) and Cas9 expression in total cell extracts from stably transduced HAP1 and RPE1 cells with GAPDH used as a loading control.

(D) Schematic of the optimization hgRNA library design. hgRNA constructs were designed to delete or mutate frame-disruptive exons in core-essential and non-essential genes by targeting flanking intronic (top) or exonic sequences (bottom). Cas9 and Cas12a spacers are displayed as blue and orange triangles and total numbers of hgRNA pairs in different editing categories are indicated.

(E-F) Receiver operating characteristic (ROC) curves of CHyMErA cell fitness screens using different Cas12a variants as described in (A). (E) Cas12a gene knockout (KO) paired with Cas9 intergenic hgRNAs single-targeting core-essential (true positive rates) and non-essential (false positive rates) genes are depicted. (F) Exon deletion hgRNAs targeting frame-disruptive exons in core-essential (true positive rates) and non-essential (false positive rates) genes are displayed. Dashed lines indicate random classifier. Area under the curve (AUC) values are listed for CHyMErA variants screened in HAP1 and RPE1 cells.

(G) Flow cytometry analysis of CD46 protein expression in HAP1 (top) and RPE1 (bottom) CHyMErA variant cell lines following transduction with three independent hgRNA pairs targeting the frame-disruptive CD46 exon-3 for deletion. Values display cell percentage with undetectable CD46 expression.

(H) Representative PCR assay monitoring CD46 exon-3 deletion efficiency from genomic DNA using different CHyMErA variants in HAP1 (top) and RPE1 (bottom) cells (see also Figure S2G). Bar plots indicate percentage of CD46 exon-3 deletion across three independent hgRNAs.

(G-H) All data are represented as mean ± standard deviation. * p < 0.05, ** p < 0.01; two-tailed paired t test.

Here, by screening six Cas12a variants, we engineered a CHyMErA screening platform with substantially improved exon-deletion efficiency. When applied to interrogate > 12,000 exons in human cell lines, we uncovered > 2,000 frame-preserving exons affecting cell fitness and proliferation, providing a valuable resource of phenotypically relevant protein segments. Overall, these exons overlap protein domains and interaction interfaces. Mechanistic analysis of the essential alternative exon-8 in TAF5 uncovered a regulatory pathway controlling TFIID complex assembly thereby impacting TATA-binding protein (TBP) and RNA polymerase II (RNA pol II) recruitment to transcription start sites.

RESULTS

Generating an ultra-efficient gene segment deletion screening platform

To improve the gene segment deletion efficiency of CHyMErA, we decided to focus on optimizing Cas12a activity, given that its efficiency is lower than that of Streptococcus pyogenes (Sp)Cas932,39–41. We cloned and stably expressed variants of Lachnospiraceae bacterium (Lb) and Acidaminococcus sp. (As) Cas12a orthologues36,42 so that we could compare their activity side-by-side in human HAP1 and RPE1 cells expressing SpCas9. Specifically, we tested LbCas12a32,33, enhanced AsCas12a43 (enCas12a), optimized AsCas12a44 (opCas12a), Ultra-AsCas12a (Cas12a-Ultra)45, and a Cas12a variant combining the mutations of opCas12a and Ultra-AsCas12a (opCas12a-Ultra). The two latter variants were modified with six c-Myc NLS (6×NLS) similar to opCas12a44. In parallel, we engineered LbCas12a with 6×NLS (Figure 1B). We cloned all constructs into the same lentiviral backbone, conducted transductions at low multiplicity of infection (MOI), and confirmed expression of the Cas12a variants by western blotting and immunofluorescence analysis in HAP1 and RPE1 cell lines (Figures 1C and S1A–B). As expected, the addition of 6×NLS enhances nuclear accumulation of both Lb and AsCas12a variants (Figures S1A–B) and results in increased LbCas12a expression in both cell lines (Figures 1C and S1A–B).

To systematically compare editing efficacy across Cas12a nucleases, we designed an optimization library comprising 18,000 hgRNAs targeting 482 essential and 362 non-essential genes46 through either gene knockout (KO) or exon deletion (Figure 1D; Table S1). The latter category includes 3,236 hgRNAs designed to delete frame-disruptive exons in essential and non-essential genes (Figure 1D), likely resulting in loss of gene function due to switching the reading frame. To achieve exon deletion, we designed Cas9-Cas12a gRNA pairs to recruit the two nucleases to intronic sequences flanking the targeted exons at least 75 nucleotides away from the splice sites (to avoid introducing mutations to coding sequences and/or splicing cis regulatory elements). To control for potential unpredicted single-cutting effects, we randomly paired each gRNA used for exon deletion with an intergenic control gRNA creating 4,032 intronic-intergenic hgRNA pairs. The library also contains 9,129 hgRNAs eliciting conventional gene KO via indel formation in coding sequences (Figure 1D). The latter group consists of 6,086 single-targeting (i.e., Cas9 or Cas12a gRNA targeting coding sequence paired with a Cas12a or Cas9 intergenic gRNA control, respectively) and 3,043 dual-targeting guide pairs (i.e., both the Cas9 and the Cas12a gRNAs are targeting coding sequence of the same gene). Furthermore, the library contains 1,503 and 100 paired intergenic and non-coding hgRNA controls (Table S1; STAR Methods). The library was cloned into lentiviral vectors and screened in HAP1 and RPE1 cells expressing Cas9 and Cas12a as previously described33 (Figure 1A; STAR Methods). Calculation of the log2-fold change (LFC) distribution of gene KO hgRNAs reveals a strong separation of hgRNAs targeting essential and non-essential genes in all screens irrespective of the employed nuclease (Figure S2A and Table S1; p-value < 7.9e−59; Wilcoxon rank sum).
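The log2-fold change (LFC) used throughout the screens can be illustrated with a minimal sketch. It assumes reads-per-million normalization and a pseudocount of 1, which are common conventions rather than settings stated in the text; the function name, guide labels, and read counts are hypothetical.

```python
import math

def log2_fold_change(counts_start, counts_end, pseudocount=1.0):
    """Per-guide log2 fold change between an early and a late time point.

    Counts are scaled to reads per million first so that differences in
    sequencing depth are not mistaken for guide drop-out. The pseudocount
    (an assumption) avoids taking the log of zero.
    """
    total_start = sum(counts_start.values())
    total_end = sum(counts_end.values())
    lfc = {}
    for guide, start in counts_start.items():
        end = counts_end.get(guide, 0)
        rpm_start = start / total_start * 1e6
        rpm_end = end / total_end * 1e6
        lfc[guide] = math.log2((rpm_end + pseudocount) / (rpm_start + pseudocount))
    return lfc

# Hypothetical read counts for three hgRNAs (illustrative only).
t0 = {"essential_guide": 500, "nonessential_guide": 500, "intergenic_guide": 500}
t_end = {"essential_guide": 50, "nonessential_guide": 700, "intergenic_guide": 750}
lfc = log2_fold_change(t0, t_end)
```

In this toy example the guide targeting an essential gene has a negative LFC (depletion), which is the signal that separates essential from non-essential targets in the screens.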

To compare the performance of each Cas12a nuclease, we calculated the area under the curve (AUC) of the receiver operating characteristic (ROC) using hgRNAs targeting reference core-essential and non-essential genes46. These curves plot the true positive rate against the false positive rate, providing a comprehensive measurement of the screen’s sensitivity and specificity. We first analyzed single-targeting hgRNAs where the Cas9 guide targets intergenic regions and the Cas12a guide is directed towards coding sequences (Figure 1E). In both RPE1 and HAP1 cells, consistent with previous work44, the addition of the 6×NLSs improved Cas12a activity (for both Lb and As variants), with opCas12a outperforming other Cas12a variants (Figures 1E and S2A). In contrast, when Cas9 guides target coding sequences and Cas12a guides target intergenic regions, the respective ROC curves are nearly identical (Figure S2B). Encouragingly, the magnitude of opCas12a editing efficiency is more similar to that of Cas9 and significantly enhanced compared to our original LbCas12a nuclease (Figure S2A; p-value < 2.66e−74, Wilcoxon rank sum). For dual targeting KO guides, the use of different Cas12a variants did not further improve the ROC AUC scores (Figure S2C). However, we observe significantly enhanced average guide depletion when Cas12a nucleases with 6×NLS are used (Figure S2D; p-value < 6.05e−18; Wilcoxon rank sum), consistent with improved editing efficiency of Cas12a nucleases containing strong NLSs.
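As a rough illustration of the ROC AUC metric, the sketch below computes it as the probability that a randomly chosen guide against an essential gene is more depleted than a randomly chosen guide against a non-essential gene (the Mann-Whitney formulation, which equals the area under the ROC curve). The LFC values and the function name are hypothetical, and this is not necessarily how the study's pipeline computed the curves.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A pair counts 1 when the positive scores higher than the negative
    and 0.5 when they tie.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical LFC values; depletion is the signal, so the score is -LFC.
essential_lfc = [-3.1, -2.4, -1.8, -0.2]
nonessential_lfc = [-0.5, 0.1, 0.3, 0.4]
auc = roc_auc([-x for x in essential_lfc], [-x for x in nonessential_lfc])
```

An AUC of 0.5 corresponds to the random classifier shown as a dashed line in the ROC panels, and values approaching 1 indicate clean separation of essential from non-essential targets.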

Next, to evaluate exon deletion efficiency, we plotted ROC curves for the Cas12a variants by defining hgRNAs that elicit excision of frame-disruptive exons in core-essential and non-essential genes as true and false positive hits, respectively. Consistent with the gene KO analysis (Figure 1E), Lb and AsCas12a nucleases with the 6×NLS achieve superior discrimination between true positive and true negative cell fitness exons (Figure 1F). Furthermore, enCas12a performs marginally better than the original LbCas12a, while opCas12a outperforms all other tested Cas12a variants in both HAP1 and RPE1 cells (Figure 1F). Reassuringly, when generating the same plots using ‘single-targeting’ intronic-intergenic hgRNA controls, the curves are nearly identical to those of a random classifier (Figure S2E). These data indicate that the observed cell fitness phenotypes are mediated through exon deletions rather than by mutations introduced independently by each of the two intron-targeting guides. Furthermore, we detect marginally positive LFC values of the intronic-intergenic control hgRNAs, in contrast to the exon deletion-eliciting intronic-intronic hgRNAs that are depleted (Figure S2F; p-value < 7.6e−54; Wilcoxon rank sum). Collectively, our screening data indicate that editing efficiency and accuracy in human cells are most improved with opCas12a, substantially enhancing gene segment deletion activity in CHyMErA.

To validate the screen results, we designed hgRNA pairs targeting the frame-disruptive CD46 exon-3 for deletion. Flow cytometry and PCR analysis of CD46 protein expression and exon-3 deletion confirmed that opCas12a outperforms the other Cas12a variants in both HAP1 and RPE1 cells (Figures 1G–H and S2G). This is likely due to a combination of enhanced expression, nuclear accumulation (Figures 1C and S1A–B), and hgRNA processing activity (Figures S2H–I). Furthermore, northern blot data suggest that Cas12a nucleases not only are required for hgRNA processing, but also enhance overall Cas12a gRNA expression, likely protecting it from exonucleolytic decay (Figure S2H). Taken together, through side-by-side comparisons and validation of reported Cas12a variants in human cells, we generated a combinatorial screening platform with enhanced genome editing and gene segment deletion capabilities.

Large-scale identification of exons that affect cell fitness

Despite the comprehensive identification of the human genes that affect cell fitness through genome-scale CRISPR and siRNA screens27–30,47–49, our understanding of individual exons responsible for such phenotypes is rather limited. To address this knowledge gap, we applied our enhanced gene-segment deletion screening platform to large-scale perturbations of exons in HAP1 and RPE1 cell lines (Figure 1A). We designed a 300,000 hgRNA library targeting 12,221 exons in 2,095 genes for deletion (Table S2). To determine screen efficiency, the library includes the hgRNAs used in our optimization screens (Figure 1D).

We followed a similar hgRNA design approach as used for the optimization library (STAR Methods), but here we focused on frame-preserving exons, whose deletion is expected to generate shorter protein variants. We designed guides to delete all targetable frame-preserving exons in core-essential46 and DepMap common essential genes (86,426 hgRNAs; 5,410 exons), and additional genes (39,806 hgRNAs; 2,412 exons). We also targeted 4,399 frame-disruptive exons in essential and additional genes to evaluate gene inactivation phenotypes (42,560 hgRNAs; 4,399 exons; 2,092 genes). In total, the library targets 12,221 exons, of which 8,484 exons are spliced constitutively and 3,510 exons are alternatively spliced (Figure 2A). On average, each exon is targeted by 13.8 hgRNAs, and each Cas9 and Cas12a gRNA is also paired with an intergenic gRNA control (resulting in a total of 168,792 intronic-intronic and 120,684 intronic-intergenic hgRNAs for exon deletion and controls, respectively).
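The library composition can be cross-checked with simple arithmetic. The short sketch below uses only the counts quoted in the paragraph above (the dictionary keys are shorthand labels) and reproduces the 12,221 targeted exons, the 168,792 intronic-intronic hgRNAs, and the average of 13.8 hgRNAs per exon.

```python
# Counts quoted in the text for the large-scale exon deletion library.
exon_sets = {
    "frame-preserving, essential genes": {"hgRNAs": 86_426, "exons": 5_410},
    "frame-preserving, additional genes": {"hgRNAs": 39_806, "exons": 2_412},
    "frame-disruptive": {"hgRNAs": 42_560, "exons": 4_399},
}

total_exons = sum(s["exons"] for s in exon_sets.values())
total_deletion_hgrnas = sum(s["hgRNAs"] for s in exon_sets.values())
mean_hgrnas_per_exon = total_deletion_hgrnas / total_exons

# Frame-preserving exons are the denominator used later for hit rates.
frame_preserving_exons = 5_410 + 2_412

print(total_exons, total_deletion_hgrnas, round(mean_hgrnas_per_exon, 1))
```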

Figure 2: Large-scale exon deletion screening in human cells.

(A) Characteristics of the targeted exons and cognate genes in the 300,000 hgRNA exon deletion library.

(B) Visualization of exons with a fitness phenotype. All targeted frame-preserving exons were ranked by mean log2-fold change (LFC) of exon deletion hgRNAs in HAP1 and RPE1 cells. Hit exons are indicated in red (fitness-promoting) and blue (fitness-suppressing) while non-hits are shown in gray.

(C) Volcano plot of HAP1 (top panel) and RPE1 (bottom panel) exon deletion screening data analyzed by MAGeCK using either the “intronic-intronic” exon deletion guides (left panel) or the “intronic-intergenic” single-intronic targeting control hgRNAs (right panel). Significant hits (FDR < 5%) are highlighted in red (fitness-promoting) or blue (fitness-suppressing).

(D) Bar plot showing the fraction of off-target integration sites for Cas9 and Cas12a guides directed at intergenic (control) or intronic (exon-deletion) regions, determined by GUIDE-seq experiments using three independent hgRNAs for both intergenic controls and exon-deletion. All data are represented as mean ± standard deviation. Two-way paired t-test applied.

(E) Pie chart indicating the integration frequencies at on- and off-target genomic regions for intergenic (control) or intronic (exon-deletion) hgRNAs, determined by GUIDE-seq.

(F) Schematic overview of co-culture validation experiments.

(G) mClover3 (Green) to mCherry (Red) ratios 4 days after co-culture set-up (T4) normalized to ratios quantified 24 hours after plating (T1). Exons identified as hits or non-hits in HAP1 and RPE1 exon deletion screens are indicated. In turquoise are hgRNAs that resulted in significantly skewed Green/Red ratios (p < 0.05; two-way ANOVA) in hit exons and hgRNAs that did not result in significantly skewed Green/Red ratios in non-hit exons. hgRNAs that induced significant growth changes but are targeting non-hits are labeled in blue (false negative). Non-validated hgRNAs targeting exon screen hits are labeled in gray (false positive). Each exon was validated with 2 or 3 independent hgRNAs as indicated by individual bars. All data are represented as mean ± standard deviation from two to four replicates.

The exon deletion library was cloned into our lentiviral vector and the uniformity of the guide distribution was confirmed (Figure S3A). We performed proliferation-based screens in HAP1 and RPE1 cells expressing Cas9 and opCas12a and quantified the relative abundance of hgRNAs over time (Figure 1A; Table S2). We confirmed agreement between experimental replicates and efficient nuclease activities (Figures S3B–E). We then applied MAGeCK50 to identify frame-preserving exons that affect cell fitness using the intergenic hgRNA pairs as controls. By applying a 5% false discovery rate (FDR), we detected 1,621 and 948 frame-preserving exons whose deletion reduces cell fitness (referred to as fitness-promoting exons) in HAP1 and RPE1 cells, respectively. We also identified 129 and 42 exons whose excision results in improved cell fitness (referred to as fitness-suppressing exons) (Figures 2B–C; Table S2; STAR Methods). Taken together, our screens revealed that 2,241 out of the 7,822 frame-preserving exons (28.6%) can affect cell fitness in HAP1 or RPE1 cells. Reassuringly, analysis of the single intronic hgRNA control population (i.e., intronic-intergenic hgRNA pairs) only results in a limited number of hits (Figure 2C). Between the two cell lines, there is a significant overlap of fitness-promoting but not -suppressing exons (Figures S3F–G; p-value = 1.33e−121; odds ratio > 5.7; Fisher’s exact test). In summary, our systematic exon deletion screens uncover thousands of frame-preserving exons that impact HAP1 and/or RPE1 cellular fitness.
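The overlap statistic reported above comes from Fisher's exact test on a 2×2 table of hit and non-hit exons in the two cell lines. A minimal sketch of the one-sided (enrichment) form follows; the text does not state whether a one- or two-sided test was used, so that choice is an assumption, and the table values are a hypothetical toy example rather than the screen counts.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = hits in both screens, b = hits only in screen 1,
    c = hits only in screen 2, d = hits in neither.
    Returns (sample odds ratio, P(overlap >= a)) under the
    hypergeometric null of independent hit calls.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    p_value = numerator / comb(n, col1)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value

# Hypothetical toy table (not the screen data).
odds_ratio, p_value = fisher_exact_greater(3, 1, 1, 3)
```

With the real data, `a` would be the number of exons called as fitness-promoting in both HAP1 and RPE1 cells, and the remaining cells would be filled from the 7,822 frame-preserving exons tested.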

Our genetic screens represent pioneering efforts in conducting large-scale exon-level perturbations. To ensure the reliability of our findings, we conducted thorough validation of the screen results. Initially, we used GUIDE-seq to detect on- and off-target editing events by monitoring the genomic incorporation rates of a co-transfected double-stranded oligonucleotide51,52. Our GUIDE-seq experiments demonstrated negligible off-target editing associated with either nuclease (Figure 2D; STAR Methods) and comparable numbers of potential off-target sites for both intergenic and exon-deletion guides (Figures 2E and S3H; Table S3). Moreover, approximately 90% of identified off-target editing sites are situated in intergenic or non-coding regions, and none of the six profiled hgRNAs generate off-target editing in coding regions (Figure 2E). Collectively, these observations suggest that the phenotypes associated with exon deletion hgRNAs are unlikely to be the result of excessive off-target editing.

To further validate our data, we targeted 43 fitness-associated exons (Table S4) in co-culture experiments in HAP1 and RPE1 cell lines. Cells were individually transduced with exon-deletion hgRNAs coupled to mClover3 expression or intergenic hgRNAs coupled to mCherry expression, and co-cultures were established. The ratio of cells expressing mClover3/mCherry was quantified over the course of four days to identify hgRNAs that affect cell fitness (Figure 2F; STAR Methods). Overall, 70% of hgRNAs tested across HAP1 and RPE1 cells generated phenotypes corresponding to our screen results (Figure 2G; p-value < 0.05; two-way ANOVA). Among the remaining 30%, we identified 15% false negatives and 15% false positives. Despite conducting the co-culture competition experiments over a shorter duration compared to our screens, this methodology validated 71/86 (83%) tested exons in HAP1 and RPE1 cells with at least one hgRNA, supporting the reliability of our exon deletion screening outcomes.
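The co-culture readout described above (and in Figure 2G) is a Green/Red ratio at day 4 normalized to the ratio 24 hours after plating. A minimal sketch of that normalization, with hypothetical cell counts and a hypothetical function name, is shown below together with the validation rate quoted in the text.

```python
def normalized_green_red_ratio(green_t1, red_t1, green_t4, red_t4):
    """Green/Red ratio at T4 divided by the Green/Red ratio at T1.

    Values below 1 mean the exon-deletion (green) population was
    outcompeted by the intergenic control (red) population.
    """
    return (green_t4 / red_t4) / (green_t1 / red_t1)

# Hypothetical flow cytometry counts for one hgRNA (illustrative only).
ratio = normalized_green_red_ratio(
    green_t1=5000, red_t1=5000, green_t4=2000, red_t4=8000
)

# Validation rate quoted in the text: 71 of 86 tested exons.
validated_fraction = 71 / 86
print(ratio, round(validated_fraction * 100))
```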

Orthogonal exon perturbation screens using base editors cross-validate the exon deletion data

To further validate our exon deletion results and overcome concerns due to CRISPR-Cas mediated double-stranded breaks (DSBs) that are associated with uncontrollable repair outcomes and cytotoxicity53,54, we employed an orthogonal exon perturbation approach. Cytosine and adenine base editors effectively induce point mutations at gRNA-directed genomic sites55–57, and can be programmed to engineer splice site mutations that induce efficient exon skipping58–61. To this end, we generated lentiviral vectors expressing four cytosine (APOBEC162, APOBEC3A63, BEACON264 [a.k.a. APOBEC3A W98Y/W104A/Y130F], evoCDA165) and two adenine (ABE8e66, ABE8.20-m67) deaminases linked to the N-terminus of nickase (n)SpCas9 modified with 6×NLS (Figure 3A; STAR Methods) under a doxycycline (dox)-inducible promoter to minimize cytotoxicity. In the cytosine base editors, we also incorporated the Rad51 ssDNA binding domain to stimulate activity63.

Figure 3: Applying orthogonal base editor screens to induce exon skipping through splice site mutations.

(A) Schematic depiction of the base editors cloned in this study. BE: base editor; ABE: adenine base editor; CBE: cytosine base editor; TRE: tetracycline response element; NLS: nuclear localization signal; ssDBD: non-sequence specific single-stranded DNA binding domain from RAD51; UGI: uracil DNA glycosylase inhibitor; eIFα: human elongation factor-1 alpha promoter; T2A: 2A self-cleaving peptide; BlastR: blasticidin resistance gene; rtTA: reverse tetracycline-controlled transactivator.

(B) Western blot analysis of inducible Cas9 base editors (using Cas9 antibody) in stably transduced HAP1 cell lines. Cells were treated with 2 μg/mL doxycycline for 48 hours. GAPDH is used as a loading control.

(C-E) Base editors are programmed to induce 5’ splice site mutations and exon skipping of TAF5 exon-8 and SNAPC5 exon-2. (C) Schematic representation of base editor recruitment to 5’ (GT) and 3’ (AG) splice sites with targetable A & C bases indicated in red. (D-E) HAP1 cells expressing adenine or cytosine Cas9 base editors (see A) were transduced with three independent sgRNAs targeting the 5’ splice site of either TAF5 or SNAPC5 exons or an intergenic control sgRNA. Twenty-four hours after transduction cells were treated for 72 hours with 2 μg/mL puromycin and doxycycline to select successfully transduced cells and induce base editor expression. (D) Percentage of splice site mutation rates as determined by high-throughput sequencing of TAF5 and SNAPC5 amplicons. Error bars indicate standard deviation. (E) RT-PCR assays monitoring endogenous splicing of TAF5 exon-8 and SNAPC5 exon-2 (top) using gRNAs targeting their splice sites. Percent Spliced In (PSI) values are indicated. The bar plots (bottom) summarize the ΔPSI values of the three independent tested sgRNAs. All data are represented as mean ± standard deviation. * p < 0.05, ** p < 0.01; two-way ANOVA.

(F) Characteristics of the guides, exons, and cognate genes targeted by the 27,871 sgRNA base editor library for the large-scale mutation of splice sites. The optimization (top) and exon-deletion validation (bottom) sections of the library are analyzed separately. NT: non-targeting guides.

(G) ROC curve of cell fitness screens using different base editors as described in (A). sgRNAs directing the base editors to mutate splice sites of frame-disruptive exons in core-essential (true positive rates) and non-essential (false positive rates) genes are displayed. Dotted lines indicate random classifier. AUC values are listed for each base editor.

(H) Spearman’s correlation coefficient of log2-fold change (LFC) exon drop-out between the CHyMErA and base editor screening in HAP1 cells. Only shared exons among CHyMErA, ABE8e, and evoCDA1 screens are displayed (n = 3,786). Shared hits from either ABE8e or evoCDA1 are labeled in red and LFC values of ABE8e are used for the scatter plot. Spearman’s correlation index (r) and p-value are indicated.

(I) Overlap of the cell fitness affecting exons as determined by exon deletion (CHyMErA) or splice site mutation (base editors) screening in HAP1 cells. Only frame-preserving exons targeted by both CHyMErA and base editors (either ABE8e or evoCDA1) screens are compared. p-value = 2.05e−28; odds ratio = 3.02; Fisher’s exact test.

Next, we stably transduced the base editors into HAP1 cells and confirmed inducible expression by western blotting (Figure 3B). As expected, cell transduction with sgRNAs targeting the 5’ splice site of alternative exons in TAF5 and SNAPC5 results in splice site mutations (Figures 3C–D and S4A). Although efficiency varies across the tested base editors, adenine base editors displayed the best performance. Importantly, splice site mutation efficiency is coupled to stimulation of exon skipping of the corresponding alternative exons (Figures 3E and S4B), consistent with previous studies58–61.
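Exon skipping in these RT-PCR assays is summarized as Percent Spliced In (PSI) and its change (ΔPSI) relative to the control sgRNA. A minimal sketch of that calculation follows, assuming PSI is computed from the signal of the inclusion and skipping products; the intensities and the function name are hypothetical.

```python
def percent_spliced_in(inclusion_signal, skipping_signal):
    """PSI = 100 * inclusion / (inclusion + skipping)."""
    return 100.0 * inclusion_signal / (inclusion_signal + skipping_signal)

# Hypothetical band intensities (illustrative only).
psi_control = percent_spliced_in(inclusion_signal=90, skipping_signal=10)
psi_edited = percent_spliced_in(inclusion_signal=30, skipping_signal=70)

# A negative delta PSI indicates that the splice site mutation
# shifted splicing towards exon skipping.
delta_psi = psi_edited - psi_control
```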

To further compare base editing efficiency, we designed a ~ 28,000 sgRNA library and used cytosine or adenine base editors to mutate either 5’ or 3’ splice sites (Figure 3C; Table S5). The library contains an “optimization” set of 10,725 sgRNAs designed to mutate 4,765 frame-disruptive exons in 575 essential (7,536 sgRNAs) and 467 non-essential (3,189 sgRNAs) reference genes. In addition, we included 12,210 sgRNAs targeting splice sites in 5,321 frame-preserving exons from 1,395 genes that were targeted with our large-scale exon deletion library (Figure 3F). Overall, the library contains 27,871 sgRNAs that can inactivate splice sites via adenine (69%) or cytosine (63%) base editors with 34% of sgRNAs being shared between base editor classes (Table S5). In total, the base editor library targets 12,075 exons with an average of 2.2 sgRNAs per exon and a broad editing window of bases −3 to 12 relative to the sgRNA position.

We screened the sgRNA library in HAP1 cell lines expressing APOBEC1, BEACON2, evoCDA1, or ABE8e Cas9 base editors, and computed LFC values to measure sgRNA drop-out (Table S5). The adenine base editor (ABE8e) outperforms all tested cytosine base editors in discriminating true positive and negative exons (Figure 3G). Among the cytosine base editors, evoCDA1 performed substantially better than APOBEC1 and BEACON2. Thus, ABE8e and evoCDA1 base editor screens were used for further analysis.

Unlike exon deletion screens, base editor screens show no significant difference between intergenic and non-targeting gRNAs, indicating mitigation of DSB-induced toxicity (Figure S4C). We also observe that disrupting the 5’ splice site is more effective for exon skipping than the 3’ site (Figure S4D), confirming previous studies60. Consistent with base editing of splice sites resulting in loss-of-function phenotypes through triggering of exon skipping, we observe that targeting frame-disruptive exons located at the 5’ end of the gene is more effective than at the 3’ end (Figure S4E). Additionally, the ideal editing windows for ABE8e and evoCDA1 are sgRNA positions 2 to 11 and −2 to 11, respectively (Figures S4F–G). ABE8e and evoCDA1 show a modest positive LFC correlation (Figure S4H; r = 0.26; p = 3.73e−98) and significant overlap of exons affecting cell fitness (Figure S4I; p = 1.08e−31; odds ratio = 2.2), indicating that despite ABE8e’s higher efficiency, evoCDA1 can still identify relevant exons. Collectively, high-throughput screening with cytosine and adenine base editors reveals effective editors for splice site mutations and provides principles for future gRNA selection.
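The editing windows reported above can be turned into a simple guide filter. The sketch below keeps only sgRNAs whose target splice-site base falls within the ideal window of the chosen editor. The window bounds come from the text; the guide identifiers, the position numbering convention, and the function names are assumptions made for illustration.

```python
# Ideal editing windows quoted in the text (sgRNA positions, inclusive).
EDITING_WINDOWS = {"ABE8e": (2, 11), "evoCDA1": (-2, 11)}

def in_ideal_window(editor, target_position):
    """True if the splice-site base sits inside the editor's ideal window."""
    low, high = EDITING_WINDOWS[editor]
    return low <= target_position <= high

def filter_guides(guides, editor):
    """Keep guides whose target base falls in the ideal window.

    `guides` is a list of (guide_id, target_position) tuples; the IDs and
    the position numbering convention are hypothetical.
    """
    return [gid for gid, pos in guides if in_ideal_window(editor, pos)]

# Hypothetical candidate sgRNAs with the position of the splice-site base.
candidates = [("sg1", -1), ("sg2", 5), ("sg3", 12)]
abe_guides = filter_guides(candidates, "ABE8e")
cbe_guides = filter_guides(candidates, "evoCDA1")
```

Under these assumptions a guide placing the target base at position −1 would be retained for evoCDA1 but not for ABE8e, reflecting the slightly wider window reported for the cytosine editor.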

Finally, we analyzed the impact of mutating the 5’ or 3’ splice sites of frame-preserving exons on HAP1 cell fitness by focusing on sgRNAs targeting within the ideal editing window enhancing screen performance (Figure S4J). Out of the 4,699 frame-preserving exons targeted by either ABE8e or evoCDA1, we identified 1,621 exons (34%) whose skipping results in cell fitness defects (Table S5; STAR Methods). The exon-deletion and splice site mutation strategies display a modest, but significant, positive correlation between observed phenotypes (Figure 3H; r = 0.37; p-value = 1.47e−125; Spearman’s correlation coefficient), despite using different exon perturbation strategies and gRNA libraries. Importantly, the frame-preserving exons affecting cell fitness overlapped significantly across the deletion and base editor screens (Figure 3I; p-value = 2.05e−28; odds ratio = 3.02; Fisher’s exact test). Taken together, through our focused and orthogonal screening validation efforts, we have confirmed the reliability of our experimental strategies and data.
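The agreement between the two perturbation strategies is summarized by Spearman's correlation of per-exon LFC values. A minimal, dependency-free sketch is given below: it ranks both vectors (averaging ranks for ties) and computes the Pearson correlation of the ranks. The LFC values are hypothetical and the helper names are not from the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-exon LFC values from the two screen types.
deletion_lfc = [-2.0, -1.0, -0.5, 0.1]
base_editor_lfc = [-1.5, -1.6, -0.2, 0.3]
rho = spearman_r(deletion_lfc, base_editor_lfc)
```

Because the statistic depends only on ranks, it is insensitive to the different LFC scales produced by exon deletion and splice site mutation screens.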

Underlying features of exons impacting cell fitness

We next sought to identify underlying characteristics of the fitness-promoting and -suppressing exons identified by the large-scale exon deletion screens. Consistent with conventional loss-of-function CRISPR screens28,29, fitness-promoting exons reside in genes associated with RNA metabolism, protein synthesis and gene expression (Figure S5A), and are strongly enriched in common essential and highly expressed genes, in contrast to fitness-suppressing exons (Figures 4A–B and S5B). Overall, our screens reveal that deletion of ~20% of constitutive and 13% of alternative exons affects cell fitness (Figure 4C). Fitness-suppressing exons are more frequently alternative than constitutive (Figure 4C), and have lower inclusion levels relative to fitness-promoting alternative exons (Figure 4D; p-value = 7.84e−9; Wilcoxon rank sum).

Figure 4: Fitness-promoting and -suppressing frame-preserving exons exhibit different characteristics.

(A-B) Cumulative distribution function plots of fitness-promoting (red), fitness-suppressing (blue), or neutral (gray) exons in relation to (A) essentiality of corresponding genes as determined by the log2-fold change of KO hgRNAs and (B) expression of corresponding genes. p-values are indicated in the figure; Wilcoxon rank sum tests.

(C) Bar plot displaying the fraction of fitness-promoting (red) and -suppressing (blue) exons among investigated alternative and constitutive exons. p-value = 1.03e−31 (fitness-promoting) and p-value = 0.03 (fitness-suppressing); Fisher’s exact test.

(D) Cumulative distribution function plot of fitness-promoting (red), fitness-suppressing (blue) or neutral (gray) exons in relation to exon inclusion levels. p-values are indicated in the figure; Wilcoxon rank sum tests.

(E) ROC curve of random forest model prediction for classifying fitness-promoting, frame-preserving exons. Dashed lines indicate random classifier.

(F) Feature contribution to random forest model.

(G) Bar plot displaying the density (number of events normalized to total exon length) of ClinVar mutations in fitness-promoting, fitness-suppressing, or neutral (non-hits) exons. **** p-value < 0.0001; Fisher’s exact test.

(H) Box plot displaying the disordered prediction scores (IUPred) in fitness-promoting, fitness-suppressing, and neutral exons (non-hits). Boxes show interquartile range (IQR), 25th to 75th percentile, with the median indicated by a horizontal line. Whiskers extend to the quartile ± 1.5 × IQR. * p-value < 0.05, **** p-value < 0.0001; Wilcoxon rank sum test applied.

(I-K) Bar plots displaying the density (number of events normalized to total exon length) of low complexity regions (I), Pfam protein domains (J), or reported protein interaction interfaces (3did; K) in fitness-promoting, fitness-suppressing, and neutral (non-hits) exons. ** p-value < 0.01, **** p-value < 0.0001; Proportion (I, J) and Fisher’s exact (K) tests are applied.

(A-K) All plots display combined data of frame-preserving exons from HAP1 and RPE1 cell lines.

To explore features of frame-preserving exons that affect cell fitness, we trained a random forest machine learning classifier to assess the predictive impact of different characteristics, including expression of the host gene, exon inclusion levels (i.e., percent spliced in [PSI] values), GC content of the exon, splice site strength as well as conservation, and disorder prediction scores of the encoded polypeptide. We also included pathological characteristics such as overlap with disease-related mutations from the TCGA and ClinVar databases, among other features (STAR Methods; Table S6). Given the low number of fitness-suppressing exons, we performed this analysis exclusively on fitness-promoting exons. The random forest model performed well (Figure 4E; AUC = 0.72), with gene expression being among the most important features for predicting exon essentiality, consistent with our previous observations. In addition, disorder prediction scores, GC content, exon conservation, splice site, and exon size contribute to predicting whether exons are essential (Figure 4F). When we trained a random forest classifier to exclusively predict alternative fitness-promoting exons, predictive performance was similar (Figures S5C–D; AUC = 0.73), with PSI being the most important feature.
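
For readers less familiar with the AUC values reported above, the area under the ROC curve equals the probability that a randomly chosen positive example (here, a fitness-promoting exon) receives a higher classifier score than a randomly chosen negative example. The sketch below computes it directly from that definition in plain Python. The function name and the toy labels and scores are illustrative assumptions and are unrelated to the authors' model or data.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive/negative pair and count how often the positive wins.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = fitness-promoting exon, 0 = neutral exon (hypothetical).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
```

Under this reading, an AUC of 0.72 means the model ranks a fitness-promoting exon above a neutral one in roughly 72% of such pairs, compared with 50% for a random classifier.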

Focused enrichment analysis of the experimental screening results confirmed that fitness-promoting exons tend to have higher GC content and amino acid conservation than fitness-suppressing exons (Figures S5E–F). We also noted elevated evolutionary conservation within the nucleotide coding sequences of both fitness-promoting and -suppressing alternative exons relative to their non-hit counterparts (Figure S5G). Conversely, the fitness-promoting alternative exons exhibit reduced conservation within their flanking intronic sequences, consistent with these exons typically being highly included and thus under less regulatory control (Figure 4D). Consistently, the fitness-promoting exons have stronger 3’ splice site scores (Figure S5H). Furthermore, we observed a modest depletion of post-translational modifications (PTMs) overlapping with fitness-suppressing exons (Figures S5I–J; p-value < 5.4e−6; Fisher’s exact test). Specifically, we found a significant overlap between cell fitness-promoting exons and residues reported to be ubiquitinated (Figures S5I–J; p < 0.0001; Fisher’s exact test). We attempted to mitigate potential biases due to mass spectrometry (MS) detection sensitivity of PTMs by analyzing PSI-matched control exons (Figures S5J–K). However, we cannot rule out the possibility that observed enrichments are influenced by exon/gene expression levels.

Focused analyses unveiled a noteworthy increase in disease-associated mutation density (i.e., number of mutations normalized to total exon length) within fitness-suppressing exons, including mutations from ClinVar, TCGA, and COSMIC (Figures 4G and S6A; STAR Methods). Conversely, ClinVar, TCGA, and MutationAligner mutations exhibit a modest yet statistically significant depletion in fitness-promoting exons. Overall, these findings align with the hypothesis that fitness-promoting (essential) exons are less tolerant of mutations. To mitigate gene-level effects influencing our observations, we analyzed matched hit exons and non-hit exons from the same gene. Notably, a strong enrichment of disease-associated mutations in fitness-suppressing exons persists in this analysis (Figure S6B). These results collectively suggest that exon-resolution screens could be utilized to prioritize functional studies of disease-associated mutations.
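
The mutation density used in this comparison is defined in the text as the number of mutations normalized to total exon length. A minimal sketch of that normalization is shown below, assuming each exon is represented as a (length in nucleotides, mutation count) pair. The function name and all numbers are hypothetical and serve only to show why pooling by total length, rather than averaging per-exon ratios, keeps short exons from dominating the estimate.

```python
def mutation_density(exons):
    """Mutations per nucleotide, pooled over a group of exons.

    exons: iterable of (exon_length_nt, n_mutations) pairs (hypothetical format).
    """
    exons = list(exons)
    total_length = sum(length for length, _ in exons)
    total_mutations = sum(n for _, n in exons)
    if total_length == 0:
        raise ValueError("total exon length must be positive")
    return total_mutations / total_length


if __name__ == "__main__":
    # Made-up exon groups for illustration only.
    suppressing = [(90, 3), (150, 3)]   # 6 mutations over 240 nt
    neutral = [(120, 1), (180, 2)]      # 3 mutations over 300 nt
    print(mutation_density(suppressing), mutation_density(neutral))
```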

The random forest model identified the intrinsically unstructured protein prediction score (IUPred) of exons as predictive for their fitness effects (Figure 4F). Consistently, fitness-promoting exons have significantly lower disordered prediction scores, while fitness-suppressing exons display increased IUPred scores (Figure 4H; p-value = 0.014; Wilcoxon rank sum). These observations agree with fitness-promoting exons being strongly depleted from low-complexity regions and enriched for overlapping protein domains (Figures 4I–J; p-value < 3.58e−14; Proportion test). Although these signatures are detected when either all or only alternative exons are analyzed (Figures S6C–E), they are only observed for frame-preserving and not frame-disruptive exons (Figures S6F–H), indicating that they indeed underlie exon-level phenotypes, which are lost when the cognate gene is inactivated (i.e., through deleting frame-disruptive exons).

Finally, we investigated the enriched protein domains in fitness-promoting exons. We observed several enriched domains implicated in nucleating protein assemblies such as the WD40 repeat and the PCI domains (Figure S6I). WD40 repeat domains adopt a ring shape that serves as a binding platform for other proteins and for the formation of multi-protein complexes68. Together, these observations raise the possibility that the fitness-affecting exons operate by mediating protein-protein interactions. Indeed, we detect a significant overlap of both fitness-promoting and -suppressing exons with interfaces reported to mediate protein interactions (Figure 4K; p-value < 7.53e−9; Fisher’s exact test). Consistent with previous findings showing that essential genes have more protein-protein interaction partners and are more centrally located in PPI networks69,70, our results reveal that frame-disruptive exons are enriched for overlap with protein interaction interfaces (Figure S6J). This enrichment is also strong for fitness-promoting alternative exons (Figure S6K; p-value = 2.78e−30; Fisher’s exact test), confirming that a critical role of alternative splicing is to control protein-protein interaction networks71–73.

TAF5 alternative exon-8 is required for TFIID complex assembly

To further explore our fitness exon dataset, we studied the functional role of the alternative exon-8 in TAF5, an exon that overlaps with a WD40 repeat domain, a domain type that is significantly enriched in cell fitness-promoting exons (Figure 5A, Figure S6I). TAF5 is a component of the TFIID general transcription initiation factor, which contains the TATA-binding protein (TBP) and is crucial for pre-initiation complex formation and RNA polymerase II (RNA pol II) recruitment to eukaryotic RNA pol II promoters74–77. The inclusion rates of exon-8 in TAF5 transcripts range from 38 to 100% in humans (average 93%; Figure S7A), while PSI values across various human and mouse samples indicate a lack of apparent tissue- or developmental stage-specificity, with both isoforms frequently coexisting and being translated6 (Figures 5A and S7B). We validated the strong effect of exon-8 on cell fitness using focused exon deletion co-culture assays in both HAP1 and RPE1 cells (Figure 2G) and in HEK293T cells using base editors (Figures S7C–D).
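
The inclusion rates quoted above are percent spliced in (PSI) values. One common way to estimate PSI from RNA-seq data is to compare reads supporting exon inclusion with reads supporting exon skipping. The sketch below assumes a junction-read convention in which the two inclusion junctions are averaged before comparison with the single skipping junction. This convention, the function name, and the read counts are illustrative assumptions and may differ from the pipeline used in the study.

```python
def percent_spliced_in(upstream_junction_reads, downstream_junction_reads,
                       skipping_junction_reads):
    """Estimate PSI (0-100) for a cassette exon from junction-spanning reads.

    Inclusion is supported by two junctions (upstream exon -> cassette exon and
    cassette exon -> downstream exon), so their counts are averaged to make them
    comparable with the single junction that supports skipping.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2
    denominator = inclusion + skipping_junction_reads
    if denominator == 0:
        raise ValueError("no junction reads support either isoform")
    return 100.0 * inclusion / denominator


if __name__ == "__main__":
    # Hypothetical read counts, not taken from the study.
    print(percent_spliced_in(90, 96, 7))  # 93.0
```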

Figure 5: TAF5 exon-8 inclusion is critical for TFIID complex assembly.

(A) Schematic representation of the two TAF5 transcript isoforms generated by exon-8 inclusion or skipping and corresponding Ribo-seq reads overlapping splice site junctions. The TAF5 domain structure is shown at the top left, and AlphaFold predictions of the full-length (FL) and ΔE8 TAF5 isoforms are depicted on the right. The protein region encoded by exon-8 is highlighted in red in the FL isoform.

(B) Heatmap indicating the number of peptides corresponding to core TFIID subunits as detected through affinity-purification mass-spectrometry (AP-MS) analysis in HEK293 Flp-In cells expressing the indicated constructs as shown in Figure S7E.

(C) Protein-protein interaction network involving TAF5 (gray hexagon) splice isoforms detected by AP-MS. Node color indicates differential association of protein interactions between the two isoforms. Protein-protein interactions extracted from the STRING database (STAR Methods) were used to generate edges between the preys identified by our AP-MS experiment.

(D) Western blot analysis of total cell lysates (input) and FLAG immunoprecipitates (IP: FLAG-M2) from HEK293 Flp-In cells expressing 3xFLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8) isoforms using anti-FLAG antibodies and antibodies specific for TAF5, TBP, TAF1, TAF6, TAF12, CCT2, and β-Tubulin as a loading control.

(E-F) Western blot analysis of total cell lysates (input) and TBP immunoprecipitates (IP: TBP) from HEK293 Flp-In cell lines stably expressing doxycycline-inducible siRNA-resistant 3xFLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8), and treated with control siRNAs (siNT) or an siRNA that depletes endogenous TAF5 (siTAF5). IgG immunoprecipitation was performed as control. (E) Blots were probed with antibodies specific for TBP, FLAG, TAF5, TAF1, TAF6, TAF10, TAF12, and GAPDH as a loading control. (F) Quantifications of three independent TBP immunoprecipitation experiments. Data are represented as mean ± standard deviation. *** p-value < 0.001; two-way ANOVA.

To elucidate the functional significance of exon-8, we initially explored protein interaction networks associated with TAF5 isoforms. We engineered HEK293 cell lines to achieve stable, dox-inducible expression of FLAG-tagged full-length TAF5 (TAF5-FL; ENST00000369839) or the TAF5 isoform lacking exon-8 (TAF5-ΔE8; ENST00000692195) at equal levels (Figure S7E), and applied affinity purification mass spectrometry (AP-MS) as well as proximity biotin labeling coupled with streptavidin capture (TurboID-MS). As expected, the AP-MS data show that full-length TAF5 co-purifies with most TFIID components and other known interaction partners, such as chaperonins78 (Table S7). Strikingly, TAF5-ΔE8 completely fails to associate with any other TFIID component (Figures 5B–C). These observations are consistent with the TurboID data (Figures 5B–C and S7E–G; Table S8). Immunoprecipitation experiments of FLAG-TAF5 isoforms followed by western blotting confirmed that exon-8 is critical for TAF5 association with other TFIID components, including TBP (Figure 5D). Further supporting the failure of TAF5 to associate with TFIID when exon-8 is skipped, we detect a marginally increased cytoplasmic signal of TAF5-ΔE8 compared to TAF5-FL (Figure S7H).

Given that TAF5 and its WD40 domain are known to nucleate intermolecular TFIID interactions75,77, we investigated if the inclusion of alternative exon-8 can control overall TFIID complex assembly. To assess the ability of TBP to associate with other TFIID components such as TAF1, TAF6, TAF10, and TAF12, we used RNA interference (RNAi) to deplete endogenous TAF5 protein while adding back siRNA-resistant TAF5-FL or TAF5-ΔE8. Depletion of TAF5 results in reduced protein expression of TAF1 and TAF6 (but not TAF10 and TAF12), an effect rescued by TAF5-FL but not TAF5-ΔE8, suggesting that interactions mediated by TAF5 stabilize selected TFIID subunits (Figures 5E–F). Indeed, we find that depletion of endogenous TAF5 significantly reduces the association between TBP and TFIID components (i.e., TAF1, TAF6, TAF10, and TAF12) but that TAF5-FL can rescue the association. However, the TAF5-ΔE8 isoform not only fails to interact with TBP but is also unable to rescue TBP association with any other tested TFIID component (Figures 5E–F), consistent with exon-8 being important for TAF5 interactions. Overall, these data suggest that the inclusion of TAF5 alternative exon-8 is essential for TFIID complex assembly.

TAF5 alternative exon-8 is regulated by SRSF1 and controls TFIID-dependent gene expression

To test if TAF5 exon-8 impacts gene expression, we performed RNA sequencing in HEK293 cells depleted of endogenous TAF5 by RNAi while expressing either TAF5-FL or TAF5-ΔE8 isoforms at endogenous levels (Figure 6A). As expected, depletion of TAF5 results in widespread gene expression changes that are rescued by TAF5-FL (Figure 6B; Table S9). However, the TAF5-ΔE8 isoform is unable to rescue most of these gene expression changes (492 out of 683 genes; 72%). The genes regulated by TAF5 exon-8 are enriched in processes related to nucleosome assembly and chromatin organization (Figure S7I), and a selected number of genes were validated using quantitative RT-PCR assays (Figure S7J). Taken together, our biochemical and transcriptomic data suggest that TAF5 alternative exon-8 plays an essential role in TFIID complex assembly and activity.

Figure 6: TAF5 exon-8 inclusion is required for TAF5-dependent gene expression.

(A-B) Western blot analysis (A) of TAF5 isoform expression in HEK293 Flp-In cell lines stably expressing doxycycline-inducible siRNA-resistant 3×FLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8), and treated with control siRNAs (siNT) or an siRNA that depletes endogenous TAF5 (siTAF5). Blots were probed with antibodies specific for TBP, FLAG, TAF5, and GAPDH as a loading control. (B) RNA-seq profiled gene expression changes (Z-score normalized) upon the same treatments. Only genes with significant expression changes upon siTAF5 that are rescued by TAF5-FL are displayed.

(C-D) Metagene analysis of TBP (C) and RNA polymerase II (D) occupancy normalized to input around transcription start sites (TSS) of all expressed, TAF5-suppressed (upregulated upon siTAF5/TAF5-ΔE8 rescue) and TAF5-promoted (downregulated) genes as determined by ChIP-sequencing of HEK293 cells treated as above.

(E) siRNA screen monitoring the impact of 60 splicing regulators on TAF5 exon-8 alternative splicing. All data are represented as mean ± standard deviation from four biological replicates. **** p < 0.0001; two-way ANOVA.

(F) RT-PCR assays monitoring splicing of endogenous TAF5 exon-8 in HAP1 (top) and HEK293 (bottom) cells transfected with three independent siRNAs and an siRNA pool against SRSF1. PSI values are indicated.

(G) RT-PCR assays monitoring splicing of wild-type (WT) and SRSF1-motif mutant minigene reporters of TAF5 exon-8. HEK293T cells treated with three independent siRNAs and an siRNA pool against SRSF1 or non-targeting (NT) siRNA control were transfected with reporters 24 hrs prior to harvesting RNA.

To understand how TAF5 exon-8 impacts gene expression, we performed TBP and RNA pol II chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-seq). TAF5 depletion significantly reduces TBP occupancy at the transcription start sites (TSS) of expressed RNA pol II genes (Figures 6C and S8A; p-value < 2.2e−16; Wilcoxon rank sum). TBP occupancy is rescued to wild-type levels upon expression of TAF5-FL but not TAF5-ΔE8. Importantly, the impaired TBP occupancy at TSS in the absence of TAF5 exon-8 correlates with reduced RNA pol II recruitment (Figure 6D; p-value < 2.2e−16; Wilcoxon rank sum). TAF5-FL fully rescues RNA pol II occupancy, while TAF5-ΔE8 displays significantly lower recruitment compared to control-treated cells (Figure 6D; p-value < 2.2e−16; Wilcoxon rank sum). These findings highlight that TAF5 exon-8 has a crucial role in recruiting TFIID and facilitating the formation of pre-initiation complexes throughout the RNA polymerase II-transcribed genome.

We next asked if genes affected by TAF5 exon-8 show distinct patterns of TBP and RNA pol II occupancy. Indeed, TFIID-promoted genes (i.e., reduced expression upon TAF5-ΔE8 rescue) display substantially lower TBP and RNA pol II promoter occupancy compared to all expressed genes (Figures 6C–D; right panels). This is in contrast with genes that appear to have increased expression following TAF5 exon-8 depletion (i.e., TFIID-repressed genes), which display higher occupancy of both TBP and RNA pol II (Figures 6C–D; middle panels). These data suggest that TAF5 alternative exon-8 is preferentially required for the expression of RNA pol II genes that normally have limited TBP and RNA pol II at their promoters. Conversely, genes with promoters strongly occupied by TBP and RNA pol II are affected less by TFIID disruption and consequently appear as upregulated.

To comprehensively understand the regulatory mechanisms governing TAF5 exon-8, we performed an siRNA screen targeting 60 alternative splicing regulators and monitored splicing levels of endogenous TAF5 exon-8 (Figure 6E). The screen identified SRSF1 as the sole significant hit, a finding further validated using siRNAs in both HEK293 and HAP1 cells (Figure 6F); the splicing change was rescued by reintroduction of SRSF1 cDNA (Figure S8B). Notably, TAF5 exon-8 contains a perfect match to the SRSF1 binding motif (UCAGAGGA)79. Minigenes designed to monitor exon-8 splicing dynamics confirmed the pivotal role of SRSF1 in TAF5 exon-8 splicing, as well as the importance of the SRSF1 motif, as exon-8 inclusion was diminished in its absence (Figures 6G and S8C). Interestingly, we also observe that genotoxic stress results in reduced inclusion of TAF5 exon-8 (Figures S8D–E). Taken together, our data suggest that TAF5 exon-8 splicing, regulated by SRSF1, controls TFIID assembly and impacts global gene expression outputs (Figure 7A).
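
Locating an exact occurrence of the SRSF1 motif in an exon sequence amounts to a simple substring search. The sketch below scans a sequence for UCAGAGGA after converting DNA to RNA notation. The function name and the example sequence are made up for illustration; the example is not the TAF5 exon-8 sequence.

```python
def find_motif(sequence, motif="UCAGAGGA"):
    """Return 0-based start positions of every exact motif match.

    DNA input is accepted: T is converted to U before searching, and
    overlapping matches are reported.
    """
    seq = sequence.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequence (hypothetical) containing two copies of the motif.
    toy_exon = "GGAUCAGAGGACCUUCAGAGGA"
    print(find_motif(toy_exon))  # [3, 14]
```

A motif-mutant minigene such as the one described above would correspond to a sequence for which this search returns an empty list.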

Figure 7: Summary of characteristics of fitness-promoting and -suppressing exons.

(A) Graphical summary of TAF5 exon-8 dependent TFIID assembly and gene expression regulation.

(B) Table of distinct features of fitness-promoting and -suppressing exons.

DISCUSSION

Advances in genome engineering have enabled the high-throughput phenotypic interrogation of individual transcript isoforms32,37,80,81. Here, we present the most comprehensive exon-targeting screening effort conducted to date. Through massively parallel exon deletion and base editing screens, we have profiled 17,146 exons, of which 8,973 maintain the reading frame, to assess their impact on cell fitness. In the course of these screens, we interrogated thousands of protein truncation mutants expressed from their native genomic locations, establishing a valuable phenotypic dataset of human protein regions. Our systematic exon deletion screens have unveiled 2,071 and 170 frame-preserving exons that positively and negatively affect cell fitness, respectively. Notably, we observed distinct and often opposing characteristics between fitness-promoting and -suppressing exons (Figure 7B). However, fitness-promoting and -suppressing exons both overlap protein domains. Reassuringly, the base editing screening data revealed similar feature enrichment (Figures S9A–H). The importance of protein domains for exon essentiality is consistent with studies applying conventional gene KO CRISPR screens82,83, and highlights the possible application of exon perturbation screening as a drug target discovery tool for phenotypically relevant druggable domains.

It is noteworthy that a considerable proportion of the predicted protein products resulting from constitutive or alternative exon deletions may not naturally manifest. To address this, we intersected the annotated alternative exons targeted by our library with proteomic and translatomic datasets8,84–92, revealing that 55% (1,328 out of 2,401) of cassette exons were associated with generation of alternative protein products (Figure S9I). Interestingly, our study has uncovered 498 frame-preserving alternative cassette exons that affect cell fitness in HAP1 and/or RPE1 cells. These represent ~21% of all targeted frame-preserving alternative exons, a fraction likely inflated by our library design that enriches for essential genes. After corrections, we estimate that ~6–8% of alternative exons might affect cell fitness in cell culture, a fraction that may increase upon profiling of more cell types and culture conditions.

From an evolutionary perspective, our data imply that alternative exons arising through “transition” (i.e., constitutive exons are converted to alternative) are more likely to be phenotypically important in human cells than those arising through “exonization” events (i.e., non-coding/intronic sequences are converted to exons)5,93. This is consistent with the reduced conservation of intronic regulatory sequences in fitness-promoting alternative exons (Figure S5G) as well as a modest depletion of Alu elements, which have been implicated in exonization events (Figure S9J; p = 0.077; Fisher’s exact test). Our data further suggest that disruption of protein interactions mediated by alternative splicing could be a mechanism triggering consequential cell fitness phenotypes, consistent with previous findings71–73. This relationship might be of particular importance in cancer cells given that oncogenic splicing programs alter protein domains, including WD40, and remodel protein-protein interactions that may impact tumorigenesis94,95.

As an example of the variety of insights that our dataset can initiate, we have highlighted a critical role of TAF5 alternative exon-8 for TFIID complex assembly and gene regulation. Our exon deletion screens have revealed an enrichment of fitness-promoting exons within gene expression pathways, suggesting that future inquiries will continue to unravel the intricate relationship between alternative splicing and gene expression96–100. In summary, our study illustrates the power of applying exon-resolution functional genomics to uncover cell fitness phenotypes mediated by individual exons in the human genome. Coupled with advances in antisense therapies for splice isoform manipulation in clinical settings101,102, leveraging and further extending such screening strategies holds promise for uncovering novel therapeutic avenues.

Limitations of the study

Although our study provides a valuable resource of human exons that are critical for cell fitness, most of the identified hits comprise fitness-promoting exons with high-inclusion levels in common essential genes. Furthermore, it is crucial to note that our screening methodology does not comprehensively address the extent to which alternative splicing can generate protein products with distinct functional roles. This can be the focus of future studies employing the optimized tools described here to target tissue-specific exons within appropriate experimental systems and employing relevant phenotypic readouts.

The exon-deletion screening platform described here relies on inducing simultaneous DSBs at flanking intronic sites. This approach carries the risk of generating complex chromosomal rearrangements. To mitigate this risk, we optimized base editors to systematically disrupt exons and validated our hits from the exon deletion screen. Base editing also presents challenges, such as the potential for inducing intron retention or activating cryptic splice sites. While we did not detect appreciable levels of intron retention, we cannot rule out the occurrence of splice-site changes. Future application of emerging RNA-targeting CRISPR technologies holds promise to alleviate some of these concerns103,104.

Exon perturbation screens remain less sensitive than conventional gene loss-of-function genetic screens. Analysis of positive and negative control exons (i.e., frame-disruptive exons in essential and non-essential genes, respectively) reveals false negative rates of ~49% and 35% for exon deletion and base editor screens, respectively (Figures 1F and S4J; 10% FDR). Hence, there is scope for further refinement of exon perturbation screening platforms. Moving forward, a critical objective for exon-resolution functional genomics is to explore diverse cellular backgrounds and complex phenotypes beyond cell fitness.
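
A false negative rate of this kind is the fraction of positive-control exons (frame-disruptive exons in essential genes) that the screen fails to call as hits at the chosen FDR threshold. The sketch below shows the arithmetic only. The function name and the counts are hypothetical, and how hits are called at 10% FDR is left to the STAR Methods rather than reproduced here.

```python
def false_negative_rate(n_positive_controls, n_detected):
    """Fraction of positive-control exons not called as hits."""
    if n_positive_controls <= 0:
        raise ValueError("need at least one positive control")
    if not 0 <= n_detected <= n_positive_controls:
        raise ValueError("detected count must lie between 0 and the number of controls")
    return (n_positive_controls - n_detected) / n_positive_controls


if __name__ == "__main__":
    # Hypothetical example: 1,000 positive-control exons, 510 recovered as hits.
    print(false_negative_rate(1000, 510))  # 490 of 1,000 controls missed
```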

STAR METHODS

RESOURCE AVAILABILITY

Lead contact

For additional details and inquiries regarding resources and reagents, please reach out to the lead contact, Thomas Gonatopoulos-Pournatzis (thomas.gonatopoulos@nih.gov).

Materials availability

All generated key plasmids and libraries have been deposited to Addgene (see Table S4). Any other reagent is available upon reasonable request.

Data and code availability

  • CRISPR screening, GUIDE-seq, RNA-seq, and ChIP-seq data generated by this study have been deposited at GEO and are publicly available. Accession numbers are listed in the key resources table. The TAF5 splice isoform mass spectrometry data associated with this study have been deposited to the ProteomeXchange consortium through partner MassIVE (massive.ucsd.edu). Microscopy data have been deposited to Mendeley. Accession numbers and DOI are listed in the key resources table.

  • This paper does not report original code.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
TAF1 Cell Signaling Technology Cat#12781S; RRID:AB_2798025
TAF5 ThermoFisher Scientific Cat#MA3–076; RRID:AB_2633321
TAF6 ThermoFisher Scientific Cat#A301–276A-M; RRID:AB_2779789
TAF10 Sigma Cat#MABE1079; RRID:AB_10952566
TAF12 Proteintech Cat#12353–1-AP; RRID:AB_2271582
TBP Proteintech Cat#22006–1-AP; RRID:AB_10951514
CCT2 Proteintech Cat#24896–1-AP; RRID:AB_2879783
SpCas9 Diagenode Cat#C15200229; RRID:AB_2889848; Cat#C15310258; RRID:AB_2715516
FLAG M2 Sigma Cat#F3165; RRID:AB_259529
HA tag Sigma Cat#H3663; RRID:AB_262051
Myc tag Proteintech; Sigma-Aldrich Cat#60003–2-Ig; RRID:AB_2734122; Cat#M4439; RRID:AB_439694
GAPDH Proteintech Cat#10494–1-AP; RRID:AB_2263076
β-Tubulin Proteintech Cat#10094–1-AP; RRID:AB_2210695
CD46-BV421 BD Biosciences Cat#743776; RRID:AB_2741744
RNA polymerase II ThermoFisher Scientific Cat#A300653A; RRID:AB_519334
Rabbit IgG Proteintech Cat#30000–0-AP; RRID:AB_2819035
Anti-rabbit IgG, HRP-linked Antibody Cell Signaling Technology Cat#7074; RRID:AB_2099233
Anti-mouse IgG, HRP-linked Antibody Cell Signaling Technology Cat#7076; RRID:AB_330924
Goat anti-Rabbit IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor Plus 488 ThermoFisher Scientific Cat#A32731; RRID:AB_2633280
Goat anti-Mouse IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor Plus 647 ThermoFisher Scientific Cat#A32728; RRID:AB_2633277
Bacterial and virus strains
NEB Stable competent E. coli cells New England Biolabs Cat#C3040H
Endura electrocompetent cells LGC Biosearch Technologies Cat#60242–2
Chemicals, peptides, and recombinant proteins
TRIzol Sigma-Aldrich Cat#T3934
Paraformaldehyde ThermoFisher Scientific Cat#28908
Formaldehyde Pierce Cat#PI28908
Glycine Sigma-Aldrich Cat#G8898
Puromycin ThermoFisher Scientific Cat#A1113803
G418 Sulfate Gibco Cat#10131027
Blasticidin S Gibco Cat#A1113903
Accutase Sigma-Aldrich Cat#A6964
Hygromycin ThermoFisher Scientific Cat#10687010
Doxycycline Sigma Cat#D9891
D-sorbitol Sigma-Aldrich Cat#S1876
Tunicamycin Sigma-Aldrich Cat#SML1287
Hydrogen peroxide Sigma-Aldrich Cat#H1009
Mitoxantrone MedChemExpress Cat#HY-13502
Critical commercial assays
RNeasy Plus Mini Kit Qiagen Cat#74136
RNA loading dye ThermoFisher Scientific Cat#LC6876
10% TBE-Urea gel ThermoFisher Scientific Cat#EC68752BOX
Hybond-N+ membrane Amersham Cat#NV0796
ULTRAhyb-Oligo hybridization buffer ThermoFisher Scientific Cat#AM8663
North2South Chemiluminescent Hybridization and Detection reagents ThermoFisher Scientific Cat#17097
P3 Primary Cell 4D-Nucleofector X Kit S Lonza Cat#V4XP-3032
Wizard Genomic DNA Purification Kit Promega Cat#A1120
NEBNext Ultra II Q5 Master Mix New England Biolabs Cat#M0544X
QIAquick PCR Purification Kit Qiagen Cat#28104
GeneJET PCR purification column ThermoFisher Scientific Cat#K0701
GeneJET Gel Extraction Kit ThermoFisher Scientific Cat#K0692
KAPA HiFi HotStart DNA polymerase Roche Cat# KK2601
PrimeSTAR® Max DNA Polymerase Takara Bio Cat#R045B
SPRIselect beads Beckman Cat#B23318
D1000 ScreenTape Agilent Cat#5067–5582; Cat#5067–5583
High Sensitivity D1000 ScreenTape Agilent Cat#5067–5584
High Sensitivity D1000 Reagents Agilent Cat#5067–5585
Qubit dsDNA HS assay ThermoFisher Scientific Cat#Q32851
PhiX Illumina Cat#FC-110–3001
MiSeq Reagent Micro Kit v2 (300 cycles) Illumina Cat#MS-103–1002
NextSeq 2000 P2 Reagents (200 Cycles) v3 Illumina Cat#20046812
NovaSeq 6000 S1 platform (200 cycles kit) Illumina Cat#20028318
NovaSeq 6000 S1 platform (100 cycles kit) Illumina Cat#20028319
Illumina Stranded mRNA Prep kit Illumina Cat#15031047
In-Fusion Snap Assembly Master Mix Takara Bio Cat#638948
NEBuilder HiFi DNA Assembly Master Mix New England Biolabs Cat#E2621L
X-tremeGENE 9 DNA Transfection Reagent Sigma-Aldrich Cat#6365809001
Zombie NIR viability dye BioLegend Cat#423106
one-step SensiFAST real-time PCR kit Bioline Cat#BIO-72001
Maxima H Minus First Strand cDNA Synthesis Kit ThermoFisher Scientific Cat#K1652
SensiFAST SYBR No-ROX Kit Bioline Cat#BIO-98050
Accel-NGS 2S Plus DNA Library Kit Swift Biosciences Cat#21096
Maxi-prep plasmid purification kit Invitrogen Cat#K210016
BP clonase II ThermoFisher Scientific Cat#11789020
LR clonase II ThermoFisher Scientific Cat#11791020
BveI ThermoFisher Scientific Cat#FD1744
Esp3I ThermoFisher Scientific Cat#FD0454
XhoI New England Biolabs Cat#R0146L
CsiI ThermoFisher Scientific Cat#FD2114
MluI ThermoFisher Scientific Cat#FD0564
FastAP ThermoFisher Scientific Cat#EF0651
T4 DNA ligase New England Biolabs Cat#M0202
Lipofectamine RNAiMax ThermoFisher Scientific Cat#13778150
Streptavidin Sepharose Beads ThermoFisher Scientific Cat#20353
Dynabeads protein G ThermoFisher Scientific Cat#10004D
Proteinase-K ThermoFisher Scientific Cat#EO0492
RNaseA Invitrogen Cat#12091021
Bicinchoninic acid (BCA) assay Pierce Cat#23225
Bradford reagent BioRad Cat#5000006
NuPAGE LDS Sample Buffer (4x) ThermoFisher Scientific Cat#NP0007
cOmplete protease inhibitors Roche Cat#11836145001
4–12% Bis‐Tris gels Life Technologies Cat#NP0323BOX
Immobilon-P PVDF membrane Sigma-Aldrich Cat#IPVH00010
SuperSignal West Pico PLUS chemiluminescence reagent ThermoFisher Scientific Cat#34580
DMEM with high glucose and pyruvate Gibco Cat#11995073
heat-inactivated fetal bovine serum (HI-FBS) Gibco Cat#16140071
penicillin-streptomycin Gibco Cat#15140122
trypsin-EDTA Gibco Cat#25200056
Opti-MEM Gibco Cat#31985062
Deposited data
CRISPR screen sequencing This study GEO: GSE244337
GUIDE-seq data This study GEO: GSE262849
HAP1 and RPE1 RNA-Seq data This study GEO: GSE244340
TAF5 RNA-Seq This study GEO: GSE244357
ChIP-seq data This study GEO: GSE244373
All sequencing data produced in this study This study GEO: GSE244374
TAF5 AP-MS data This study MassIVE: MSV000092798
TAF5 miniTurboID data This study MassIVE: MSV000092798
Microscopy data This study DOI: 10.17632/3sdsc83vsn.2
HAP1 cells ribosome profiling data Malecki et al.84 GEO: GSE93133
Lymphoblastoid cell lines (GM19204 and GM19238) ribosome profiling data Raj et al.85 GEO: GSE75290
Primary human reticulocytes ribosome profiling data Mills et al.86 GEO: GSE85864
hES cells (H1) ribosome profiling data Werner et al.87 GEO: GSE62247 (SRR1610244 to SRR1610259)
BJ fibroblast cell lines (EH, EL and ELR) ribosome profiling data Ji et al.88 GEO: GSE65885 (SRR1802146 to SRR1802148; SRR1802152 to SRR1802154)
HCT116 cells ribosome profiling data Fijałkowska et al.89 GEO: GSE87328
HEK293 cells ribosome profiling data Oh et al.90 GEO: GSE70804
U2OS human osteosarcoma cell line ribosome profiling data Jang et al.91 GEO: GSE56924
RPE1 cells ribosome profiling data Tanenbaum et al.92 GEO: GSE67902
Proteomics data Sinitcyn et al.8 PMID: 36959352
Experimental models: Cell lines
HAP1 Horizon Discovery Cat#C631
HAP1 Cas9 Horizon Discovery Cat#Cas9-011
HAP1 Cas9/Cas12a This study N/A
RPE1 hTERT ATCC Cat#CRL-4000
RPE1 hTERT TP53 −/− Cas9 Hart et al.29 N/A
RPE1 hTERT TP53 −/− Cas9/Cas12a This study N/A
HEK293T ATCC Cat#ACS-4500
HEK293 Flp-In T-REx cells Invitrogen Cat#R78007
Oligonucleotides
See Table S4 This study N/A
Recombinant DNA
See Table S4 & Methods This study N/A
Software and algorithms
AlphaFold2 2.3.1 Jumper et al.121, Varadi et al.122 https://alphafold.ebi.ac.uk/download
BEDTools 2.30.0 Quinlan and Hall109 https://bedtools.readthedocs.io/en/latest/
Biomart Kinsella et al.140 http://useast.ensembl.org/biomart/martview/62cd2a1c43671bf181f1a4d18e66b210
BLAT UCSC Genome Browser https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
bowtie 1.3.1 Langmead et al.128 https://bowtie-bio.sourceforge.net/manual.shtml
bowtie2 2.5.1 Langmead and Salzberg108 https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
CRISPResso2 Clement et al.113 https://github.com/pinellolab/CRISPResso2
cutadapt 1.18 Martin107 https://cutadapt.readthedocs.io/en/stable/
deepTools 3.5.2 Ramírez et al.120 https://deeptools.readthedocs.io/en/develop/
g:Profiler Kolberg et al.118 https://biit.cs.ut.ee/gprofiler/gost
GuideScan Perez et al.126 https://guidescan.com
MAGeCK mle module Li et al.50 https://sourceforge.net/p/mageck/wiki/Home/
MaxEntScan Yeo and Burge148 http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
MISO 0.5.4 Katz et al.124 https://miso.readthedocs.io/en/fastmiso/
Python 2.7.15 Python Software Foundation https://www.python.org
Python 3.8.5 Python Software Foundation https://www.python.org
R version 4.3.1 R Foundation https://www.r-project.org
R package PRROC Grau et al.130 https://cran.r-project.org/web/packages/PRROC/index.html
RibORF Ji150 https://github.com/zhejilab/RibORF
Rule Set 2 Doench et al.125 https://portals.broadinstitute.org/gpp/public/software/sgrna-scoring-help#rs2
Samtools 1.16.1 Li et al.119 http://www.htslib.org/download/
scikit-learn 1.3.0 Pedregosa et al.131 https://scikit-learn.org/stable/install.html
STAR 2.7.11a Dobin et al.149 https://github.com/alexdobin/STAR/releases
UCSC liftOver UCSC Genome Browser https://genome.ucsc.edu
UCSC Multiz Refseq protein alignment Blanchette et al.141 https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/
UCSC table browser UCSC Genome Browser https://genome.ucsc.edu
UCSF ChimeraX 1.6.1 Pettersen et al.123 https://www.cgl.ucsf.edu/chimerax/
VastDB Tapial et al.6 https://vastdb.crg.eu/wiki/Main_Page
Whippet Sterne-Weiler et al.117 https://github.com/timbitz/Whippet.jl
FlowJo software BD Biosciences, version 10.8.1 https://www.flowjo.com/solutions/flowjo
Cytoscape v3.9.1 Shannon et al.114 https://cytoscape.org/
STRING plugin v2.0.1 Doncheva et al.115 https://apps.cytoscape.org/apps/stringapp
Partek Flow software (version 10.0.23.0531) N/A https://www.partek.com/partek-flow/
GraphPad Prism Version 9.0 GraphPad Software Inc. https://www.graphpad.com/
Adobe Illustrator v28.4.1 Adobe https://www.adobe.com/products/illustrator.html
Affinity Designer 2 v2.3.0 Affinity Designer https://affinity.serif.com/en-us/designer/
BioRender BioRender https://www.biorender.com/
Other
Semi-dry transfer Bio-Rad Cat#1703940
UV Crosslinker VWR Cat#89131–484
Lonza 4D-Nucleofector Transfection System Lonza Cat#AAF-1003B, AAF-1003X
M220 Focused Ultrasonicator Covaris Cat#500295
4150 TapeStation System Agilent Cat#G2992AA
BTX Gemini Electroporator BTX Cat#452042
Bioruptor Plus sonicator Diagenode Cat#B01020002
Mini Gel Tank Life Technologies Cat#A25977
Mini Blot Module Life Technologies Cat#B1000
VeritiPro Thermal Cycler Applied Biosystems Cat#A48141
CFX96 Touch Real-Time PCR BioRad Cat#1855195
MiSeq Sequencing System Illumina N/A
NextSeq 2000 Sequencing System Illumina N/A
NovaSeq 6000 Sequencing System Illumina N/A
EVOS M5000 Imaging System Invitrogen Cat#AMF5000
iBright CL1500 Imaging System Invitrogen Cat#A44114
Vi-Cell BLU Cell Viability Analyzer Beckman Coulter Cat#C19196
Incucyte SX1 Sartorius Cat#4837
LSRFortessa cell analyzer BD Biosciences N/A

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS

Cell culture

HEK293T cells, HEK293 Flp-In cells, HAP1 cells, and RPE1 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) with high glucose and pyruvate (Gibco #11995073), supplemented with 10% heat-inactivated fetal bovine serum (HI-FBS) (Gibco #16140071) and 1% penicillin-streptomycin (Gibco #15140122). All cells were grown at 37°C in a humidified atmosphere with 5% CO2. Cells were routinely passaged using 0.25% trypsin-EDTA (Gibco #25200056) and seeded at appropriate densities in tissue culture-treated plates for experimental purposes. Medium was changed every 2–3 days, and cells were monitored for confluency using the EVOS M5000 Imaging System (Invitrogen). Cell viability was assessed using the Vi-Cell BLU Cell Viability Analyzer (Beckman Coulter).

CHyMErA and base editor stable cell line generation

The HAP1 near-haploid human cell line, derived from the KBM-7 chronic myelogenous leukemia cell line, was obtained from Horizon Discovery (#C631), along with a HAP1 clone stably expressing Cas9 from a hEF1α promoter (Horizon Discovery #Cas9-011). An immortalized retinal pigment epithelium-1 (RPE1-hTERT) clone stably expressing FLAG-tagged Cas9 and harboring a p53 loss-of-function mutation (referred to as RPE1) was developed and used for CRISPR screens previously105,106. The HAP1 and RPE1 cell lines were plated at a density of 0.25 million cells per well in 6-well plates. Cells were then transduced with lentiviral particles encoding Cas12a variants or SpCas9 base editors at an MOI of 0.3. Cells transduced with Cas12a variants or SpCas9 base editors were selected with 500 μg/mL of G418 Sulfate (Gibco #10131027) or 10 μg/mL of Blasticidin S (Gibco #A1113903), respectively, and expanded as a polyclonal population. The expression of Cas9 and Cas12a was confirmed through western blot and immunofluorescence analysis using anti-SpCas9 and anti-Myc tag antibodies (see below).

METHOD DETAILS

Cloning of Cas12a variants and base editors

All Cas12a variants (LbCas12a-8×NLS, Addgene #209020; enCas12a, Addgene #209021; opCas12a, Addgene #209022; Cas12a-Ultra, Addgene #209023; opCas12a-Ultra, Addgene #209024) were cloned into lentivirus-based expression vectors using In-Fusion Snap Assembly Master Mix (Takara Bio #638948) and the same vector backbone described previously for plenti-Lb-Cas12a-2×NLS (Addgene #155046) and plenti-As-Cas12a-2×NLS (Addgene #155047)32. The adenine base editors [ABE8e (Addgene #138506) and ABE8.20m (Addgene #136300)] were fused to the N-terminus of SpCas9(D10A)-6×NLS and cloned into the TLCV2 inducible lentiviral plasmid (Addgene #87360) using In-Fusion Snap Assembly Master Mix (pLenti-nSpCas9-NoABE: Addgene #209043; pLenti-nSpCas9-TadA8e: Addgene #209044; pLenti-nSpCas9-TadA8.20m: Addgene #209045). Similarly, all cytosine base editors [APOBEC1 (Addgene #157942), APOBEC3A (Addgene #157945), BEACON2 (Addgene #171698), and evoCDA1 (Addgene #122608)] along with the ssDNA binding domain of RAD51 (ref. 63) were fused to the N-terminus of SpCas9(D10A)-6×NLS and cloned into an inducible lentiviral plasmid. Additionally, 2xUGI sequences were introduced between SpCas9(D10A) and 6×NLS for all cytosine base editors (pLenti-nSpCas9-NoCBE: Addgene #209038; pLenti-nSpCas9-APOBEC1: Addgene #209039; pLenti-nSpCas9-APOBEC3A: Addgene #209040; pLenti-nSpCas9-BEACON2: Addgene #209041; pLenti-nSpCas9-evoCDA1: Addgene #209042). For the transient transfection experiments described in Figure 5, the evoCDA1 base editor was further subcloned into pSpCas9(BB)-2A-Puro (PX459) V2.0 plasmid (Addgene #62988) using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs #E2621L).

Cloning of pLCHKOv2 and pLCHKOv3 hgRNA expression vectors

The CHyMErA optimization library was cloned into pLCHKOv2, a modified version of our previously published pLCHKO hgRNA expression vector (Addgene #155048)32. First, a DNA fragment containing a shortened DNA polymerase (Pol) III transcription termination site, with six instead of eight “T”s, was cloned by TWIST Biosciences into the original pLCHKO vector downstream of the U6 promoter. Next, the stuffer sequence between the Pol III promoter and the transcription terminator was replaced with a ccdB bacterial toxin amplified from the pDONOR 221 vector, resulting in the pLCHKOv2 vector. The pLCHKOv2 vector was further modified by introducing SwaI/SmiI and SbfI/SdaI restriction sites. The pLCHKOv2 vector was then digested with XhoI (New England Biolabs #R0146L), and dsDNA fragments containing the two new restriction sites were cloned into the vector using In-Fusion Snap Assembly Master Mix (Takara Bio #638948) creating pLCHKOv3 (Addgene #209025).

Lentivirus production

Lentivirus was produced as described previously33. Briefly, for library virus production, 8 million HEK293T cells were seeded per 15-cm plate in DMEM + 10% HI-FBS. Twenty-four hours after seeding, cells were transfected with a mix of 6 μg of lentiviral pLCHKO vector containing the hgRNA library, 6.5 μg of packaging vector psPAX2 (Addgene #12260), 4 μg of envelope vector pMD2.G (Addgene #12259), 48 μL of X-tremeGENE 9 DNA Transfection Reagent (Sigma-Aldrich #6365809001), and 1.4 mL of Opti-MEM medium (Gibco #31985062). Twenty-four hours after transfection, the medium was replaced with serum-free, high-BSA growth media (DMEM, 1.1 g/100 mL BSA, 1% penicillin/streptomycin). Virus-containing medium was harvested 48 hours post-transfection, centrifuged at 475 rcf (relative centrifugal field) for 5 min at 4°C, aliquoted, and frozen at −80°C.

For determination of viral titers, cells were transduced with a titration of the lentiviral hgRNA library along with polybrene (8 μg/mL) (Sigma-Aldrich #H9268). After 24 hours, virus-containing medium was replaced with fresh medium containing puromycin (1–2 μg/mL) (ThermoFisher Scientific #A1113803) and cells were incubated for an additional 48 hours. The multiplicity of infection (MOI) of the titrated virus was determined 72 hours post-infection by comparing the percentage survival of puromycin-selected cells to infected but non-selected control cells.
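The titration readout described above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes that the surviving fraction equals the transduced fraction and that integration events follow a Poisson distribution, so that MOI = −ln(1 − fraction transduced). The function names and cell counts are hypothetical.

```python
import math


def surviving_fraction(selected_count, unselected_count):
    """Survival of puromycin-selected cells relative to the infected,
    non-selected control (both counts are hypothetical inputs)."""
    if unselected_count <= 0:
        raise ValueError("unselected_count must be positive")
    return selected_count / unselected_count


def poisson_moi(fraction_transduced):
    """MOI implied by a transduced fraction, assuming Poisson-distributed
    integrations: P(at least one integration) = 1 - exp(-MOI)."""
    if not 0 <= fraction_transduced < 1:
        raise ValueError("fraction_transduced must be in [0, 1)")
    return -math.log(1.0 - fraction_transduced)


# Hypothetical counts for one virus dilution.
fraction = surviving_fraction(selected_count=2.6e5, unselected_count=1.0e6)
print(f"survival: {fraction:.0%}, estimated MOI: {poisson_moi(fraction):.2f}")
```

Under this model, roughly 26% survival corresponds to an MOI close to 0.3; at low MOI the surviving fraction and the MOI are nearly equal, which is why the two are often used interchangeably.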

For virus production of focused gRNA constructs, 0.7 million HEK293T cells were seeded in 6-well plates. Twenty-four hours later, cells were transfected with 625 ng of lentiviral pLCHKO or pLCKO vector containing an hgRNA or sgRNA, 725 ng of packaging vector psPAX2, 525 ng of envelope vector pMD2.G, 6 μL of X-tremeGENE 9 DNA Transfection Reagent, and 100 μL of Opti-MEM medium. Virus was harvested as described above.

For virus production of Cas9 and Cas12a nuclease constructs, 0.7 million HEK293T cells were seeded in 6-well plates. Twenty-four hours later, cells were transfected with 1 μg of lentiviral pLCHKO vector containing an hgRNA or sgRNA, 600 ng of packaging vector psPAX2, 400 ng of envelope vector pMD2.G, 6 μL of X-tremeGENE 9 DNA Transfection Reagent, and 100 μL of Opti-MEM medium. Virus was harvested as described above.

Immunoblotting

Cells were washed with DPBS (ThermoFisher Scientific #14190250) and lysed in F buffer (10 mM Tris pH 7.05, 50 mM NaCl, 30 mM Na4 pyrophosphate, 50 mM NaF, 5 μM ZnCl2, 10% glycerol, 0.5% Triton X-100) supplemented with a protease inhibitor cocktail (Roche #11836170001) for 10 minutes on ice. Total cell lysates were cleared by centrifugation at 18,500 rcf for 10 minutes at 4°C, and the pellet was discarded. The protein concentration was determined using Bradford reagent (BioRad #5000006), and an equal amount of protein (10–30 μg) was resolved on 4–12% Bis-Tris gels (Life Technologies #NP0323BOX). Following electrophoresis, proteins were transferred to an Immobilon-P PVDF membrane (Millipore #IPVH00010) at 22 V for 60 minutes using the Mini Blot Module (Life Technologies). The membrane was subsequently blocked with 5% milk for 1 hour at room temperature and incubated overnight at 4°C with primary antibodies: TAF1 (1:1,000, Cell Signaling Technology #12781S), TAF5 (1:2,000, ThermoFisher Scientific #MA3–076), TAF6 (1:2,500, ThermoFisher Scientific #A301–276A-M), TAF10 (1:2,000, Sigma #MABE1079), TAF12 (1:1,000, Proteintech #12353–1-AP), TBP (1:1,000, Proteintech #22006–1-AP), CCT2 (1:2,000, Proteintech #24896–1-AP), SpCas9 (1:2,500, Diagenode #C15200229), FLAG M2 (1:2,500, Sigma #F3165), HA tag (1:2,500, Sigma #H3663), Myc tag (1:2,500, Proteintech #60003–2-Ig), GAPDH (1:2,500, Proteintech #10494–1-AP), and β-Tubulin (1:2,500, Proteintech #10094–1-AP). After washing, the membrane was incubated with HRP-conjugated secondary antibodies (anti-Rabbit, Cell Signaling Technology #7074 and anti-Mouse, Cell Signaling Technology #7076) at a 1:5,000 dilution for 1 hour at room temperature. After washing, SuperSignal West Pico PLUS chemiluminescence reagent (ThermoFisher Scientific #34580) was used for protein detection, and images were acquired using the Invitrogen iBright CL1500 Imaging System.

Immunofluorescence

Cells were plated on LAB-TEKII chamber slides (Life Technologies #154534PK) and fixed with 4% paraformaldehyde (ThermoFisher Scientific #28908) for 15 minutes at room temperature. Cells were permeabilized with 0.2% Triton X-100 (Sigma-Aldrich #T8787) for 10 minutes and blocked with 5% BSA (Sigma-Aldrich #A9647) for 1 hour. Cells were incubated with primary antibodies: SpCas9 (1:500, Diagenode #C15310258) and Myc tag (1:400, Sigma-Aldrich #M4439) for 1 hour at room temperature in DPBS supplemented with 5% BSA. Following washing, cells were incubated with Alexa Fluor 488-conjugated goat anti-rabbit antibodies (1:1,000, ThermoFisher Scientific #A32731) or Alexa Fluor 647-conjugated goat anti-mouse antibodies (1:800, ThermoFisher Scientific #A32728) for 1 hour at room temperature in DPBS supplemented with 5% BSA. Nuclear staining was performed with Hoechst 33342 (1:10,000, ThermoFisher Scientific #62249), and the slides were mounted using Fluoromount-G Mounting Medium (ThermoFisher Scientific #00-4958-02). Images were acquired using a 63x oil immersion objective on a Leica TCS SP8 confocal microscope and processed using Fiji (ImageJ) software.

Assessing CD46 Exon Deletion by Agarose-PCR

HAP1 and RPE1 cells stably expressing SpCas9 and different Cas12a variants were transduced with lentivirus derived from pLCHKOv3 vectors expressing either a control hgRNA targeting an intergenic control region or three different hgRNAs targeting CD46 exon-3 for deletion using Lb or AsCas12a direct repeats (Addgene #209026–209033). After 24 hours, transduced cells were selected with 1 μg/mL of puromycin for 48 hours. Selected cells were lifted and reseeded in 1 μg/mL of puromycin for 72 hours, after which gDNA was extracted with the GeneJET Genomic DNA Purification Kit (ThermoFisher Scientific #K0722). CD46 exon-3 deletion was assessed by PCR using PrimeSTAR® Max DNA Polymerase (Takara Bio #R045B) and primers flanking the targeted region (see Table S4) with the following cycling conditions: step 1: 95°C for 1 minute; step 2: 98°C for 20 seconds, 62°C for 15 seconds, 72°C for 1 minute for a total of 34 cycles; step 3: 72°C for 2 minutes. The PCR products were resolved on a 1.5% agarose gel by gel electrophoresis. The percent exon deletion was determined with ImageJ software using the original acquired black and white images without any background subtraction, by dividing the intensity of the exon-excluded band by the sum of the exon-included and exon-excluded band intensities. The result was then multiplied by 100 and rounded to the nearest integer.
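The band-intensity calculation described above amounts to a single ratio. A minimal sketch follows; the function name and the intensity values are hypothetical, and the inputs stand in for band intensities measured in ImageJ.

```python
def percent_exon_deletion(included_intensity, excluded_intensity):
    """Percent exon deletion from two gel band intensities:
    100 * excluded / (included + excluded), rounded to an integer."""
    total = included_intensity + excluded_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    # Note: Python's round() rounds exact .5 ties to the nearest even integer.
    return round(100 * excluded_intensity / total)


# Hypothetical intensities for the exon-included and exon-excluded bands.
print(percent_exon_deletion(included_intensity=7500, excluded_intensity=2500))
```

Because no background subtraction is applied, faint exon-excluded bands yield small non-zero percentages, so values near zero are best interpreted relative to the intergenic control lane.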

Analysis of CD46 Cell Surface Expression by Flow Cytometry

HAP1 and RPE1 cells stably expressing SpCas9 and different Cas12a variants were transduced with intergenic control or CD46 exon 3-targeting hgRNAs as described above. Seventy-two hours post-selection, cells were washed with 1x PBS and lifted with accutase (Sigma-Aldrich #A6964) for 3 minutes at room temperature. Cells were then quenched with culture media and pipetted to break the cell monolayer into a single-cell suspension. 0.5 million cells per sample were added to cell strainer cap flow tubes (Falcon #352235) to ensure a single-cell suspension. Cells were washed with chilled flow buffer (1x PBS, 2% FBS) and then spun down for 3 minutes at 170 rcf in a pre-cooled centrifuge at 4°C. HAP1 cells were incubated with CD46-BV421 antibody (1:100, BD Biosciences #743776) on ice for 15 minutes. RPE1 cells were incubated with CD46-BV421 antibody (1:50, BD Biosciences #743776) on ice for 30 minutes. The cells were subsequently washed with 1x PBS and spun down for 3 minutes at 170 rcf at 4°C. Then, the cells were incubated with Zombie NIR viability dye (1:1,000, BioLegend #423106) on ice for 15 minutes. Cells were washed with flow buffer and spun down for 3 minutes at 170 rcf at 4°C. Cells were subsequently fixed in 1% paraformaldehyde (ThermoFisher Scientific #28908) for 15 minutes and washed twice with chilled flow buffer. Flow cytometry was performed on a BD LSRFortessa cell analyzer (BD Biosciences) using a violet laser (407 nm emission wavelength; 450/50 emission filter) for BV421 data acquisition, and a red laser (633 nm emission wavelength; 716/40 emission filter) for cell viability data acquisition. Data were processed using FlowJo software (BD Biosciences, version 10.8.1). Gating was applied to samples to exclude doublets and dead cells. The gate to determine CD46 knockout was set using unstained control samples.

Cas12a RNA processing activity

The efficiency of hgRNA processing was assessed as described previously32. Briefly, HAP1 cells expressing both SpCas9 and Cas12a or just SpCas9 alone were transduced separately with four independent lentiviral hgRNA expression cassettes (Table S4). RNA was extracted using TRIzol (ThermoFisher Scientific #15596026). Total and unprocessed Cas9 and Cas12a guides were amplified and quantified by quantitative RT-PCR on a BioRad CFX96 real-time PCR machine using the one-step SensiFAST real-time PCR kit (Bioline #BIO-72001) with the following cycling conditions: step 1: 45°C for 10 minutes; step 2: 95°C for 2 minutes; step 3: 95°C for 5 seconds, 60°C for 20 seconds for a total of 40 cycles. Full-length (unprocessed) hgRNA was amplified using primers annealing to the beginning of the Cas9 tracrRNA and to the junction of the Cas12a DR-guide sequence. To amplify total levels of the Cas9 guide (from both processed and unprocessed transcripts), primers annealing to the beginning and end of the tracrRNA were used (Table S4). Relative Cas12a processing activity was estimated by normalizing the levels of unprocessed hgRNA to total levels of Cas9 gRNA, assuming 0% processing activity in the cell line expressing only SpCas9 (and not any Cas12a variant).
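The normalization described above can be sketched as a ΔCt calculation. This is an illustrative sketch rather than the authors' analysis code: it assumes approximately 100% amplification efficiency (a factor of 2 per cycle) for both amplicons, and the function names and Ct values are hypothetical.

```python
def unprocessed_ratio(ct_unprocessed, ct_total):
    """Unprocessed hgRNA relative to total Cas9 gRNA, from Ct values,
    assuming a doubling of product per PCR cycle."""
    return 2.0 ** -(ct_unprocessed - ct_total)


def relative_processing(ct_unproc_sample, ct_total_sample,
                        ct_unproc_cas9_only, ct_total_cas9_only):
    """Percent hgRNA processing in a Cas12a-expressing line, with the
    SpCas9-only line defined as 0% processing."""
    sample = unprocessed_ratio(ct_unproc_sample, ct_total_sample)
    reference = unprocessed_ratio(ct_unproc_cas9_only, ct_total_cas9_only)
    return 100.0 * (1.0 - sample / reference)


# Hypothetical Ct values: the Cas12a line retains 1/8 of the unprocessed
# hgRNA signal seen in the SpCas9-only line.
print(relative_processing(25.0, 21.0, 22.0, 21.0))
```

Normalizing to total Cas9 gRNA corrects for differences in transduction and hgRNA expression between cell lines, so the remaining difference in unprocessed signal reflects Cas12a cleavage of the hybrid transcript.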

Northern blotting

RNA was isolated using TRIzol (Sigma-Aldrich #T3934) following the manufacturer’s instructions. 5 μg of total RNA was mixed with denaturing loading dye (ThermoFisher Scientific #LC6876) and incubated at 95°C for 5 minutes, followed by cooling on ice for 5 minutes to disrupt RNA structures. The RNA samples were then separated on a 10% TBE-Urea gel (ThermoFisher Scientific #EC68752BOX) by electrophoresis at 200 V for 1.5 hours. Subsequently, the RNA was transferred to a Hybond-N+ membrane (Amersham #NV0796) at 300 mA for 1.5 hours using a semi-dry transfer cell (Bio-Rad #1703940). The RNA was immobilized on the membrane by UV crosslinking (254 nm) at 120,000 μJ/cm² twice (VWR Crosslinker UV #89131–484) and incubated with ULTRAhyb-Oligo hybridization buffer (ThermoFisher Scientific #AM8663) for an initial prehybridization step of 45 minutes at 42°C with rolling. Next, 4 μL of 10 μM biotin-modified oligos (Table S4) were added to the hybridization tube, and the membrane was incubated overnight at 42°C in a hybridization oven. The blots were developed using North2South Chemiluminescent Hybridization and Detection reagents (ThermoFisher Scientific #17097) and visualized using an iBright CL1500 Imaging System (Invitrogen).

GUIDE-seq assays and analysis

The GUIDE-seq experiment was conducted following an established protocol with minor modifications51,52. In brief, double-stranded oligodeoxynucleotide (dsODN) was generated by annealing two modified oligonucleotides as described in a previous study. 4 × 10⁵ RPE1 cells stably expressing both Cas9 and Cas12a nucleases were nucleofected with 1 μL of dsODN (100 μM) in 20 μL of Solution P3 (Lonza #V4XP-3032) using program EA-114 on a Lonza 4D-Nucleofector Transfection System (Lonza), following the manufacturer’s instructions, and then seeded into a 6-well plate. After a 1.5-hour incubation at 37°C with 5% CO2, cells were transduced with lentivirus carrying specific hgRNAs targeting genomic loci. Twenty-four hours later, puromycin (2 μg/mL) was added to select virus-infected cells, which were then cultured for an additional 48 hours before harvesting for gDNA extraction using the GeneJET Gel Extraction Kit (ThermoFisher Scientific #K0692). A starting material of 300 ng of gDNA was sheared to an average size of 500 nucleotides (nt) using the M220 Focused-ultrasonicator (Covaris #500295) with the following settings: duty factor: 20%, peak: 50, cycles: 200, duration: 45 seconds, and temperature: 20°C. The fragmented gDNA was then purified and enriched with SPRIselect beads (Beckman #B23318), followed by library preparation following the same procedure as previously described52, except for using SPRIselect beads (Beckman #B23318) for purification. The final library was purified using SPRIselect double-sided selection (ratios 0.8–0.61), retaining fragments of 230–660 nt. Library quality and concentration were assessed using the High Sensitivity D1000 ScreenTape (Agilent #5067–5584) with High Sensitivity D1000 Reagents (Agilent #5067–5585) on a 4150 TapeStation System (Agilent #G2992AA). Finally, libraries were pooled, spiked-in with 15% PhiX, and sequenced on a NextSeq 2000 platform (Illumina) using NextSeq 2000 P2 Reagents (200 Cycles) v3 (Illumina #20046812).

To identify integration sites, only reads from Read 1 were analyzed. Before mapping to the genome, all reads underwent filtering to remove sequences containing a tag sequence of either ‘GCTCGCGTTTAATTGAGTTGTCATATGT’ or ‘TCGCGTATACCGTTATTAACATATGACAACTCAA’, corresponding to the integrated dsODN from the plus or minus library, respectively. This was achieved using cutadapt107 with the following parameters: min_overlap=10 and --minimum-length=20. Reads with tags from either the plus or minus library of the same sample were combined. Subsequently, a custom Python script was employed to filter out PCR-introduced duplicate reads based on the UMI sequence. Processed reads were then aligned to the human genome (hg38) using Bowtie 2108 with default parameters. The resulting alignments were converted to BED format, and potential integration sites were extracted from the alignments using a custom Python script. Nearby integration sites were merged into clusters using a sliding window of 10 nt, with the site having the highest read counts defined as the representative integration site for each cluster, and all reads within the cluster assigned to the integration site. Genome regions spanning 25 nt (for Cas9) and 32 nt (for Cas12a) from each side of the defined integration sites were determined, and their genome sequences were extracted. To identify on-target and potential off-target sites, each specific gRNA sequence was aligned to all the extracted genomic sequences from the previous step using BLAT with relaxed parameters (-tileSize=6, -stepSize=1, -oneOff=4, -minIdentity=10, -minScore=0) for both Cas9 and Cas12a gRNAs, to retain all potential gRNA targeting sites. Additionally, the coordinates of potential gRNA targeting sites, including the PAM sequence, were extracted from the BLAT output, and the corresponding sequences were fetched from the genome as reference sequences using BEDTools109. 
‘NGG’ was used as the effective PAM for Cas9 gRNAs, while for Cas12a, ‘TTTV’ and all other PAMs from tiers 1 and 2 defined in a previous study39 were considered effective. The extracted reference sequences were compared with the gRNA, and reference regions with no more than 8 mutations out of 20 nt for Cas9 gRNA and 9 mutations out of 23 nt for Cas12a gRNA, respectively, were considered as integration candidate sites. Furthermore, only integration sites with at least 5 reads within the corresponding clusters were defined as effective on- or off-target sites.
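The clustering and filtering steps described above can be sketched as follows. This is an illustrative reimplementation, not the authors' custom script. It assumes that "sliding window of 10 nt" means that sites on the same chromosome are chained into one cluster whenever consecutive positions lie within 10 nt of each other, and that gRNA and reference sequences are compared without gaps. The function names, data layout, and example counts are hypothetical.

```python
MAX_MISMATCHES = {"Cas9": 8, "Cas12a": 9}  # of 20 nt and 23 nt, respectively
MIN_CLUSTER_READS = 5


def cluster_sites(site_counts, window=10):
    """Merge nearby integration sites.

    site_counts maps (chrom, position) -> read count. Returns a list of
    (chrom, representative_position, total_reads), where the representative
    is the site with the most reads in its cluster.
    """
    clusters, current = [], []
    for (chrom, pos), reads in sorted(site_counts.items()):
        # Start a new cluster on a chromosome change or a gap > window.
        if current and (chrom != current[-1][0] or pos - current[-1][1] > window):
            clusters.append(current)
            current = []
        current.append((chrom, pos, reads))
    if current:
        clusters.append(current)

    merged = []
    for cluster in clusters:
        chrom, pos, _ = max(cluster, key=lambda site: site[2])
        merged.append((chrom, pos, sum(site[2] for site in cluster)))
    return merged


def count_mismatches(guide, reference):
    """Ungapped mismatch count between a gRNA and a reference sequence."""
    if len(guide) != len(reference):
        raise ValueError("sequences must have equal length")
    return sum(g != r for g, r in zip(guide.upper(), reference.upper()))


def is_candidate(guide, reference, nuclease, cluster_reads):
    """Apply the mismatch and minimum-read thresholds stated in the text."""
    return (count_mismatches(guide, reference) <= MAX_MISMATCHES[nuclease]
            and cluster_reads >= MIN_CLUSTER_READS)


# Hypothetical integration-site counts.
sites = {("chr1", 100): 3, ("chr1", 105): 8, ("chr1", 112): 1,
         ("chr1", 300): 2, ("chr2", 105): 6}
for chrom, pos, reads in cluster_sites(sites):
    print(chrom, pos, reads, "kept" if reads >= MIN_CLUSTER_READS else "dropped")
```

Assigning all cluster reads to the most abundant position keeps a single coordinate per cleavage event while preserving the total read support used for the five-read threshold.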

Sequencing analysis to determine base editing efficiency

The quantification of base editing efficiency for the cytosine and adenine base editor constructs was performed as described previously110. Briefly, HAP1 base editor cells were transduced with lentivirus derived from pLCHKOv3 vectors to express either a control sgRNA or an sgRNA targeting the 5’ splice sites of TAF5-exon8 or SNAPC5-exon2 (Table S4). After 24 hours, transduced cells were selected with 1 μg/mL of puromycin for 48 hours. The cells were then lysed with 0.5 × Direct-Lyse buffer111 and the region spanning the mutated splice sites was amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs #M0491L). A fraction of the amplicon was barcoded with a unique i7 index primer. The barcoded products were pooled and purified using SPRI beads prepared in-house, based on Sera-Mag Speedbeads (ThermoFisher Scientific, #6515–2105-050250), as described previously112. The purified products were then subjected to paired-end sequencing using the MiSeq Reagent Micro Kit v2 (300-cycles) (Illumina, #MS-103–1002) on an Illumina MiSeq System.

The MiSeq data from various amplicons were aligned to the reference sequence using CRISPResso2113, with the following specific parameters: --quantification_window_size 27 and --quantification_window_center -3. In other words, we restricted our analysis to 27 nucleotides spanning the predicted Cas9 guide cut site while retaining default values for other parameters. Subsequently, mutation frequencies at splice sites were extracted from the CRISPResso2 output for each sample, providing quantitative measures of editing outcomes. Comparing mutation frequencies across samples allowed us to assess differences in splice site editing among the base editors.
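As an illustration of how a mutation frequency at a single splice site position can be derived from per-base read counts, consider the sketch below. It does not parse CRISPResso2 output files; the input dictionary, the function name, and the counts are hypothetical and stand in for the per-position nucleotide counts reported for one sample.

```python
def substitution_frequency(base_counts, edited_base):
    """Percent of reads carrying `edited_base` at one reference position.

    base_counts maps each observed base (for example 'A', 'C', 'G', 'T')
    to its read count at that position.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return 100 * base_counts.get(edited_base, 0) / total


# Hypothetical counts at a 5' splice site position targeted by a cytosine
# base editor (C-to-T conversion on the sequenced strand).
edited = {"A": 10, "C": 400, "G": 10, "T": 580}
control = {"A": 1, "C": 995, "G": 1, "T": 3}
print(substitution_frequency(edited, "T"), substitution_frequency(control, "T"))
```

Reporting the control sgRNA sample alongside each edited sample gives a per-position estimate of sequencing and PCR error against which editing frequencies can be judged.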

Cloning TAF5 isoforms

Full-length and exon-8-skipped TAF5 isoforms (denoted as TAF5-FL and TAF5-ΔEx8, respectively) were cloned into the pDONR223 donor vector using BP clonase II (ThermoFisher Scientific #11789020), following PCR amplification and purification of the respective sequences. The BP reaction mix was transformed into NEB Stable Competent E. coli cells (New England Biolabs #C3040H) and transformants were selected on LB agar plates with spectinomycin. Subsequently, the TAF5-FL and TAF5-ΔEx8 inserts were transferred from pDONR223 to pcDNA5-miniTurboID and pcDNA5–3xFLAG vectors using LR clonase II (ThermoFisher Scientific #11791020). The LR reaction mix was transformed into NEB Stable Competent E. coli cells, and transformants were selected on LB agar plates supplemented with 100 μg/mL ampicillin. The clones were confirmed by Sanger sequencing. All TAF5-FL and TAF5-ΔEx8 plasmids have been deposited with Addgene (pDONR223-TAF5-FL: Addgene #209047; pDONR223-ΔEx8: Addgene #209048; pcDNA5-miniTurboID-TAF5-FL: Addgene #209051; pcDNA5-miniTurboID-TAF5-ΔEx8: Addgene #209052; pcDNA5–3xFLAG-TAF5-FL: Addgene #209049; pcDNA5–3xFLAG-TAF5-ΔEx8: Addgene #209050).

Generation of HEK293 Flp-In cell lines expressing TAF5 variants

HEK293 Flp-In cells were transfected with pOG44 (Flp-recombinase expression vector, Life Technologies #V600520) and pcDNA5 plasmids encoding doxycycline-inducible versions of TAF5-FL and TAF5-ΔEx8 isoforms using X-tremeGENE 9 DNA Transfection Reagent (Sigma-Aldrich #6365809001), following manufacturer’s recommendations. After 24 hours of transfection, cells were selected with hygromycin B (200 μg/mL, ThermoFisher Scientific #10687010). The selected cells were induced with doxycycline to express TAF5-FL and TAF5-ΔEx8 proteins at near endogenous levels, and their expression was confirmed through western blot analysis using anti-FLAG and anti-TAF5 antibodies.

TAF5 isoform rescue experiments

Four different siRNAs against TAF5 (Dharmacon #J-012357–05, CGAGUAUUAUCUAGUCUUA; J-012357–06, CAGAUAAGUUGGAUAAGAU; J-012357–07, GCAUCAGGUUCAAUGGAUA; J-012357–08, GGGUAAAGUUGGAAGUGUU) were tested for depleting endogenous TAF5 expression by quantitative RT-PCR and western blotting. The best-performing siRNA (J-012357–07) was selected for further experiments. HEK293 Flp-In cells were depleted of endogenous TAF5 by transfecting TAF5 siRNA (Dharmacon #J-012357–07) using Lipofectamine RNAiMax (ThermoFisher Scientific #13778150). After 24 hours, the cells were induced with doxycycline (10 pg/mL for TAF5-FL and 100 pg/mL for TAF5-ΔEx8, Sigma #D9891) for 48 hours to rescue with siRNA-resistant TAF5-FL (Addgene #209053/209055) or TAF5-ΔEx8 (Addgene #209054/209056) proteins, prior to cell lysis for chromatin, RNA, or protein extraction.

Immunoprecipitation

Cells were washed with PBS and lysed in F buffer (10 mM Tris pH 7.05, 50 mM NaCl, 30 mM Na4 pyrophosphate, 50 mM NaF, 5 μM ZnCl2, 10% glycerol, 0.5% Triton X-100) supplemented with a protease inhibitor cocktail (Roche #11836170001). The lysates were centrifuged at 18,500 rcf for 10 minutes at 4°C to collect the supernatant. Immunoprecipitation was performed using anti-FLAG M2 Magnetic Beads (Sigma-Aldrich #M8823) or Dynabeads protein G (ThermoFisher Scientific #10004D) bound to TBP antibody (2 μg, Proteintech #22006–1-AP). Total cell lysates (1 mg for TBP immunoprecipitation and 500 μg for FLAG immunoprecipitation) were incubated with the antibody-bound beads overnight at 4°C with rotation. The beads were washed 5 times using F buffer and bound protein complexes were eluted by heating in NuPAGE LDS Sample Buffer (ThermoFisher Scientific #NP0007) at 70°C for 10 minutes. The eluted proteins were then subjected to western blot analysis using the indicated antibodies.

Affinity Purification and miniTurboID mass spectrometry

For affinity-purification mass spectrometry analysis, HEK293 Flp-In cell pellets were lysed in 1 mL of 0.5% NP40 lysis buffer (150 mM NaCl, 50 mM Tris pH 7.0, 0.5% NP40, and protease inhibitors) by gently pipetting the solution until fully homogenous. Cell lysates were flash frozen on dry ice and thawed before being clarified using centrifugation for 10 minutes at 18,500 rcf at 4°C. The protein concentration was estimated using bicinchoninic acid (BCA; Pierce #23225), and an equal amount of protein from each sample (1 mg) was transferred to a new tube, with the volume adjusted to 1 ml using cold lysis buffer. The cell lysate was incubated with 60 μL of a 50% slurry of M2 FLAG beads (ThermoFisher Scientific #A36797) for 2 hours at 4°C with rotation. The beads were washed three times with ice-cold 1x-TBS before being resuspended in 50 mM HEPES (pH 8.0) and stored at −80°C.

For streptavidin affinity purification of miniTurbo-TAF5 expressing cells, HEK293 Flp-In cell pellets were lysed in 1 mL of 1x RIPA buffer (Millipore Sigma #20–188) containing protease inhibitors (Roche cOmplete #11836145001, at 1 tablet in 50 ml of RIPA buffer), by gently pipetting the solution until fully homogenous. This was followed by pulse sonication to disintegrate the DNA before being clarified by centrifugation for 10 minutes at 18,500 rcf at 4°C. The protein concentration was estimated by BCA, and an equal amount of protein from each sample (1 mg) was transferred to a new tube, with the volume adjusted to 1 ml with cold lysis buffer. The protein lysate was incubated with 60 μL of a 50% slurry of Streptavidin Sepharose Beads (ThermoFisher Scientific #20353) overnight at 4°C with rotation. The beads were washed three times with 1 mL of cold lysis buffer, three times with ice-cold 1x-TBS, and once with 50 mM HEPES (pH 8.0) before being resuspended in 50 mM HEPES (pH 8.0) and stored at −80°C.

The suspended beads were thawed on ice, heated at 95°C for 4 minutes, and allowed to cool to room temperature. A total of 2 μg of trypsin (Promega #V5111) was added to each sample and incubated overnight at 37°C on an orbital rocker. The peptides were desalted using Pierce C18 spin columns (ThermoFisher Scientific #89873), eluted from the column using 70% acetonitrile (Fisher Chemical #75–05-8) with 0.1% trifluoroacetic acid (TFA) (Pierce #85183), and dried down in a speedvac centrifuge.

The dried peptides were suspended in 15 μL of 0.1% TFA and analyzed using an EASY-nLC 1200 system (ThermoFisher Scientific) coupled to a Q Exactive HF mass spectrometer (ThermoFisher Scientific) equipped with an EasySpray ion source. The desalted tryptic peptides were loaded onto an Acclaim PepMap 100 (75 μm × 2 cm) C18 trap column (ThermoFisher Scientific), followed by separation on a PepMap RSLC C18 (75 μm × 25 cm) analytical column. The peptides were eluted with a 5–27% gradient of acetonitrile with 0.1% formic acid over 60 minutes, followed by a 27–40% gradient of acetonitrile with 0.1% formic acid over 45 minutes, at a flow rate of 300 nL/min. The MS1 was performed at 60,000 resolution over a mass range of 380 to 1580 m/z, with a maximum injection time of 120 ms and an AGC target of 3e6. The MS2 scans were performed at a resolution of 15,000, with normalized collision energy set at 27, maximum injection time of 50 ms, and an AGC target of 2e5.

MS files were searched with Proteome Discoverer 2.4 using the Sequest node. Data were searched against the Uniprot human database using a full tryptic digest, allowing for a maximum of 2 missed cleavages, a minimum peptide length of 6 amino acids, and a maximum peptide length of 40 amino acids. An MS1 mass tolerance of 10 ppm and an MS2 mass tolerance of 0.02 Da were applied.

The data were normalized using the total intensity of the sample with the highest overall intensity. The median value was calculated for each protein within an experimental group, and p-value was obtained by comparing each experimental group to each of the control groups (i.e., empty vector or EGFP), assuming a one-tailed distribution and two-sample equal variance.
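The normalization and test statistic described above can be sketched as follows. This is a minimal illustration, assuming that "normalized using the total intensity of the sample with the highest overall intensity" means every sample is rescaled so that its summed intensity matches the largest sample total. The function names and the toy intensities are hypothetical. Only the equal-variance t statistic and its degrees of freedom are computed; the one-tailed p-value would then be read from a t distribution.

```python
from math import sqrt
from statistics import mean


def normalize_to_max_total(samples):
    """Rescale each sample so its summed intensity equals the largest sample total.

    `samples` maps sample name -> {protein: intensity}. This reading of the
    normalization step is an assumption, not a statement of the authors' code.
    """
    totals = {name: sum(prot.values()) for name, prot in samples.items()}
    target = max(totals.values())
    return {
        name: {p: v * target / totals[name] for p, v in prot.items()}
        for name, prot in samples.items()
    }


def pooled_t_statistic(group_a, group_b):
    """Two-sample equal-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    ma, mb = mean(group_a), mean(group_b)
    ss_a = sum((x - ma) ** 2 for x in group_a)
    ss_b = sum((x - mb) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical toy data: one bait sample and one control sample.
samples = {
    "bait_rep1": {"TAF6": 80.0, "TBP": 20.0},
    "ctrl_rep1": {"TAF6": 30.0, "TBP": 20.0},
}
norm = normalize_to_max_total(samples)
t_stat, dof = pooled_t_statistic([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
```

After normalization both samples sum to the same total, so per-protein intensities can be compared across samples before group medians are taken.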

Construction of the protein-protein interaction network

The protein-protein network was constructed using Cytoscape v3.9.1¹¹⁴ and the integrated network query option using STRING plugin v2.0.1¹¹⁵, with a confidence score cutoff of 0.999 and zero maximum additional interactors. The nodes were colored with the log fold change between TAF5-FL and TAF5-ΔEx8 using the style options in Cytoscape.

RNA sequencing and analysis

Total RNA was extracted from HEK293 Flp-In cells of TAF5 isoform rescue experiments using the RNeasy Plus Mini Kit (Qiagen #74136). The library was then prepared using the Illumina Stranded mRNA Prep kit (Illumina, #15031047) and subjected to next-generation sequencing on an Illumina NovaSeq 6000 S1 platform (200 cycles kit). On average, the samples had 114 million pass filter reads, with more than 90% of bases above the quality score of Q30. Raw sequencing reads were trimmed for adapters and low-quality bases using Cutadapt 1.18¹⁰⁷. Processed reads were aligned to the human reference genome (hg38) and annotated transcripts using the STAR 2.7.0f alignment tool. Gene expression levels were quantified from read-normalized counts, and differential gene expression analysis was performed using the DESeq2 pipeline in Partek Flow software (version 10.0.23.0531). To identify genes dysregulated upon TAF5 knockdown, the knockdown of endogenous TAF5 was compared to the non-targeting siRNA control, and genes with a false discovery rate (FDR) of at most 0.05 and a linear fold change (|FC|) greater than 1.5 were considered significantly dysregulated. Genes were considered rescued by TAF5-FL or TAF5-ΔEx8 if they showed either an FDR greater than 0.05, or an FDR less than or equal to 0.05 together with an |FC| of less than 1.25.
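The two sets of thresholds can be written as simple predicates. This is a sketch only: the function names are hypothetical, the fold change is assumed to be a signed linear value as reported by the differential expression tool, and the text does not restate which pairwise comparison supplies the FDR and fold change for the rescue call, so the caller is expected to pass values from the appropriate contrast.

```python
def is_dysregulated(fdr, fold_change):
    """Knockdown vs. non-targeting control: FDR <= 0.05 and linear |FC| > 1.5."""
    return fdr <= 0.05 and abs(fold_change) > 1.5


def is_rescued(fdr, fold_change):
    """Rescue call: either not significant (FDR > 0.05), or significant
    (FDR <= 0.05) but with a small effect size (linear |FC| < 1.25)."""
    if fdr > 0.05:
        return True
    return abs(fold_change) < 1.25


# Hypothetical example values.
calls = {
    "strong_down": is_dysregulated(0.01, -2.0),   # True
    "not_significant": is_dysregulated(0.2, 3.0), # False
    "rescued_small_fc": is_rescued(0.01, 1.1),    # True
    "not_rescued": is_rescued(0.01, 1.6),         # False
}
```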

To assess gene expression and alternative splicing in HAP1 and RPE1 cell lines, total RNA was extracted using the RNeasy Plus Mini Kit (Qiagen #74136), following the manufacturer’s instructions. RNA-seq libraries were prepared using the Illumina Stranded mRNA Prep kit (Illumina, #15031047) and subjected to paired-end sequencing using an Illumina NovaSeq 6000 S1 platform with a 200-cycles kit. The raw sequencing reads were initially processed to remove adaptors. Reads were aligned to the human genome (hg38) using STAR 2.7.0f. The resulting alignments were utilized as input to compute read counts in genes, using HTseq¹¹⁶. Additionally, RPKM values were calculated to estimate gene expression levels. Whippet¹¹⁷ was used to calculate Percent Spliced In (PSI) values for all targeted exons.
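For orientation, the two quantities named above can be expressed with their textbook definitions. This is a simplified sketch with hypothetical function names. Whippet estimates PSI with its own probabilistic model, so the read-ratio formula below only illustrates what the value represents.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def psi(inclusion_reads, exclusion_reads):
    """Percent Spliced In as a plain read ratio (0 to 100).

    Returns None when no informative reads are available.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return 100.0 * inclusion_reads / total


# Hypothetical numbers: 500 reads on a 2 kb gene in a 10 million read library.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_psi = psi(30, 10)
```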

qPCR validations

qPCR validation was performed to confirm the RNA-seq results and evaluate the expression levels of selected genes in the TAF5 isoform rescue experiments. Total RNA was extracted from the cells using the RNeasy Plus Mini Kit (Qiagen #74136). cDNA was synthesized using the Maxima H Minus First Strand cDNA Synthesis Kit (ThermoFisher Scientific #K1652), with 3 μg of total RNA as the starting material and primed with oligo dT and random hexamers. qPCR reactions were prepared using the SensiFAST SYBR No-ROX Kit (Bioline #BIO-98050). The raw qPCR data were analyzed using the comparative Ct method, and gene expression levels were estimated for TM7SF2, WDR31, WAS, ADCK5, SULT1A4, and GSTP1 genes after normalizing to the stable reference gene GAPDH (for primers see Table S4). Statistical analysis (two-way ANOVA) was performed using GraphPad Prism Version 9.0 (GraphPad Software Inc., San Diego) to assess the significance of expression differences between conditions.
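The comparative Ct calculation can be sketched as below, assuming the standard 2^(-ΔΔCt) form with roughly 100% amplification efficiency and GAPDH as the reference gene. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene relative to a control condition (2^-ddCt)."""
    delta_ct_sample = ct_target - ct_reference            # normalize to GAPDH
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target comes up 2 cycles later in the treated
# sample at equal GAPDH Ct, i.e. about a 4-fold reduction.
fold = relative_expression(24.0, 18.0, 22.0, 18.0)
```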

Gene ontology enrichment analysis

The Gene Ontology enrichment analysis was performed using g:Profiler¹¹⁸, with a focus on GO biological processes. We applied the default multiple testing correction method of g:SCS and set the significance threshold at an adjusted p-value < 0.05. For the gene ontology enrichment analysis of genes containing fitness-promoting exons, we used the union of screen hits in HAP1 and RPE1 as the query, and all genes targeted in the exon deletion library as background. For the gene ontology enrichment analysis of genes regulated by TAF5 exon-8, we first identified the differentially expressed genes between the TAF5-FL and TAF5-ΔEx8 rescue conditions (|FC| ≥ 1.5 and FDR ≤ 0.05) and performed gene ontology analysis for these genes using all expressed genes as background.

ChIP sequencing and analysis

HEK293 Flp-In cells were grown in 15 cm plates, transfected with TAF5 or non-targeting siRNAs, and induced with doxycycline to rescue with TAF5-FL or TAF5-ΔEx8 proteins. Cells were fixed with 1% formaldehyde (Pierce #PI28908) for 10 minutes, and the fixation was stopped with 125 mM glycine (Sigma #G8898) for 7 minutes. The cell pellets were resuspended in 45 mL of Farnham lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, and 0.5% NP-40) and incubated at 4°C for 20 minutes with gentle shaking. Following centrifugation, the supernatant was discarded, and the pellets were once again resuspended in 25 mL of Farnham lysis buffer, then incubated at 4°C for 15 minutes with gentle shaking. The purified nuclei pellets were subsequently resuspended in 0.8 mL of ChIP lysis buffer (50 mM HEPES-KOH pH7.5, 140 mM NaCl, 1 mM EDTA pH8.0, 1% Triton X-100, 0.1% Sodium Deoxycholate, 0.1% SDS, and protease inhibitor cocktail) and sonicated using a BioruptorPlus (Diagenode) at high power for 15 cycles (30 seconds on and 30 seconds off). The sheared chromatin was centrifuged at 18,500 rcf for 5 minutes, and the supernatant was collected.

For each immunoprecipitation, 50 μg of sheared chromatin, 6 μg of the antibodies [rabbit IgG (Proteintech #30000–0-AP), TBP (Proteintech #22006–1-AP), and RNA polymerase II (ThermoFisher Scientific #A300653A)], and 40 μL of Dynabeads Protein G were used. The immunoprecipitations were performed at 4°C overnight using RIPA buffer. 10% of the sheared chromatin was kept aside as input. The beads were then washed twice with low salt (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, and 150 mM NaCl), high salt (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, and 500 mM NaCl), and LiCl wash buffers (0.25 M LiCl, 1% NP-40, 1% Sodium Deoxycholate, 1 mM EDTA, and 10 mM Tris-HCl pH 8.0). The chromatin was eluted by incubating with 100 μL of elution buffer (1% SDS and 100 mM NaHCO3) for 1 hour at room temperature and 15 minutes at 65°C. The eluted samples and input DNA underwent overnight decrosslinking followed by RNaseA (Invitrogen #12091021, final concentration 0.2 mg/ml) and Proteinase-K (ThermoFisher Scientific #EO0492, final concentration 0.4 mg/ml) treatment for 1 hour. Samples were then purified using a QIAquick PCR Purification Kit (Qiagen #28104). The sequencing library was prepared from 11.5 ng of purified DNA using the Accel-NGS 2S Plus DNA Library Kit (Swift Biosciences #21096) and was paired-end sequenced on an Illumina NovaSeq 6000 S1 platform (100-cycles kit). On average, the samples had 123 million pass filter reads, with more than 90% of bases above the quality score of Q30.

All raw ChIP-seq reads were trimmed with Cutadapt (version 1.18)¹⁰⁷ and mapped to hg38 using the default settings of Bowtie 2¹⁰⁸ (version 2.3.4.1) for downstream analyses. The mapped (.bam) files of replicates for each condition were merged, sorted, and indexed using Samtools¹¹⁹. Next, we created read coverage profiles (bigwig format) by calculating the log2 of the ratio between ChIP for each condition (TBP, RNA pol II, and IgG) and the corresponding Input sample. This was done using the deepTools suite¹²⁰ (version 3.5.2) “bamCompare” function with the options --binSize 50, --smoothLength 500, and --effectiveGenomeSize 2913022398. Changes in occupancy levels in expressed genes, and the upregulated and downregulated genes rescued by TAF5-FL (listed in Table S9: sheet 2) around ± 5000 bp of the TSS, were plotted using the deepTools “computeMatrix reference-point” and “plotHeatmap”.

To test whether the occupancy of expressed genes is statistically different for siNT, siTAF5, siTAF5 + TAF5-FL, and siTAF5 + TAF5-ΔEx8, bigwig files were converted to the bedGraph format. We calculated the mean occupancy score of bedGraph overlapping the TSS of expressed genes using BEDTools¹⁰⁹ map (version 2.27.1) and applied the Wilcoxon rank sum test.

Large-scale siRNA screen to check TAF5 exon-8 splicing in HEK293 cells

A large-scale siRNA screen targeting 60 splicing factors was conducted in HEK293 cells using a reverse transfection approach with Lipofectamine RNAiMAX (ThermoFisher Scientific #13778150). Specifically, cells were transfected with 25 nM of siRNA (Dharmacon) against each splicing factor and incubated for 48 hours. Following knockdown, total RNA was extracted from the cells, and the splicing pattern of TAF5 exon-8 was monitored by RT-PCR analysis. Four independent biological replicates were performed and quantified using ImageJ, as described in the CD46 exon-3 deletion assay.

Cloning of TAF5 exon-8 minigene vectors

A DNA fragment containing TAF5 exon-8 and its native flanking intronic sequences was synthesized (TWIST Biosciences) and subsequently cloned into the ApaI and NotI sites of the Exontrap Cloning Vector pET01 (MoBiTec GmbH; Addgene #218972) using the NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs #E2621L). Similarly, a DNA fragment of TAF5 exon-8 containing the mutated SRSF1 motif (TCAGAGGA to AGTCTCCT; Addgene #218973) was cloned.

Splicing of TAF5 exon-8 minigene constructs

To investigate the influence of SRSF1 on the splicing of the TAF5 exon-8 minigene, HEK293 cells were transfected with SRSF1 siRNAs. After 48 hours, the minigene constructs with either the wildtype or the mutated SRSF1 motif were transfected. After 24 hours, RNA was extracted, and the splicing of the minigene constructs was monitored using primers that anneal to the 5’ and 3’ exons of the pET01 vector.

The effect of stress on TAF5 exon-8 splicing

To monitor endogenous TAF5 exon-8 splicing under various stress conditions, HEK293T cells were subjected to distinct stresses, including heat shock (42°C for 4 hours), osmotic stress (600 mM D-sorbitol for 2 hours, Sigma-Aldrich #S1876), ER stress (5 μg/ml tunicamycin for 4 hours, Sigma-Aldrich #SML1287), oxidative stress (250 μM hydrogen peroxide for 4 hours, Sigma-Aldrich #H1009), genotoxic stress (5 μM mitoxantrone for 4 hours, MedChemExpress #HY-13502), and ionic stress (100 mM NaCl for 4 hours). Additionally, to assess the impact of stress on TAF5 exon-8 splicing using minigene constructs, cells were transfected with TAF5 exon-8 minigenes. 20 hours post-transfection, the cells were exposed to the aforementioned stress conditions. Subsequently, RNA was extracted, and the splicing pattern of endogenous TAF5 exon-8 or minigene-derived TAF5 exon-8 was monitored by RT-PCR.

SRSF1 rescue experiments

Four different siRNAs against SRSF1 (Dharmacon #J-018672–09, CGUGGAGUUUGUACGGAAA; J-018672–10, UGACCUAUGCAGUUCGAAA; J-018672–11, UCUCGAAGCCGUAGUCGUA; J-018672–12, CAGGAUUCAUGGAGCGGGA) were tested for depleting endogenous SRSF1 expression by quantitative RT-PCR. The best-performing siRNA (J-018672–09) was selected and used for further experiments. HEK293T cells were depleted of endogenous SRSF1 by transfecting SRSF1 siRNA (Dharmacon #J-018672–09) using Lipofectamine RNAiMax (ThermoFisher Scientific #13778150). After 24 hours, the cells were transfected with siRNA-resistant SRSF1 plasmid (Addgene #218974) to rescue SRSF1 protein expression. Subsequently, RNA was extracted, and the splicing pattern of TAF5 exon-8 was monitored. Similarly, for monitoring the splicing of minigene reporters, the minigene constructs were transfected 24 hours before collecting the cells for RNA extraction and splicing analysis.

Structure prediction of the TAF5 isoforms

To explore the effects of exon-8 on TAF5 structure, the protein sequences of full-length TAF5 (ENST00000369839.4, 800 amino acids) and ΔExon-8 (ENST00000692195.1, 745 amino acids) isoforms were downloaded from UCSC. We used AlphaFold2¹²¹,¹²² (version 2.3.1) to generate the structural models from TAF5 sequences. The AlphaFold2 run was performed on the NIH high-performance computing Biowulf cluster with the monomer model and full_dbs preset, and the --max_template_date set to 2006-10-17. For full-length and ΔExon-8 isoforms, we visualized the predicted model, ranked_0.pdb, which has the highest confidence, in UCSF ChimeraX¹²³.

Cloning the optimization hgRNA library

For cloning guide RNA lentiviral libraries, we followed a similar strategy as described previously33, with some library-specific modifications elaborated below. For construction of the CHyMErA optimization libraries, Cas9 and Cas12a gRNA sequences were cloned into a lentiviral vector via two rounds of Golden Gate assembly. A single oligo pool of 113 nucleotides (nt) was designed carrying 20 nt Cas9 and 23 nt Cas12a guide sequences separated by a 32 nt diversified stuffer sequence harboring BsmBI/Esp3I restriction sites, all flanked by short sequences containing BfuAI/BveI restriction sites (see Table S1). The oligo pool containing 18,000 guide sequences was synthesized by TWIST Biosciences. Oligos were amplified by PCR using KAPA HiFi HotStart DNA polymerase (Roche #KK2601). Three 50 μL PCR reactions containing 10 nM oligo pool and 0.35 μM of each primer (Table S4) were performed in an Applied Biosystems Veriti 96-well thermal cycler using the following cycling conditions: step 1: 98°C for 3 minutes; step 2: 98°C for 10 seconds, 62°C for 15 seconds, 72°C for 20 seconds; step 2 was repeated for 10 cycles. Amplified oligos were purified on a PCR purification column (ThermoFisher Scientific #K0701), and an aliquot was run on a 2% agarose gel to check purity. Amplified oligos were digested with BveI (ThermoFisher Scientific #FD1744) and ligated into 2 μg pLCHKOv2 backbone using T4 DNA ligase (New England Biolabs #M0202) in a combined Golden Gate reaction overnight (step 1: 37°C for 30 minutes; step 2: 37°C for 30 minutes, 22°C for 30 minutes; step 2 was repeated for 16 cycles; step 3: 37°C for 15 minutes; step 4: 65°C for 20 minutes) using a 1:9 vector:insert molar ratio. The ligation mix was precipitated using sodium acetate and ethanol. 
The purified ligation reaction was used to transform Endura competent cells (LGC Biosearch Technologies #60242–2) by electroporation (1 mm cuvette, 25 μF, 200 Ω, 1,600 V), and a sufficient number of cells were plated on 15-cm 100 μg/mL ampicillin Luria–Bertani (LB) agar plates to reach a library coverage of > 5,000-fold. Following overnight incubation at 30°C, bacterial colonies were scraped from the plates, pooled, and bacterial pellets were collected. The Ligation 1 library plasmid was extracted using a Maxi-prep plasmid purification kit (Invitrogen #K210016).

In the second step, the SpCas9 tracrRNA directly followed by the Lb- or As-Cas12a direct repeat was inserted into the pooled library. The Ligation 1 plasmid library preparation was digested for 2 hours using Esp3I (ThermoFisher Scientific #FD0454), dephosphorylated using FastAP (ThermoFisher Scientific #EF0651) for 1 hour at 37°C, and subsequently purified on a PCR purification column. TOPO vectors carrying the SpCas9 tracrRNA and the Cas12a direct repeat (Addgene #155049 and #155050) (either for Lb or As) were digested using Esp3I and subsequently ligated into the digested pLCHKO-Ligation 1 vector in a combined Golden Gate reaction overnight (step 1: 37°C for 30 minutes; step 2: 37°C for 30 minutes, 22°C for 45 minutes; step 2 was repeated for 14 cycles; step 3: 37°C for 15 minutes; step 4: 65°C for 20 minutes) using a pLCHKO:TOPO vector molar ratio of 1:3. The ligation mix was precipitated using sodium acetate and ethanol. The purified ligation reaction was used to transform Endura competent cells by electroporation (1 mm cuvette, 25 μF, 200 Ω, 1,600 V), and a sufficient number of cells were plated on 15-cm 100 μg/mL ampicillin LB agar plates to reach a library coverage of 500 to 1,000-fold. After overnight incubation at 30°C, bacterial colonies were scraped from the plates, pooled, and bacterial pellets were collected. The Ligation 2 library plasmids were extracted using a Maxi-prep plasmid purification kit (Qiagen #12362). The library plasmids have been deposited to Addgene (#209734 – CHyMErA LbCas12a Nuclease Optimization hgRNA Library; #209735 – CHyMErA AsCas12a Nuclease Optimization hgRNA Library).

Cloning of the exon deletion hgRNA library

The large-scale CHyMErA exon deletion hgRNA library was cloned into the pLCHKOv3 lentiviral vector (Addgene #209025) via a single round of Golden Gate assembly. Oligo pools of 187 nucleotides (nt) were designed carrying 20 nt Cas9 guide sequences followed by Cas9 tracrRNA, AsCas12a direct repeat (DR), and 23 nt Cas12a guide sequences (see Table S2). Oligo pools containing 300,000 hgRNAs were synthesized by TWIST Biosciences. Oligos were amplified by PCR using KAPA HiFi HotStart DNA polymerase as described in the “Cloning the optimization hgRNA Library” section, the only differences being that 16× 50 μL reactions containing a 25 nM oligo pool were set up for a total of 5 PCR cycles. Amplified oligos were purified on a PCR purification column, digested with BveI, and ligated into 2 μg of the pLCHKOv3 backbone using T4 DNA ligase in a combined Golden Gate reaction overnight: step 1: 37°C for 30 minutes; step 2: 22°C for 60 minutes; step 3: 37°C for 15 minutes, 22°C for 45 minutes; step 3 was repeated for 12 cycles; step 4: 37°C for 20 minutes; step 5: 65°C for 20 minutes. The purified ligation reaction was electroporated into Endura competent cells, and 500 million bacterial colonies representing > 1,500-fold coverage were scraped from plates, pooled, and the library plasmid was extracted using a Maxi-prep plasmid purification kit as described above. The library plasmid has been deposited to Addgene (#209736–CHyMErA Large-Scale Exon Deletion hgRNA Library).

Cloning of the base editor Cas9 sgRNA library

The base editor splice site mutation Cas9 sgRNA library was cloned into the pLCKOv3 (Addgene #209046) lentiviral vector via a single round of Golden Gate assembly, following a similar protocol as described above. A pool of 27,871 oligos (58 nucleotides (nt)) carrying 20 nt Cas9 guide sequences were produced by TWIST Biosciences (see Table S5). Oligos were amplified by PCR using KAPA HiFi HotStart DNA polymerase as described in the “Cloning the optimization hgRNA library” section, the only differences being that four 50 μL reactions containing a 12.5 nM oligo pool were set up for a total of 7 PCR cycles. Amplified oligos were purified and cloned with BveI and T4 DNA Ligase into the pLCKOv3 vector as described in the above sections. Endura electrocompetent cells were transformed, and library plasmids were recovered from bacterial colonies at > 1,000-fold coverage.

Screening strategy for CHyMErA optimization, exon deletion, and splice site mutation

Our study involved three different screens using various cell lines and guide RNA libraries. For the CHyMErA optimization screen, an hgRNA library was transduced into all six HAP1 and RPE1 CHyMErA variant cell lines. For the exon deletion screen, an exon deletion hgRNA library was transduced into HAP1 and RPE1 CHyMErA cell lines stably expressing SpCas9 and opCas12a. For the exon skipping screen, a splice site mutation sgRNA library was transduced into HAP1 base editor cell lines. All transductions were carried out at an MOI of ~ 0.3 and a library coverage of ~ 200 to 500-fold. 24 hours following transduction, the cells were treated with 2 μg/mL puromycin. For the base editor screens, 24 hours post-transduction, the cells were also treated with 2 μg/mL of doxycycline to induce base editor expression. 48 hours after starting puromycin selection (denoted as T0; 72 hours post-transduction), cell pellets corresponding to at least 200-fold library coverage were collected for genomic DNA (gDNA) purification. The remaining selected cells were split into three replicates, each at 200 to 500-fold library coverage, and were passaged over 20 cell doublings at the same coverage. At the end time point, 10, 60, and 20 million cells (200-fold library coverage) were collected for the CHyMErA optimization, exon deletion, and base editor screens, respectively. Genomic DNA was extracted from the cell pellets collected at the start (T0) and end time points using the Wizard Genomic DNA Purification Kit (Promega #A1120). Genomic DNA libraries were prepared for next-generation sequencing (NGS) analysis using a nested PCR approach, as described previously33. First, the lentiviral integrated hgRNA expression cassettes were amplified from the gDNA equivalent of ~100-fold library coverage using NEBNext Ultra II Q5 Master Mix (New England Biolabs #M0544X). 
Individual 50 μL PCR reactions containing 3.5 μg of gDNA were performed using PCR1_Hybrid_Outer_F2 and PCR1_Hybrid_Outer_R1 primers for the optimization and exon deletion screens, and primers A265 and PCR1_BE_Rev for the base editor screens, respectively (see Table S4). After pooling the individual PCR-1 reactions, a fraction was purified using a PCR purification column and used as a template for PCR-2, in which each sample was barcoded with unique i5 and i7 index primer combinations. The resulting PCR-2 products were resolved on a 2% agarose gel using SYBR Safe DNA stain (ThermoFisher Scientific #S33102). The desired band was excised and subjected to gel extraction (ThermoFisher Scientific #K0691). The extracted libraries were quantified using the Qubit dsDNA HS assay (ThermoFisher Scientific #Q32851) and TapeStation (Agilent #5067–5582 and #5067–5583). The quantified and validated sequencing libraries were pooled and subjected to paired-end sequencing on either an Illumina NextSeq 1000/2000 or a NovaSeq 6000 platform using 100-cycle kits. The following sequencing strategies were applied: DC:27, R1:31, IR1:8, IR2:8, DC:20, R2:30 (optimization and exon deletion screens); DC:20, R1:30, Index1:8, Index2:8 (base editor screens). DC: dark cycles, R1: read 1, R2: read 2, IR1: index read 1, IR2: index read 2.
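The coverage figures in this section follow from simple arithmetic, sketched below. For example, the 60 million cells collected for the 300,000-hgRNA exon deletion library correspond to 200-fold coverage. The function names are hypothetical. Two assumptions are not taken from the text: a Poisson model for lentiviral infection at a given MOI, and a per-cell genomic DNA mass of about 6.6 pg, which is a commonly quoted value for a diploid human cell and would be roughly half that for near-haploid HAP1 cells.

```python
import math


def cells_for_coverage(library_size, fold_coverage):
    """Number of transduced cells needed to represent each guide `fold_coverage` times."""
    return library_size * fold_coverage


def cells_to_transduce(library_size, fold_coverage, moi):
    """Cells to expose to virus so that enough become infected.

    Assumes Poisson-distributed infection: fraction infected = 1 - exp(-MOI).
    """
    infected_fraction = 1.0 - math.exp(-moi)
    return math.ceil(library_size * fold_coverage / infected_fraction)


def pcr1_reactions(library_size, fold_coverage, pg_dna_per_cell, ug_per_reaction=3.5):
    """Number of PCR-1 reactions needed to sample gDNA at the given coverage."""
    total_ug = library_size * fold_coverage * pg_dna_per_cell / 1e6  # pg -> ug
    return math.ceil(total_ug / ug_per_reaction)


exon_deletion_cells = cells_for_coverage(300_000, 200)
exposed_cells = cells_to_transduce(300_000, 200, 0.3)
reactions = pcr1_reactions(300_000, 100, 6.6)  # 6.6 pg/cell is an assumption
```

Under these assumptions, amplifying the exon deletion library at about 100-fold coverage with 3.5 μg of gDNA per reaction requires on the order of several dozen PCR-1 reactions per sample, which is why the individual reactions are pooled before PCR-2.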

Construction of color-coded pLCHKOv3 vectors for co-culture assay

The color-coded pLCHKOv3 vectors were derived from the pLCHKOv3 vector by introducing mClover3 (Addgene #209034) or mCherry (Addgene #209035) open reading frames downstream of the puromycin-resistance gene. To this end, the pLCHKOv3 vector was digested with CsiI and MluI enzymes (ThermoFisher Scientific #FD2114 and #FD0564), and the PCR products coding for the respective fluorescent proteins flanked by a T2A self-cleaving peptide were ligated into the vector using homology-based DNA assembly with NEBuilder (New England Biolabs #E2621S) at 50°C for 45 minutes. The primers used to amplify and clone the fluorescent proteins are listed in Table S4.

Focused validation of fitness exons using co-culture assays

For validation of hits from the exon deletion screens, we cloned hgRNAs targeting an intergenic site (i.e., negative control) or 43 exons interrogated in our exon deletion library into pLCHKOv3 vectors co-expressing fluorescent mCherry and mClover3, respectively. For each exon, we included two to three hgRNAs (total 97 hgRNAs, see Table S4), and the targeted exons are either fitness-promoting or fitness-suppressing screen hits in at least one cell line (27 or 9 exons, respectively) or non-hits (7 exons) in both HAP1 and RPE1 cells. 150,000 HAP1 and RPE1 cells co-expressing SpCas9 and opCas12a were transduced with the color-coded lentiviral pLCHKOv3 vectors in 6-well plates, and 24 hours later were selected with 2 μg/mL puromycin for 48 hours. Co-cultures were then set up by mixing cells stably expressing the intergenic hgRNA along with mCherry with cells expressing an exon deletion hgRNA along with mClover3 at a 1:1 ratio. Co-cultures were set up in triplicates or quadruplicates in 48-well plates at a density of 17,500 cells/well and 10,000 cells/well for HAP1 and RPE1 cells, respectively. Plates were sealed with “Breath-Easy” film (Diversified Biotech #BEM-1) and placed into an Incucyte SX1 live cell analysis instrument (Sartorius) 24 hours after seeding. Cells were imaged every 4 hours for 3 days using the green and red channels, at acquisition times of 300 ms and 400 ms, respectively. Images were analyzed using the Incucyte Top-Hat segmentation feature with a 40 μm radius and a threshold of 0.3 GCU (green signal) and 0.1 RCU (red signal). To remove noise from cell debris, a minimal area filter of 40 and 50 μm² was applied for HAP1 and RPE1 cells, respectively. Images were normalized to the green:red ratio of the first imaged time point 24 hours post-seeding, and subsequently normalized to an intergenic control set imaged at the same time point.
The time course data was used for statistical analysis (two-way ANOVA), with the normalized mean ratio of the end time point visualized on the provided graphs.
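The two-step normalization of the co-culture time course can be sketched as follows. This is an illustration under assumptions: the per-time-point green and red values stand for whatever segmentation metric is exported from the imaging software (the text does not specify it), and the function and variable names are hypothetical.

```python
def normalized_green_red_ratio(green, red, ctrl_green, ctrl_red):
    """Normalize a green:red time course to its first time point and then to
    an intergenic control imaged at the same time points.

    All inputs are equal-length lists ordered by imaging time.
    """
    ratio = [g / r for g, r in zip(green, red)]
    ctrl_ratio = [g / r for g, r in zip(ctrl_green, ctrl_red)]
    # Step 1: express each series relative to its own first imaged time point.
    rel = [x / ratio[0] for x in ratio]
    ctrl_rel = [x / ctrl_ratio[0] for x in ctrl_ratio]
    # Step 2: divide by the intergenic control at the matching time point.
    return [a / b for a, b in zip(rel, ctrl_rel)]


# Hypothetical data: the green (exon deletion) population drops out over time
# while the control co-culture stays balanced.
course = normalized_green_red_ratio(
    green=[100.0, 80.0, 50.0],
    red=[100.0, 100.0, 100.0],
    ctrl_green=[100.0, 100.0, 100.0],
    ctrl_red=[100.0, 100.0, 100.0],
)
```

A value below 1 at the end time point indicates that cells carrying the exon deletion hgRNA were outcompeted by the intergenic control cells.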

CHyMErA optimization library design

To assess the editing efficiency of different Cas12a variants within the CHyMErA platform, we designed an optimization library targeting 482 core essential and 362 non-essential genes⁴⁶ for gene knockout (KO) or exon deletion. Protein-coding transcripts were extracted from the human genome annotation file (gencode.v36.basic.annotation.gff3), obtained from GENCODE. Only transcripts with annotation supported by the HAVANA group were retained for further processing. Constitutive exons with a length of at least 20 nucleotides (nt) were extracted using the exon_utils tool included in MISO¹²⁴. Repetitive exons and exons containing untranslated regions (UTRs), including both 5’ and 3’ UTRs, were discarded, and the remaining exon sequences were converted to BED format. Only constitutive protein-coding exons from core essential (CEG2) and non-essential (NEG) genes⁴⁶ were retained for subsequent analyses. GENCODE-annotated introns (with HAVANA support) were extracted using a published Python script (extract_intron_gff3_from_gff3.py) from MISO¹²⁴. Duplicated introns were also removed. To identify flanking introns for each exon, the coordinates of all retained exons were extended by 3 nt. Flanking introns for each exon were then identified by intersecting the intron and extended exon files using “intersectBed” from BEDTools¹⁰⁹. In cases where exons had multiple introns, only the shortest intron region from each side was retained to avoid selecting spacer sequences targeting overlapping coding regions. Exons with a length not divisible by 3, whose deletion is expected to result in disruption of the open reading frame (ORF), were retained for gRNA spacer sequence selection. Exons were further filtered based on the size of their flanking introns, keeping only introns that were at least 151 nt long.
The 75 intronic nt flanking an exon were excluded from the gRNA search to avoid introducing mutations within coding sequences or targeting potential splicing signals at the end of each intron.
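The intron length filter and the excluded margins can be combined into a single helper, sketched below. It assumes that the 75 nt margin is trimmed from both ends of each intron, which is one reading of the text and is consistent with the 151 nt minimum (2 × 75 + 1). Coordinates are taken to be 0-based and half-open, and the function name is hypothetical.

```python
def grna_search_window(intron_start, intron_end, min_length=151, margin=75):
    """Return the (start, end) interval searched for gRNAs, or None if the
    intron is shorter than `min_length`.

    Coordinates are 0-based, half-open. Trimming `margin` nt from both ends
    is an assumption about how the exclusion was applied.
    """
    if intron_end - intron_start < min_length:
        return None
    return intron_start + margin, intron_end - margin


shortest_allowed = grna_search_window(1000, 1151)  # 151 nt intron
too_short = grna_search_window(1000, 1100)         # 100 nt intron
typical = grna_search_window(5000, 6000)           # 1 kb intron
```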

To map potential Cas9 and Cas12a spacer sequences (referred to as gRNAs), the selected human genome sequences described above were searched for 20 nt sequences upstream of the NGG (i.e., Cas9 spacer) or 23 nt sequences downstream of TTTV (i.e., Cas12a spacer). For the selection of both Cas9 and Cas12a gRNA, we applied stringent filtering to ensure they met specific criteria. First, gRNAs with 4 or more consecutive “T”s were excluded to avoid early Pol III transcription termination. Additionally, any gRNAs containing the recognition sites, as well as the reverse complement sequences, of the type IIS restriction enzymes BfuAI/BveI (ACCTGC and GCAGGT) and BsmBI/Esp3I (CGTCTC and GAGACG), were removed, as these restriction enzymes were used for the library cloning process. The identified gRNAs were further refined by applying cutoffs for on-target activity and specificity scores. For this, we applied the Rule Set 2¹²⁵ and CHyMErA-Net³² scores for cutting-efficiency predictions for Cas9 and Cas12a, respectively, as well as GuideScan¹²⁶ for modelling off-target scores. We selected Cas9 gRNAs with an activity score greater than 10 and a specificity score greater than 0.5, and Cas12a gRNAs with an activity score greater than 0.1 and with no more than 1 and 2 potential off-target sites across the human genome within 2 and 3 Hamming distances, respectively.
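The PAM search and the sequence-level filters can be sketched as below. This is a simplified illustration, not the authors' pipeline: it scans only the strand it is given (a full search would also scan the reverse complement), it omits the activity and specificity scoring steps, and the function names are hypothetical.

```python
import re

# Type IIS recognition sites used for cloning, with reverse complements.
RESTRICTION_SITES = ("ACCTGC", "GCAGGT", "CGTCTC", "GAGACG")


def passes_sequence_filters(spacer):
    """Reject spacers with 4+ consecutive T's or a cloning restriction site."""
    if "TTTT" in spacer:
        return False
    return not any(site in spacer for site in RESTRICTION_SITES)


def find_cas9_spacers(seq):
    """20 nt spacers immediately 5' of an NGG PAM on the given strand."""
    seq = seq.upper()
    hits = []
    # Lookahead allows overlapping candidate sites to be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        if passes_sequence_filters(m.group(1)):
            hits.append((m.start(), m.group(1)))
    return hits


def find_cas12a_spacers(seq):
    """23 nt spacers immediately 3' of a TTTV PAM (V = A, C, or G)."""
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=TTT[ACG]([ACGT]{23}))", seq):
        if passes_sequence_filters(m.group(1)):
            hits.append((m.start() + 4, m.group(1)))  # spacer starts after PAM
    return hits


cas9_hits = find_cas9_spacers("GACTGACTGACTGACTGACT" + "AGG")
cas12a_hits = find_cas12a_spacers("TTTA" + "GACTGACTGACTGACTGACTGAC")
```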

To design gRNAs targeting genes for KO, the targetable exons and their corresponding gRNAs were identified by intersecting the extracted exons with the filtered gRNAs using “intersectBed” from BEDTools109. Ten gRNAs that passed all the aforementioned filtering criteria and are located towards the 5’ end of the gene were selected. For hgRNAs used for exon deletion, we randomly paired the selected Cas9 and Cas12a gRNAs located in the introns flanking the targeted exon without repetitive usage of gRNAs. Only hgRNAs with a predicted fragment deletion size not exceeding 2,500 nt were retained. For all gRNAs used for gene KO and exon deletion, single targeting control hgRNAs were designed by pairing the Cas9 and Cas12a gRNAs with Cas12a and Cas9 gRNAs targeting intergenic regions in the genome, respectively. In addition, we included 1,503 hgRNAs targeting intergenic regions with both the Cas9 and the Cas12a gRNA. Finally, 100 non-targeting hgRNAs targeting EGFP, mClover, mCherry, LacZ, firefly luciferase, renilla luciferase, and nano luciferase were incorporated as additional non-targeting controls, resulting in an optimization library of 18,000 hgRNAs targeting 482 core essential and 362 non-essential genes (see Table S1).

Exon deletion library design

To identify frame-preserving exons that impact cell fitness, we designed a large-scale exon deletion library targeting 12,126 exons with our enhanced CHyMErA platform. To identify targetable protein-coding exons, the human genome annotation file (gencode.v36.basic.annotation.gff3) obtained from GENCODE was used to extract unique protein-coding exons. Subsequently, any exons overlapping untranslated regions (UTRs) were excluded from further analysis. The flanking introns of each remaining exon were defined using the same procedure as described in the optimization library design. However, in this step, the intron length requirement was set to a minimum of 100 nucleotides (nt), with the 50 nt closest to neighboring exons excluded from the gRNA search using a customized Python script. The library included hgRNAs targeting exons in reference core essential genes (CEG2) and non-essential genes (NEG)46, DepMap common essential genes (https://depmap.org/portal/), genes targeted by our previous exon deletion library32, and additional handpicked genes.

Next, the intron sequences obtained were intersected with potential Cas9 or Cas12a gRNA target sites that passed the filtration criteria described below. Cas9 and Cas12a gRNAs containing the BfuAI/BveI restriction sites (ACCTGC and GCAGGT), Cas9 gRNAs starting with CAGGT or ending with GCAG, and Cas12a guides ending with GCAGG, all of which form BfuAI/BveI target sites within the library oligo, were removed. Furthermore, Cas9 or Cas12a gRNAs containing repeats of more than 3 A/T/C/G nucleotides were excluded. Finally, all gRNAs were required to meet specific on-target and off-target scores. For Cas9, the on- and off-target score thresholds were set at 0.5 (using Rule Set 2125) and 0.16 (using GuideScan126), respectively. The latter threshold has been shown to effectively eliminate promiscuous Cas9 gRNAs127.

For Cas12a, the on- and off-target score thresholds were set to 0.25 using CHyMErA-Net32 and 0.5 using newly calculated Cas12a GuideScan enumerations, respectively. To construct a genome-wide Cas12a off-target score database, enumerations of all Cas12a guide RNAs were computed in the hg38 assembly of the human genome using the GuideScan software126. The 5’ PAM sequence TTTN was used with an enumeration up to a Hamming distance of 3. A Cas12a mutation matrix was constructed and utilized to compute specificity scores for all Cas12a guide RNAs39. Specificity scores were constructed such that a score of 1.0 indicates perfect specificity to a target at a Hamming distance of 3, while a specificity score of 0 indicates poor specificity to a target at a Hamming distance of 3. Intermediate specificity scores indicate intermediate levels of specificity to a target.

For exon deletion, Cas9 and Cas12a gRNAs targeting flanking intronic sites of selected exons were paired. First, all possible Cas9-Cas12a gRNA pairs spanning a targeted exon were generated, and gRNA pairs with a predicted deletion size larger than 1,500 nt were filtered out. Exons that had only one pair of Cas9 and Cas12a gRNAs were also discarded. If there was only one available Cas9 or Cas12a gRNA in one intron, up to six Cas12a or Cas9 gRNAs located in the other intron were paired with that single gRNA. If more than three Cas9 or Cas12a gRNAs were available on each flanking intron, a maximum of three Cas9 and Cas12a gRNAs with the closest distance to the targeted exon were retained, and all combinations of Cas9 and Cas12a gRNAs were generated, allowing for a maximum of 18 pairs of gRNAs for each targetable exon.
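The general pairing case can be sketched as follows. This is a simplified illustration, not the study's script: the Guide type and both function names are hypothetical, cut sites are assumed to be given as single genomic coordinates on the forward strand, and the special case of an intron with a single available gRNA (paired with up to six partners) is omitted. With three Cas9 and three Cas12a gRNAs per intron, the two nuclease orientations give 3 × 3 × 2 = 18 pairs, matching the maximum stated above.

```python
from itertools import product
from typing import NamedTuple


class Guide(NamedTuple):
    nuclease: str  # "Cas9" or "Cas12a"
    cut_site: int  # assumed genomic coordinate of the predicted cut


def closest(guides, nuclease, exon_edge, n=3):
    """Up to n guides of one nuclease, nearest to the exon edge first."""
    chosen = [g for g in guides if g.nuclease == nuclease]
    return sorted(chosen, key=lambda g: abs(g.cut_site - exon_edge))[:n]


def pair_guides(upstream, downstream, exon_start, exon_end, max_deletion=1500):
    """Pair Cas9 with Cas12a guides across an exon (hypothetical helper).

    Each pair combines one guide in the upstream intron with one guide of
    the other nuclease in the downstream intron; pairs whose predicted
    deletion exceeds max_deletion nt are dropped.
    """
    pairs = []
    for up_nuc, down_nuc in (("Cas9", "Cas12a"), ("Cas12a", "Cas9")):
        ups = closest(upstream, up_nuc, exon_start)
        downs = closest(downstream, down_nuc, exon_end)
        for u, d in product(ups, downs):
            if d.cut_site - u.cut_site <= max_deletion:
                pairs.append((u, d))
    return pairs
```

Note that the text applies the deletion-size filter before selecting the closest guides, whereas this sketch applies it afterwards for brevity; the two orders can give different results when distant guides are involved.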

The targetable exons were then separated into different subgroups. First, exons covering the translation start codon were extracted and named as “AUG frame-disruptive exons” (AUG_FrameDis). Second, alternative exons were determined using MISO124, and if their length was not a multiple of three, they were classified as “MISO-derived alternative frame-disruptive exons” (misoAltFrameDis). Third, for the remaining exons, if the length of the exon was not a multiple of three and the exon was used by more than half of all transcripts from that gene, it was defined as a “frame-disruptive exon” (FrameDis). Finally, for exons divisible by three, we further determined whether deletion of the specific exon could create a new stop codon through joining upstream and downstream exons. Frame-preserving exons whose deletion is expected to create a stop codon were marked as “frame-disrupting” (FrameDis), while all other exons were labeled as “frame-preserving” (FramePre).
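The final two rules (exon length modulo three, followed by a check for a stop codon created at the new exon junction) can be sketched in Python. This is an illustrative simplification under stated assumptions: sequences are coding sequence only, the upstream segment starts in frame at the start codon, and the AUG_FrameDis and misoAltFrameDis categories as well as the transcript-usage criterion are not modelled. Function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_premature_stop(cds: str) -> bool:
    """True if any in-frame codon before the last one is a stop codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def classify_exon_deletion(upstream_cds: str, exon: str, downstream_cds: str) -> str:
    """Label an exon deletion as "FrameDis" or "FramePre" (hypothetical helper).

    upstream_cds is assumed to begin at the start codon, so codons are read
    in frame from its first base.
    """
    if len(exon) % 3 != 0:
        return "FrameDis"  # deletion shifts the reading frame
    joined = (upstream_cds + downstream_cds).upper()
    # A length divisible by three preserves the frame, but the codon
    # spanning the new junction can still be a stop codon.
    return "FrameDis" if has_premature_stop(joined) else "FramePre"
```

For example, with upstream "ATGGCTT", exon "CAGCAG", and downstream "AAGGCTAA", the exon length is divisible by three, yet joining the flanks yields ATG GCT TAA, so the deletion is labeled frame-disrupting.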

For each gene targeted for exon deletion, we also designed gRNA pairs for gene knockout (KO). Cas9 and Cas12a gRNAs that passed the restriction site, nucleotide repeat, and on- and off-target score filtration criteria described above were intersected with exons using “intersectBed” from BEDTools109. Only exons that were included in more than 50% of annotated transcripts were retained. A maximum of three Cas9 and Cas12a gRNAs with the closest distance to the 5’ end of each gene were retained and paired with each other without repetitive usage of gRNAs. Genes that had only one pair of Cas9 and Cas12a gRNAs available were removed, resulting in 2–3 KO gRNA pairs per gene.

Finally, the library was filtered to ensure presence of required control gRNAs and selected gene categories. Genes that possess frame-preserving exons but lack targetable frame-disruptive exons or gene KO gRNAs were removed. In addition, we included an updated optimization section in the exon deletion library to assess the editing efficiency of our screens. In brief, we incorporated all hgRNAs used for exon deletion from the optimization library described above and added single intronic targeting hgRNAs for exon deletion of non-essential genes in the optimization library as controls. We further redesigned single-targeting KO gRNAs using the same strategy described above for the exon deletion library. For core essential genes, we also designed single-targeting hgRNAs for gene KO. Overall, this resulted in a library size of 300,000 hgRNAs targeting 12,221 exons in 2,095 genes (see Table S2).

Base editing library design

A Cas9 adenine and cytosine base editor sgRNA library was designed to disrupt exon inclusion via mutation of the cognate 5’ and 3’ splice sites. Genomic sequences were extracted from the human genome assembly GRCh38, and targetable exons were identified from the GENCODE database (release 40) by filtering for transcripts annotated as “protein-coding” with a transcript support level of 2 or higher. Additionally, the transcripts included needed to have at least two untranslated regions (UTRs) and two or more exons. For each exon included in the selected transcripts, nucleotides flanking the 3’ and 5’ splice sites were extracted and searched for Cas9 PAM sites (i.e., NGG). To identify Cas9 sgRNAs for editing the adenine of 3’ splice sites (i.e., “AG”), 20-nucleotide (nt)-long sgRNA target sites were selected for which downstream Cas9 PAM sites are located within the sense strand of the targeted exon. To identify Cas9 sgRNAs for editing the cytosine on the antisense strand of 3’ splice sites, sgRNAs were selected for which Cas9 PAM sites are located on the antisense strand of the targeted exon. Finally, to identify Cas9 sgRNAs for editing the adenine or cytosine on the opposite strand of 5’ splice sites (i.e., “AC”), sgRNAs were selected for which Cas9 PAM sites are located within the antisense strand of the targeted exon.

Guide RNAs containing consecutive runs of poly(N) stretches (TTTT, TTATT, TTTCTTT, AAAAAA, GGGGGG, and CCCCCC) were filtered out from the candidate pool. Since the BfuAI/BveI restriction site was utilized for library cloning, any sgRNAs containing ACCTGC or its reverse complementary sequence GCAGGT were not included in the library. Furthermore, sgRNAs starting with CAGGT or ending with GCAG, which form BfuAI/BveI target sites within the library oligo, were discarded. sgRNAs with GuideScan off-target score less than 0.16 were excluded as well126. Subsequently, sgRNAs with the potential to mutate either adenine or cytosine were further filtered. Specifically, only Cas9 sgRNAs that targeted the 3’ or 5’ splice sites within a 15 nt targeting window, ranging from position −3 to 12 of the sgRNA, were retained in the library. Considering the editing capabilities of cytosine and adenine base editors, to ensure the preservation of protein reading frames, sgRNAs predicted to induce stop codon mutations within the 0–20 nt of the coding region were excluded from the library.

To create a comprehensive base editing library, we followed a multi-step approach for sgRNA selection. Adenine and cytosine base editing sgRNAs were designed separately, following the same criteria unless specified otherwise. Firstly, an optimization section was designed to assess base editing performance and screening efficiency. This section exclusively consisted of sgRNAs targeting frame-disruptive exons from both CEG2 and NEG46. Exclusion of these selected exons is predicted to disrupt the reading frame across all transcripts of the respective gene. Next, sgRNAs targeting both frame-preserving or frame-disruptive exons of the genes also targeted in the exon deletion library were included. For frame-disruptive exons, an sgRNA located at the 5’ end of the gene was selected to induce a gene knockout phenotype as a control. Only sgRNAs targeting shared genes with both frame-preserving and frame-disruptive exons were retained. If a gene had already been targeted in the optimization section, sgRNAs targeting frame-disruptive exons of that gene were excluded to avoid redundancy. Furthermore, sgRNAs targeting exons with annotated alternative 5’ or 3’ splice sites were excluded. sgRNAs from frame-disruptive exons of genes in the optimization section, and frame-preserving and frame-disruptive exons from the exon deletion library, were merged. As negative controls, the library includes a total of 926 intergenic and 92 non-targeting sgRNAs. Overall, the library contains 27,871 sgRNAs that can inactivate splice sites of 12,075 exons in 2,185 genes via adenine (19,114 sgRNAs) or cytosine (17,224 sgRNAs) base editors (see Table S5).

QUANTIFICATION AND STATISTICAL ANALYSIS

Optimization and exon deletion screen data analysis

The high-throughput sequencing data were processed and mapped using a methodology consistent with our previous publication34. Briefly, the Cas9 and Cas12a gRNA spacer sequences were extracted from the library to create an index for mapping, utilizing the Bowtie algorithm128. Paired-end sequencing reads were demultiplexed and underwent initial processing employing preprocessReadsPE_2.pl. Subsequently, reads were aligned to the library using the bowtie command: “bowtie -p 6 -v 3 -l 18 -t index --un unmapped.fastq --al mapped.fastq -1 read1 -2 read2 aligned.sam” (6 processors, allowing up to 3 mismatches, and a seed length of 18 bases). The resulting SAM file, obtained from the bowtie alignment, was parsed using parseBowtieOutput.pl to generate a read count file for each hgRNA present in the library.

To ensure comparability between samples, read counts were normalized to the same sequencing depth. A pseudocount of 1 was also added to all gRNA pairs to enable the calculation of log2-fold change (LFC), defined as the log2 of the ratio of the normalized read count at the end point to the normalized read count at the starting point for each gRNA pair.
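As a minimal sketch of this calculation, assuming that depth normalization is applied before the pseudocount and that counts are scaled to an arbitrary common depth of one million reads (neither detail is specified above), the LFC per gRNA pair could be computed as follows. Function names are hypothetical.

```python
import math


def normalize(counts, target_depth=1_000_000):
    """Scale raw counts so that every sample sums to target_depth.

    The value of target_depth is an assumption made for illustration.
    """
    total = sum(counts.values())
    return {guide: c * target_depth / total for guide, c in counts.items()}


def log2_fold_change(start_counts, end_counts, pseudocount=1.0):
    """log2((end + pseudocount) / (start + pseudocount)) per guide pair."""
    start = normalize(start_counts)
    end = normalize(end_counts)
    return {
        guide: math.log2((end[guide] + pseudocount) / (start[guide] + pseudocount))
        for guide in start
    }


# Example usage with made-up read counts for two hgRNAs.
lfc = log2_fold_change({"hgRNA_a": 100, "hgRNA_b": 100}, {"hgRNA_a": 0, "hgRNA_b": 200})
```

The pseudocount keeps the LFC finite for gRNA pairs that drop to zero reads at the end point, as for hgRNA_a in the example.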

To evaluate the performance of both gene KO and exon deletion for different nucleases, receiver operating characteristic (ROC) curve analysis and area under the curve (AUC) calculation were conducted by comparing the LFC of hgRNAs programmed to elicit gene KO or to delete frame-disruptive exons of CEG2 and NEG genes46 using the R package PRROC129, respectively. To control for single cut phenotypes, ROC curves for exon deletion were also generated for single targeting hgRNA, where one of the paired gRNAs was replaced with a gRNA targeting an intergenic region. ROC curves were generated at the exon level, utilizing the average LFC values obtained from all hgRNAs targeting the same exons. A read count cut-off of >30 counts was applied for all ROC analyses.
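The exon-level AUC can be illustrated without PRROC. The sketch below is a plain Python stand-in, not the authors' R code: it averages hgRNA LFCs per exon and computes the AUC as the probability that a randomly chosen essential-gene exon has a lower LFC than a randomly chosen non-essential-gene exon (the Mann-Whitney formulation, with ties counted as one half). Function names are hypothetical.

```python
def exon_level_lfc(guide_lfcs):
    """Average the LFCs of all hgRNAs targeting the same exon."""
    return {exon: sum(values) / len(values) for exon, values in guide_lfcs.items()}


def dropout_auc(lfc_essential, lfc_nonessential):
    """AUC for separating essential from non-essential exons by LFC.

    Equals the fraction of (essential, non-essential) pairs in which the
    essential exon has the lower LFC; ties contribute 0.5.
    """
    wins = 0.0
    for e in lfc_essential:
        for n in lfc_nonessential:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(lfc_essential) * len(lfc_nonessential))


# Example usage with made-up exon-level values.
ceg = exon_level_lfc({"exonA": [-2.0, -3.0], "exonB": [-1.0, -1.5]})
neg = exon_level_lfc({"exonC": [0.1, -0.1], "exonD": [0.3, 0.2]})
auc = dropout_auc(list(ceg.values()), list(neg.values()))
```

An AUC near 1 indicates that perturbations of essential-gene exons deplete consistently more than those of non-essential-gene exons, while 0.5 indicates no separation.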

For the large-scale exon deletion screen, to identify exons affecting cell fitness, the MAGeCK mle module50 was used with default parameters. In brief, all hgRNAs targeting intronic regions for exon deletion were included as input, and intergenic-intergenic targeting hgRNAs were used as negative controls. Additionally, we incorporated single intronic targeting hgRNAs, where one gRNA targets the intronic region and the other targets the intergenic region, to serve as controls. These hgRNAs provided a valuable means of assessing the impact of single intronic targeting gRNAs on cell fitness. Furthermore, negative control gRNAs targeting intergenic regions were included to establish the null distribution for p-value calculations. Raw read counts for each hgRNA were utilized as input for the MAGeCK analysis. To confidently identify hits, we applied stringent criteria, requiring both a MAGeCK false discovery rate (FDR) of less than 0.05 and a Wald FDR of less than 0.05. This analysis was conducted consistently for both HAP1 and RPE1 cells.

Base editing library data analysis

To process the Cas9 gRNA sequences from the base editing library, the sequencing reads were demultiplexed, gRNA sequences were extracted, and an index for mapping the reads to our library was created using the bowtie-build tool from Bowtie128. The high-throughput single-read sequencing data were processed with a customized Python script to retain only the gRNA sequences. These sequences were subsequently mapped to the library using the following command: “bowtie -p 6 -v 3 -l 18 -t index --un unmapped.fastq --al mapped.fastq sample.fastq aligned.sam” (6 processors, allowing up to 3 mismatches, and a seed length of 18 bases). After mapping, read counts were calculated for each gRNA using a customized Python script.

In line with the exon deletion analysis, the read counts of each sgRNA were normalized to mitigate the impact of sequencing depth variations. To facilitate the calculation of log2-fold change (LFC), a pseudocount of 1 was added to all gRNA read counts. The LFC denotes the log2 of the ratio of the normalized read count at the end point to that at the starting point for each gRNA, serving as a measure of the relative change in gRNA abundance.

The effective editing distance was optimized based on the distribution of LFC values for gRNAs targeting frame-disruptive exons from both CEG2 and NEG. Based on these results, for adenine editing, the effective editing window was refined to 2 to 11 nucleotides, while for cytosine editing, it was adjusted to −2 to 11 nucleotides. Single guide RNAs targeting splicing sites outside of the updated effective editing window were excluded from further downstream analysis.

To evaluate the performance of the four different base editors tested, we utilized the PRROC package in R to generate ROC curves and AUC scores130. The averaged LFC values of gRNAs targeting frame-disruptive exons from CEG2 and NEG were compared to assess the editing efficiency after applying a read count cut-off of >30 counts. We further used the ROC curves to set up an LFC threshold to identify fitness-promoting exons. To determine the essentiality of exons in each screen, we applied the LFC threshold at 10% FDR and required at least 2 out of 3 replicates to meet the exon-level threshold at 10% FDR as well.

Random forest classifier and feature importance for predicting essential exons

To identify the features that discriminate whether an exon is essential (fitness-promoting) or not, we applied a random forest classifier implemented in the scikit-learn package131. A total of 14 uncorrelated features (Pearson r < 0.5) were collected to train the random forest classifier (Figure 4F). Because our dataset contains both numerical and categorical features, we scaled all the features between 0 and 1. The features were compiled for the 7,822 targeted frame-preserving exons in HAP1 and RPE1 cell lines (see Table S2). The class label for each exon was obtained from the MAGeCK screen analysis (Table S2), which identifies positively and negatively selected exon-targeting hgRNAs. We marked the exons as fitness-promoting if MAGeCK assigned Wald FDR < 0.05 and FDR < 0.05 with a negative beta score. Any exons which have Wald FDR ≥ 0.05 or FDR ≥ 0.05 were labeled as non-hits. For more comprehensive predictions, we considered the collective non-hit label from HAP1 and RPE1 cell lines. This results in 2,071 fitness-promoting hits and 5,584 non-hits. To assess feature importance from the random forest model, the dataset was split into 80% train and 20% test sets in a stratified fashion using the function “train_test_split” of the scikit-learn package. We tuned the random forest hyperparameters (max_features: [sqrt, log2, None], min_samples_leaf: [1, 2, 3, 4, 5], n_estimators: [500, 600, 700, 800, 900, 1000], oob_score: [True, False]) by applying stratified 5-fold cross-validation with GridSearchCV (scikit-learn package). We trained the final model on the selected optimal parameters of 800 decision trees, oob_score set as True, maximum number of features for the best split set as square root of the 14 features, and minimum of samples required per leaf node = 4. For the model trained on frame-preserving alternative exons, the best parameters determined by GridSearchCV were: oob_score = True, number of trees = 600, maximum features = square root of all the features, and minimum samples required at a leaf node = 5. Considering the class imbalance in our dataset, we also used the parameter “class_weight: balanced_subsample” of RandomForestClassifier (scikit-learn). This parameter reduces the influence of the majority class by computing the weights, inversely correlated to the frequency of the class for each tree132. We used scikit-learn’s default settings for other parameters and “random_state = 0” for reproducibility. Finally, the model’s performance was evaluated with receiver operating characteristic (ROC) analysis using the set-aside test data set. Random forest measures feature importance based on Gini impurity computed within decision trees and normalizes the values so that the importance scores add up to 1.
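The training workflow can be sketched with scikit-learn as shown below. This is an illustration under explicit assumptions, not the study's code: the feature matrix is synthetic random data standing in for the 14 exon features, the hyperparameter grid is reduced so the example runs quickly, and ROC AUC is assumed as the cross-validation scoring metric because the metric is not stated above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 14-feature exon table (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 14))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0.8).astype(int)

# Scale all features to the range 0 to 1, then make a stratified 80/20 split.
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Reduced grid for speed; the grid in the text uses 500 to 1000 trees.
param_grid = {
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 4],
    "n_estimators": [50, 100],
}
base = RandomForestClassifier(
    class_weight="balanced_subsample", oob_score=True, random_state=0
)
search = GridSearchCV(
    base, param_grid, cv=StratifiedKFold(n_splits=5), scoring="roc_auc"  # assumed metric
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
model = search.best_estimator_
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Gini-based importances, normalized by scikit-learn to sum to 1.
importances = model.feature_importances_
```

GridSearchCV refits the best parameter combination on the full training set by default, so best_estimator_ can be evaluated directly on the set-aside test data.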

Fitness Exons Feature Analysis

For all positions in a protein, a score for intrinsic disorder was computed using IUPred (iupred.enzim.hu)133. Amino acid residues with a score greater than 0.4 were considered disordered. For each coding exon, the fraction of disordered residues was estimated. For all positions in a protein, low-complexity regions were calculated using SEG (http://www.dbbm.fiocruz.br/cgc/seg.html). Only amino acids not located within ordered Pfam annotated protein domains (pfam.xfam.org), putative transmembrane domains, signal peptides, and coiled coil regions were considered as low-complexity regions. Annotations for post-translational modifications were extracted from PhosphoSite134, dbPTM135, or UniProt136. Annotations for domains and motifs were collected from the Pfam137, 3DID138, UniProt, and ELM139 databases. Translation of coordinates from genomic to protein coordinates was computed using annotations from Ensembl BioMart140.

To evaluate the conservation at the amino acid level of the frame-preserving exons identified in our screen, we performed quantitative comparisons of the protein sequence across multiple species. We downloaded the Multiz Refseq protein alignment141 of 99 vertebrate species to the human (hg38) assembly from UCSC. The conservation value was calculated as the total amino acid residues of 99 vertebrates that matched the human over the total of the aligned residues, without including gaps. When we found multiple transcript alignments for an exon, we kept the one with lowest conservation score. Each exon was attributed a conservation score between 0 and 1, estimating higher conservation if close to 1.
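The conservation value defined above can be sketched as a ratio of matching to aligned residues. The function name is hypothetical, aligned sequences are assumed to be equal-length strings with "-" marking gaps, and returning 0.0 when no residues are aligned is an arbitrary choice made for this illustration.

```python
def conservation_score(human: str, orthologs: list[str]) -> float:
    """Fraction of aligned ortholog residues that match the human residue.

    Columns with a gap in either the human or the ortholog sequence are
    excluded from both the numerator and the denominator.
    """
    matched = 0
    aligned = 0
    for seq in orthologs:
        for h, o in zip(human, seq):
            if h == "-" or o == "-":
                continue  # gaps do not count as aligned residues
            aligned += 1
            if h == o:
                matched += 1
    return matched / aligned if aligned else 0.0


# Example usage with a made-up three-residue exon and two species.
score = conservation_score("MKT", ["MKT", "MR-"])
```

In the example, five residues are aligned without gaps and four of them match the human sequence, giving a score of 0.8.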

For the conservation analysis of exons and the flanking intronic region at the nucleotide level, we extracted PhastCons scores for comparisons of nucleotide conservation across 100 species from the Genome Browser. Hits indicative of fitness promotion or suppression were independently identified in both HAP1 and RPE1 datasets and subsequently combined. Non-hit controls were defined as exons that were not identified as hits in either the HAP1 or RPE1 datasets. The conservation scores within exonic and 100 nucleotides of flanking intronic regions, including upstream and downstream introns, were assessed using deepTools120. A metagene analysis was performed to compare the conservation scores between fitness-promoting, -suppressing, and non-hit control exons. The average conservation scores within these regions were calculated separately for hits and controls. Subsequently, the averaged conservation scores were compared between fitness-promoting, -suppressing, and control exons using the Wilcoxon rank-sum test.

To estimate the relative position of exons in the corresponding gene (Figure S4D), we considered only the coding region. To do this, we obtained hg38 exon coordinates and annotations of start and stop codons from Gencode (version 36). For each edited exon, the distance between the start and stop codon of the respective gene was determined, excluding introns. We used bedtools intersect to find the overlap between the edited exon and the Gencode annotated exon, and then calculated the total length of all the exons between the start codon and the overlap. The relative position of the targeted exons was determined by normalizing the start point of the edited exon by the total distance between the start and stop codon.
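A minimal sketch of this normalization is given below, assuming that the coding segments of a gene are supplied as half-open (start, end) intervals already clipped to the start and stop codons and ordered 5’ to 3’ along the transcript. The function name and input layout are hypothetical.

```python
def relative_position(cds_exons, target_index):
    """Relative start of one exon within the intron-free coding region.

    cds_exons: list of (start, end) half-open coding intervals in
    transcript order; target_index: position of the edited exon in it.
    """
    lengths = [end - start for start, end in cds_exons]
    coding_length = sum(lengths)
    upstream_length = sum(lengths[:target_index])
    return upstream_length / coding_length


# Example usage: three coding segments of 100, 100, and 200 nt.
position = relative_position([(0, 100), (200, 300), (400, 600)], target_index=2)
```

Here the third exon begins after 200 of 400 coding nucleotides, so its relative position is 0.5.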

To discern whether a specific exon exhibited alternative or constitutive splicing, we conducted an assessment considering PSI values from two sources: those estimated in this study based on RNA-seq of HAP1 and RPE1 cells using Whippet117, and those sourced from VastDB142, encompassing PSI values from diverse cell and tissue contexts. Specifically, if a particular exon exhibited a PSI value of no more than 0.9 (as estimated from both HAP1 and RPE1 cells using Whippet), or if it possessed a PSI range greater than 0.1 (as estimated from VastDB), we classified it as an alternative exon. Conversely, exons that did not meet these criteria were classified as constitutive exons. Exons lacking PSI values from both HAP1 and RPE1, as well as VastDB, were categorized as undefined.
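These classification rules can be sketched as a small function. Two points are assumptions of this illustration rather than statements from the text: PSI values from both sources are taken to be on a 0 to 1 scale, and an exon is called alternative when its PSI is at most 0.9 in either cell line. The function name is hypothetical.

```python
def classify_splicing(psi_hap1, psi_rpe1, vastdb_psi, psi_max=0.9, range_min=0.1):
    """Return "alternative", "constitutive", or "undefined" for one exon.

    psi_hap1, psi_rpe1: Whippet PSI values or None if unavailable.
    vastdb_psi: iterable of PSI values across samples, or None.
    """
    cell_psi = [p for p in (psi_hap1, psi_rpe1) if p is not None]
    tissue_psi = [p for p in (vastdb_psi or []) if p is not None]
    if not cell_psi and not tissue_psi:
        return "undefined"
    if any(p <= psi_max for p in cell_psi):
        return "alternative"
    if tissue_psi and max(tissue_psi) - min(tissue_psi) > range_min:
        return "alternative"
    return "constitutive"


# Example usage with made-up PSI values.
label = classify_splicing(0.98, 0.97, [0.99, 0.95])
```

The example exon is highly included in both cell lines and varies by less than 0.1 across the VastDB samples, so it is labeled constitutive.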

Given the difference in exon inclusion level between fitness-promoting and -suppressing exons, we generated PSI-matched (non-hit) controls for comparisons, focusing on alternative exons. In brief, for fitness-promoting exons, we compiled the PSI-matched control from the non-hits having the closest PSI score as the exons annotated as fitness-promoting in either HAP1 or RPE1 cell lines. Maximum non-hits were collected as long as a similar PSI distribution was maintained (p-value = 1; Wilcoxon rank sum). Similarly, PSI-matched controls for fitness-suppressing exons were generated from the non-hits with a similar distribution of PSI scores as the exons which were fitness-suppressing in any of the two cell lines (p-value = 0.95; Wilcoxon rank sum).

To investigate the potential influence of mutation frequency on exon fitness, we annotated the exons using an internal relational database maintained at NCI-Frederick. ClinVar143 (version dated 20210529), COSMIC144 (version 94; cancer.sanger.ac.uk), OncoKB145 (version dated 20210623), and MutationAligner146 (version dated 20200521) mutations were extracted from the corresponding databases. The annotation table has variants and annotations from multiple resources with GRCh37 genomic locations. GRCh38 exonic coordinates from the project were first converted to GRCh37 coordinates using CrossMap version 0.6.4. The annotation table was then used to query and extract counts within the required exonic locations. TCGA (v34.0) mutation data were obtained via the UCSC table browser. A data set with de novo mutations in neurodevelopmental disorders was retrieved from a previous study147. These mutation coordinates were converted to the human genome annotation (hg38) employed in this study using UCSC liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). We then intersected the mutation coordinates with all exons targeted in the exon deletion screen to obtain the number of mutations per exon. This allowed us to calculate the mutation frequency for each exon, defined as the number of mutations divided by the length of the exon.

To assess the influence of splice site strength on exon fitness, we calculated the strength of 3’ and 5’ splice sites for each targeted exon using MaxEntScan148. We then compared the splicing scores across various groups of exons within our screen.

Evidence for translation of alternative exons using proteomics and ribosome profiling data

Deep proteome sequencing data from six human cell lines were obtained from a prior study8 and cross-referenced with exons targeted in this study to detect evidence of translation. Exons were deemed as translated alternative exons only if peptides were detected from both inclusion and exclusion events.

Ribosome profiling sequencing datasets from nine different cell lines84–92 were downloaded from NCBI. Adaptor sequences were removed from the 3’ end of each read, and only reads longer than 15 nucleotides (nt) were retained for subsequent analysis using cutadapt107 with default parameters. Trimmed reads were initially aligned to human rRNA sequences using Bowtie 2108 with default parameters. Reads that did not align to rRNA were then mapped to the human genome (hg38) using STAR149, and only uniquely mapped reads were retained for further analysis. Reads from each sample were grouped based on fragment length. To ensure the high quality of data, the distribution of ribosome profiling reads around the start and stop codons of canonical open reading frames (ORFs) was examined, and the 3 nt periodicity of the 5’ end of sequencing reads was calculated for each sample as described previously150. Effective ribosome fragments were defined as reads exhibiting distinct 3 nt periodicity, with a minimum of 50% enrichment within one of the three frames, and originating from select highly abundant fragment size groups. Fragments from the same cells were subsequently combined. Splicing junction reads were extracted from alignments of all samples and converted to BED format. To identify evidence for the translation of alternative exons, all targeted exons from the exon deletion screens were compared with splicing junction reads derived from different cells using BEDTools109 and customized Python scripts. Exons with junctions from both exon skipping and either of the two corresponding exon inclusions were defined as potentially translated.
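The 3 nt periodicity criterion can be illustrated with a short sketch that assigns the 5’ end of each read to a reading frame relative to an annotated start codon and checks whether one frame holds at least 50% of reads. This is a simplification with hypothetical function names: positions are assumed to be transcript coordinates for reads of a single fragment length, and no P-site offset is applied.

```python
from collections import Counter


def frame_fractions(read_5p_positions, cds_start):
    """Fraction of read 5' ends falling in frames 0, 1, and 2."""
    counts = Counter((pos - cds_start) % 3 for pos in read_5p_positions)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts.get(frame, 0) / total for frame in range(3)]


def is_periodic(read_5p_positions, cds_start, min_fraction=0.5):
    """True if at least min_fraction of reads fall within a single frame."""
    return max(frame_fractions(read_5p_positions, cds_start)) >= min_fraction


# Example usage with made-up 5' end positions for one fragment-length group.
fractions = frame_fractions([100, 103, 106, 109, 101], cds_start=100)
```

In the example, four of five reads start in frame 0, so this fragment-length group would pass the 50% enrichment threshold.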

Supplementary Material

Table S1. CHyMErA optimization screening data, related to Figure 1. (Sheet 1) hgRNA sequence and annotation information for the CHyMErA optimization library, including targeted genes, guide sequences, cut sites, enumerations of on- and off-target scores, and predicted editing type outcomes.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA optimization screens with different Cas12a variants.

(Sheet 3) Statistical analysis of different guide categories across the Cas12a variants in HAP1 and RPE1 CHyMErA optimization screens. Wilcoxon rank-sum tests were applied.

Table S2. Exon deletion screening data, related to Figure 2. (Sheet 1) hgRNA sequence and annotation information for the CHyMErA exon deletion library.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA exon deletion screens.

(Sheet 3) Exon-level analysis of exon deletion screen in HAP1 and RPE1 cells using the MAGeCK algorithm.

Table S3. GUIDE-Seq data, related to Figure 2. List of off- and on-target incorporation sites of the double-stranded oligonucleotide (dsODN) into the genome for the various tested hgRNAs (Sheet 1 = intergenic 2; Sheet 2 = intergenic 3; Sheet 3 = intergenic 4; Sheet 4 = TAF5 exon-8; Sheet 5 = VPS29 exon-2; Sheet 6 = BIN1 exon-13). The Cas nuclease (column A), integration site of the dsODN (column B), total number of reads corresponding to the integration site (column C), coordinates (D), nucleotide (E) and PAM (F) sequences of potential off-target sites, gRNA sequence (G), and the number of mismatches between the gRNA sequence and the off- or on-target site are indicated.

Table S4. Oligos & Addgene plasmids used in this study, related to STAR Methods. (Sheet 1) Sequences of oligos used in this study, including gRNA library oligo structure, primers, gRNAs, and probes for northern blotting.

(Sheet 2) Newly cloned plasmids deposited to Addgene.

(Sheet 3) Newly cloned CHyMErA hgRNA libraries deposited to Addgene.

Table S5. Base editor screening data, related to Figure 3. (Sheet 1) sgRNA sequence and annotation information for the splice site mutation Cas9 base editor library.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA optimization screens with different Cas12a variants.

(Sheet 3) Exon-level analysis of HAP1 base editor screen. Exons that affect cell fitness by the adenine and/or cytosine base editor screens are indicated.

Table S6. Exon feature analysis, related to Figure 4. List of fitness-promoting and -suppressing exons in HAP1 and RPE1 cells, and features annotated to exons targeted in the exon deletion library.

Table S7. TAF5 AP-MS data, related to Figure 5. List of identified peptides and respective counts in TAF5 affinity purification mass spectrometry (AP-MS) analysis.

(Sheet 1) Unfiltered AP-MS data.

(Sheet 2) Filtered MS data where ≥ 5 peptide-spectrum match (PSM) in median TAF5 full length (FL) or exon-8 deleted (ΔE8) are listed.

Table S8. TAF5 miniTurboID data, related to Figure 5. List of identified peptides and respective counts in TAF5 miniTurboID proximity labelling mass spectrometry analysis.

(Sheet 1) Unfiltered miniTurboID MS data.

(Sheet 2) Filtered MS data where ≥ 5 PSM in median TAF5 full length (FL) or exon-8 deleted (ΔE8) are listed.

Table S9. TAF5 RNA sequencing data, related to Figure 6. Transcriptomics analysis in HEK293 Flp-In cells depleted of endogenous TAF5 and rescued with full-length (FL) or exon-8 deleted (ΔE8) TAF5 isoforms.

(Sheet 1) Unfiltered DESeq2-analyzed data.

(Sheet 2) DESeq2-analyzed data filtered for genes rescued with full-length (FL) TAF5 isoform (siNT vs siTAF5: |FC| > 1.5 & FDR ≤ 0.05 AND siNT vs siTAF5+TAF5-FL: |FC| < 1.25 & FDR > 0.05).

(Sheet 3) DESeq2-analyzed data filtered for genes rescued by the exon-8 deleted (ΔE8) TAF5 isoform (siNT vs siTAF5: |FC| > 1.5 & FDR ≤ 0.05 AND siNT vs siTAF5+TAF5-ΔE8: |FC| < 1.25 & FDR > 0.05).
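
The two-step "rescued gene" filter used for Sheets 2 and 3 of Table S9 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes that |FC| refers to a linear fold change applied symmetrically to up- and down-regulation (so the comparison is made on |log2 FC| against log2 of the threshold) and that FDR corresponds to an adjusted p-value. The gene names, values, and dictionary layout are hypothetical.

```python
import math

# Thresholds quoted in the Table S9 sheet descriptions.
CHANGED_FC = 1.5     # siNT vs siTAF5: |FC| > 1.5
CHANGED_FDR = 0.05   # ... and FDR <= 0.05
RESCUED_FC = 1.25    # siNT vs siTAF5 + rescue: |FC| < 1.25
RESCUED_FDR = 0.05   # ... and FDR > 0.05


def is_changed(log2fc, fdr):
    """Gene responds significantly to TAF5 knockdown (siNT vs siTAF5)."""
    return abs(log2fc) > math.log2(CHANGED_FC) and fdr <= CHANGED_FDR


def is_restored(log2fc, fdr):
    """Gene is no longer different from control once the isoform is added back."""
    return abs(log2fc) < math.log2(RESCUED_FC) and fdr > RESCUED_FDR


def rescued_genes(knockdown, rescue):
    """Return sorted gene names passing both filters.

    Both arguments map gene name -> (log2 fold change, FDR); this layout
    is hypothetical and stands in for two differential-expression tables.
    """
    return sorted(
        gene
        for gene, (lfc, fdr) in knockdown.items()
        if gene in rescue and is_changed(lfc, fdr) and is_restored(*rescue[gene])
    )


# Hypothetical example values, for illustration only.
knockdown = {"GENE_A": (-1.2, 0.001), "GENE_B": (0.9, 0.01), "GENE_C": (0.1, 0.8)}
rescue_fl = {"GENE_A": (0.05, 0.9), "GENE_B": (0.8, 0.02), "GENE_C": (0.0, 0.95)}

rescued_by_fl = rescued_genes(knockdown, rescue_fl)  # ["GENE_A"]
```

Running the same function with a ΔE8 rescue table in place of `rescue_fl` would give the Sheet 3 equivalent, so the FL and ΔE8 gene sets can be compared directly.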

Highlights.

  • Optimized orthogonal CRISPR-based tools enable exon-resolution functional genomics

  • Identifying myriad fitness-promoting and -suppressing exons within the human genome

  • Fitness-promoting and -suppressing exons have both distinct and common features

  • TAF5 alternative exon-8 is crucial for TFIID assembly and gene expression regulation

ACKNOWLEDGEMENTS

The authors thank Shridhar Hannenhalli, Misha Kashlev, Dan Larson and Keith Lawson for constructive discussions and Timothy Kung, Uma Mudunuri, Smriti Singh, Jesse Turner, Kayla Eury, Chunmei Shi, Mary Guest and Christine Evans for technical guidance. We also thank members of the Gonatopoulos-Pournatzis and Aregger groups, the RNA Biology Laboratory and Molecular Targets Program at NCI-Frederick. We are also grateful to the CCR Sequencing Facility, the Optical Microscopy and Analysis Laboratory and the Flow Cytometry Facility at NCI-Frederick for technical assistance. Figures 1A, 2F and 7A include elements from BioRender.com. This work was supported by the NCI/NIH Intramural Research Program (Projects ZIA BC012019; ZIA BC012102 and contract No. HHSN261201500003I). A.P. was supported by the NIH grant 5R38AG070171.

Footnotes

DECLARATION OF INTERESTS

The authors declare no competing interests.

Bibliography

  • 1.Pan Q, Shai O, Lee LJ, Frey BJ, and Blencowe BJ (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature genetics 40, 1413–1415. 10.1038/ng.259. [DOI] [PubMed] [Google Scholar]
  • 2.Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, and Burge CB (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476. 10.1038/nature07509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, et al. (2012). The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593. 10.1126/science.1230612. [DOI] [PubMed] [Google Scholar]
  • 4.Merkin J, Russell C, Chen P, and Burge CB (2012). Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science 338, 1593–1599. 10.1126/science.1228186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wright CJ, Smith CWJ, and Jiggins CD (2022). Alternative splicing as a source of phenotypic diversity. Nat Rev Genet 23, 697–710. 10.1038/s41576-022-00514-4. [DOI] [PubMed] [Google Scholar]
  • 6.Tapial J, Ha KCH, Sterne-Weiler T, Gohr A, Braunschweig U, Hermoso-Pulido A, Quesnel-Vallières M, Permanyer J, Sodaei R, Marquez Y, et al. (2017). An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res 27, 1759–1768. 10.1101/gr.220962.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Nilsen TW, and Graveley BR (2010). Expansion of the eukaryotic proteome by alternative splicing. Nature 463, 457–463. 10.1038/nature08909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Sinitcyn P, Richards AL, Weatheritt RJ, Brademan DR, Marx H, Shishkova E, Meyer JG, Hebert AS, Westphall MS, Blencowe BJ, et al. (2023). Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol. 10.1038/s41587-023-01714-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Weatheritt RJ, Sterne-Weiler T, and Blencowe BJ (2016). The ribosome-engaged landscape of alternative splicing. Nat Struct Mol Biol 23, 1117–1123. 10.1038/nsmb.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Tress ML, Abascal F, and Valencia A (2017). Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci 42, 98–110. 10.1016/j.tibs.2016.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Blencowe BJ (2017). The Relationship between Alternative Splicing and Proteomic Complexity. Trends Biochem Sci 42, 407–408. 10.1016/j.tibs.2017.04.001. [DOI] [PubMed] [Google Scholar]
  • 12.Lynch KW (2015). Thoughts on NGS, alternative splicing and what we still need to know. Rna 21, 683–684. 10.1261/rna.050419.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ule J, and Blencowe BJ (2019). Alternative Splicing Regulatory Networks: Functions, Mechanisms, and Evolution. Mol Cell 76, 329–345. 10.1016/j.molcel.2019.09.017. [DOI] [PubMed] [Google Scholar]
  • 14.Marasco LE, and Kornblihtt AR (2023). The physiology of alternative splicing. Nat Rev Mol Cell Biol 24, 242–254. 10.1038/s41580-022-00545-z. [DOI] [PubMed] [Google Scholar]
  • 15.Bonnal SC, López-Oreja I, and Valcárcel J (2020). Roles and mechanisms of alternative splicing in cancer - implications for care. Nat Rev Clin Oncol 17, 457–474. 10.1038/s41571-020-0350-x. [DOI] [PubMed] [Google Scholar]
  • 16.Bradley RK, and Anczuków O (2023). RNA splicing dysregulation and the hallmarks of cancer. Nat Rev Cancer. 10.1038/s41568-022-00541-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Quesnel-Vallières M, Weatheritt RJ, Cordes SP, and Blencowe BJ (2019). Autism spectrum disorder: insights into convergent mechanisms from transcriptomics. Nat Rev Genet 20, 51–63. 10.1038/s41576-018-0066-2. [DOI] [PubMed] [Google Scholar]
  • 18.Scotti MM, and Swanson MS (2016). RNA mis-splicing in disease. Nature reviews. Genetics 17, 19–32. 10.1038/nrg.2015.3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, and Charpentier E (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821. 10.1126/science.1225829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cong L, Ran FAFA, Cox D, Lin S, Barretto R, Habib N, Hsu PDPD, Wu X, Jiang W, Marraffini LALA, and Zhang F (2013). Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819–822. 10.1126/science.1225053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, and Church GM (2013). RNA-guided human genome engineering via Cas9. Science 339, 823–826. 10.1126/science.1232033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Wang JY, and Doudna JA (2023). CRISPR technology: A decade of genome editing is only the beginning. Science 379, eadd8643. 10.1126/science.add8643. [DOI] [PubMed] [Google Scholar]
  • 23.Doench JG (2018). Am i ready for CRISPR? A user’s guide to genetic screens. Nature Reviews Genetics 19, 67–80. 10.1038/nrg.2017.97. [DOI] [PubMed] [Google Scholar]
  • 24.Bock C, Datlinger P, Chardon F, Coelho MA, Dong MB, Lawson KA, Lu T, Maroc L, Norman TM, Song B, et al. (2022). High-content CRISPR screening. Nature Reviews Methods Primers 2022 2:1 2, 1–23. 10.1038/s43586-021-00093-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Shalem O, Sanjana NE, and Zhang F (2015). High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet 16, 299–311. 10.1038/nrg3899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Gonatopoulos-Pournatzis T, Wu M, Braunschweig U, Roth J, Han H, Best AJ, Raj B, Aregger M, O’Hanlon D, Ellis JD, et al. (2018). Genome-wide CRISPR-Cas9 Interrogation of Splicing Networks Reveals a Mechanism for Recognition of Autism-Misregulated Neuronal Microexons. Molecular Cell 72, 510–524.e512. 10.1016/j.molcel.2018.10.008. [DOI] [PubMed] [Google Scholar]
  • 27.Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, Heckl D, Ebert BL, Root DE, Doench JG, and Zhang F (2014). Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87. 10.1126/science.1247005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, Lander ES, and Sabatini DM (2015). Identification and characterization of essential genes in the human genome. Science 350, 1096–1101. 10.1126/science.aac7041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, Mis M, Zimmermann M, Fradet-Turcotte A, Sun S, et al. (2015). High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163, 1515–1526. 10.1016/j.cell.2015.11.015. [DOI] [PubMed] [Google Scholar]
  • 30.Wang T, Wei JJ, Sabatini DM, and Lander ES (2014). Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84. 10.1126/science.1246981. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Aguirre AJ, Meyers RM, Weir BA, Vazquez F, Zhang CZ, Ben-David U, Cook A, Ha G, Harrington WF, Doshi MB, et al. (2016). Genomic copy number dictates a geneindependent cell response to CRISPR/Cas9 targeting. Cancer Discovery 6, 914–929. 10.1158/2159-8290.CD-16-0154. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Gonatopoulos-Pournatzis T, Aregger M, Brown KR, Farhangmehr S, Braunschweig U, Ward HN, Ha KCH, Weiss A, Billmann M, Durbic T, et al. (2020). Genetic interaction mapping and exon-resolution functional genomics with a hybrid Cas9–Cas12a platform. Nature Biotechnology 38, 638–648. 10.1038/s41587-020-0437-z. [DOI] [PubMed] [Google Scholar]
  • 33.Aregger M, Xing K, and Gonatopoulos-Pournatzis T (2021). Application of CHyMErA Cas9-Cas12a combinatorial genome-editing platform for genetic interaction mapping and gene fragment deletion screening. Nature Protocols 2021 16:10 16, 4722–4765. 10.1038/s41596-021-00595-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Ward HN, Aregger M, Gonatopoulos-Pournatzis T, Billmann M, Ohsumi TK, Brown KR, Blencowe BJ, Moffat J, and Myers CL (2021). Analysis of combinatorial CRISPR screens with the Orthrus scoring pipeline. Nature Protocols 2021 16:10 16, 4766–4798. 10.1038/s41596-021-00596-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Fonfara I, Richter H, Bratovič M, Le Rhun A, and Charpentier E (2016). The CRISPR-associated DNA-cleaving enzyme Cpf1 also processes precursor CRISPR RNA. Nature 532, 517–521. 10.1038/nature17945. [DOI] [PubMed] [Google Scholar]
  • 36.Zetsche B, Heidenreich M, Mohanraju P, Fedorova I, Kneppers J, DeGennaro EM, Winblad N, Choudhury SR, Abudayyeh OO, Gootenberg JS, et al. (2016). Multiplex gene editing by CRISPR–Cpf1 using a single crRNA array. Nature Biotechnology 35, 31–34. 10.1038/nbt.3737. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Thomas JD, Polaski JT, Feng Q, De Neef EJ, Hoppe ER, McSharry MV, Pangallo J, Gabel AM, Belleville AE, Watson J, et al. (2020). RNA isoform screens uncover the essentiality and tumor-suppressor activity of ultraconserved poison exons. Nature Genetics 52, 84–94. 10.1038/s41588-019-0555-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Zhu S, Li W, Liu J, Chen C-HH, Liao Q, Xu P, Xu H, Xiao T, Cao Z, Peng J, et al. (2016). Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR-Cas9 library. Nature Biotechnology 34, 1279–1286. 10.1038/nbt.3715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.DeWeirdt PC, Sanson KR, Sangree AK, Hegde M, Hanna RE, Feeley MN, Griffith AL, Teng T, Borys SM, Strand C, et al. (2021). Optimization of AsCas12a for combinatorial genetic screens in human cells. Nature Biotechnology 39, 94–104. 10.1038/s41587-020-0600-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Liu J, Srinivasan S, Li C-Y, Ho IL, Rose J, Shaheen M, Wang G, Yao W, Deem A, Bristow C, et al. (2019). Pooled library screening with multiplexed Cpf1 library. Nature Communications 10, 3144–3144. 10.1038/s41467-019-10963-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Cetin R, Wegner M, Luwisch L, Saud S, Achmedov T, Süsser S, Vera-Guapi A, Müller K, Matthess Y, Quandt E, et al. (2023). Optimized metrics for orthogonal combinatorial CRISPR screens. Sci Rep 13, 7405. 10.1038/s41598-023-34597-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Zetsche B, Gootenberg Jonathan S., Abudayyeh Omar O., Slaymaker Ian M., Makarova Kira S., Essletzbichler P, Volz Sara E., Joung J, van der Oost J, Regev A, et al. (2015). Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell 163, 759–771. 10.1016/j.cell.2015.09.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Kleinstiver BP, Sousa AA, Walton RT, Tak YE, Hsu JY, Clement K, Welch MM, Horng JE, Malagon-Lopez J, Scarfò I, et al. (2019). Engineered CRISPR–Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing. Nature Biotechnology 37, 276–282. 10.1038/s41587-018-0011-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Gier RA, Budinich KA, Evitt NH, Cao Z, Freilich ES, Chen Q, Qi J, Lan Y, Kohli RM, and Shi J (2020). High-performance CRISPR-Cas12a genome editing for combinatorial genetic screening. Nature communications 11, 3455–3455. 10.1038/s41467-020-17209-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Zhang L, Zuris JA, Viswanathan R, Edelstein JN, Turk R, Thommandru B, Rube HT, Glenn SE, Collingwood MA, Bode NM, et al. (2021). AsCas12a ultra nuclease facilitates the rapid generation of therapeutic cell medicines. Nature Communications 2021 12:1 12, 1–15. 10.1038/s41467-021-24017-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Hart T, Tong AHY, Chan K, Van Leeuwen J, Seetharaman A, Aregger M, Chandrashekhar M, Hustedt N, Seth S, Noonan A, et al. (2017). Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens. G3: Genes, Genomes, Genetics 7, g3.117.041277-g041273.041117.041277. 10.1534/g3.117.041277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure TM, Luo B, Grenier JK, et al. (2006). A Lentiviral RNAi Library for Human and Mouse Genes Applied to an Arrayed Viral High-Content Screen. Cell 124, 1283–1298. 10.1016/j.cell.2006.01.040. [DOI] [PubMed] [Google Scholar]
  • 48.Kampmann M, Bassik MC, and Weissman JS (2014). Functional genomics platform for pooled screening and generation of mammalian genetic interaction maps. Nature Protocols 9, 1825–1847. 10.1038/nprot.2014.103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Gilbert Luke A., Horlbeck Max A., Adamson B, Villalta Jacqueline E., Chen Y, Whitehead Evan H., Guimaraes C, Panning B, Ploegh Hidde L., Bassik Michael C., et al. (2014). Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647–661. 10.1016/j.cell.2014.09.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, and Liu XS (2014). MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome biology 15, 554–554. 10.1186/s13059-014-0554-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Nobles CL, Reddy S, Salas-McKee J, Liu X, June CH, Melenhorst JJ, Davis MM, Zhao Y, and Bushman FD (2019). iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Genome Biol 20, 14. 10.1186/s13059-019-1625-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Malinin NL, Lee G, Lazzarotto CR, Li Y, Zheng Z, Nguyen NT, Liebers M, Topkar VV, Iafrate AJ, Le LP, et al. (2021). Defining genome-wide CRISPR-Cas genome-editing nuclease activity with GUIDE-seq. Nat Protoc 16, 5592–5615. 10.1038/s41596-021-00626-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Morgens DW, Wainberg M, Boyle EA, Ursu O, Araya CL, Tsui CK, Haney MS, Hess GT, Han K, Jeng EE, et al. (2017). Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens. Nature Communications 8, 15178–15178. 10.1038/ncomms15178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, Dharia NV, Montgomery PG, Cowley GS, Pantel S, et al. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 49, 1779–1784. 10.1038/ng.3984. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Komor AC, Kim YB, Packer MS, Zuris JA, and Liu DR (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424. 10.1038/nature17946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Nishida K, Arazoe T, Yachie N, Banno S, Kakimoto M, Tabata M, Mochizuki M, Miyabe A, Araki M, Hara KY, et al. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 102, 553–563. 10.1126/science.aaf8729. [DOI] [PubMed] [Google Scholar]
  • 57.Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, and Liu DR (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471. 10.1038/nature24644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Gapinske M, Luu A, Winter J, Woods WS, Kostan KA, Shiva N, Song JS, and Perez-Pinera P (2018). CRISPR-SKIP: programmable gene splicing with single base editors. Genome Biol 19, 107. 10.1186/s13059-018-1482-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Yuan J, Ma Y, Huang T, Chen Y, Peng Y, Li B, Li J, Zhang Y, Song B, Sun X, et al. (2018). Genetic Modulation of RNA Splicing with a CRISPR-Guided Cytidine Deaminase. Molecular Cell. 10.1016/j.molcel.2018.09.002. [DOI] [PubMed] [Google Scholar]
  • 60.Kluesner MG, Lahr WS, Lonetree CL, Smeester BA, Qiu X, Slipek NJ, Claudio Vázquez PN, Pitzen SP, Pomeroy EJ, Vignes MJ, et al. (2021). CRISPR-Cas9 cytidine and adenosine base editing of splice-sites mediates highly-efficient disruption of proteins in primary and immortalized cells. Nat Commun 12, 2437. 10.1038/s41467-021-22009-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Martin-Rufino JD, Castano N, Pang M, Grody EI, Joubran S, Caulier A, Wahlster L, Li T, Qiu X, Riera-Escandell AM, et al. (2023). Massively parallel base editing to map variant effects in human hematopoiesis. Cell. 10.1016/j.cell.2023.03.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Koblan LW, Doman JL, Wilson C, Levy JM, Tay T, Newby GA, Maianti JP, Raguram A, and Liu DR (2018). Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat Biotechnol 36, 843–846. 10.1038/nbt.4172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Zhang X, Chen L, Zhu B, Wang L, Chen C, Hong M, Huang Y, Li H, Han H, Cai B, et al. (2020). Increasing the efficiency and targeting range of cytidine base editors through fusion of a single-stranded DNA-binding protein domain. Nat Cell Biol 22, 740–750. 10.1038/s41556-020-0518-8. [DOI] [PubMed] [Google Scholar]
  • 64.Wang X, Ding C, Yu W, Wang Y, He S, Yang B, Xiong YC, Wei J, Li J, Liang J, et al. (2020). Cas12a Base Editors Induce Efficient and Specific Editing with Low DNA Damage Response. Cell Rep 31, 107723. 10.1016/j.celrep.2020.107723. [DOI] [PubMed] [Google Scholar]
  • 65.Thuronyi BW, Koblan LW, Levy JM, Yeh WH, Zheng C, Newby GA, Wilson C, Bhaumik M, Shubina-Oleinik O, Holt JR, and Liu DR (2019). Continuous evolution of base editors with expanded target compatibility and improved activity. Nat Biotechnol 37, 1070–1079. 10.1038/s41587-019-0193-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Richter MF, Zhao KT, Eton E, Lapinaite A, Newby GA, Thuronyi BW, Wilson C, Koblan LW, Zeng J, Bauer DE, et al. (2020). Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nature Biotechnology, 1–9. 10.1038/s41587-020-0453-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Gaudelli NM, Lam DK, Rees HA, Solá-Esteves NM, Barrera LA, Born DA, Edwards A, Gehrke JM, Lee SJ, Liquori AJ, et al. (2020). Directed evolution of adenine base editors with increased activity and therapeutic application. Nat Biotechnol 38, 892–900. 10.1038/s41587-020-0491-6. [DOI] [PubMed] [Google Scholar]
  • 68.Schapira M, Tyers M, Torrent M, and Arrowsmith CH (2017). WD40 repeat domain proteins: a novel target class? Nat Rev Drug Discov 16, 773–786. 10.1038/nrd.2017.179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Rancati G, Moffat J, Typas A, and Pavelka N (2018). Emerging and evolving concepts in gene essentiality. Nat Rev Genet 19, 34–49. 10.1038/nrg.2017.74. [DOI] [PubMed] [Google Scholar]
  • 70.Bartha I, Di Iulio J, Venter JC, and Telenti A (2018). Human gene essentiality. Nature Reviews Genetics 19, 51–62. 10.1038/nrg.2017.75. [DOI] [PubMed] [Google Scholar]
  • 71.Ellis JD, Barrios-Rodiles M, Colak R, Irimia M, Kim T, Calarco J.a., Wang X, Pan Q, O’Hanlon D, Kim PM, et al. (2012). Tissue-specific alternative splicing remodels protein-protein interaction networks. Molecular cell 46, 884–892. 10.1016/j.molcel.2012.05.037. [DOI] [PubMed] [Google Scholar]
  • 72.Buljan M, Chalancon G, Eustermann S, Wagner GP, Fuxreiter M, Bateman A, and Babu MM (2012). Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell 46, 871–883. 10.1016/j.molcel.2012.05.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Yang X, Coulombe-Huntington J, Kang S, Sheynkman GM, Hao T, Richardson A, Sun S, Yang F, Shen YA, Murray RR, et al. (2016). Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing. Cell 164, 805–817. 10.1016/j.cell.2016.01.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Bhuiyan T, and Timmers HTM (2019). Promoter Recognition: Putting TFIID on the Spot. Trends Cell Biol 29, 752–763. 10.1016/j.tcb.2019.06.004. [DOI] [PubMed] [Google Scholar]
  • 75.Patel AB, Greber BJ, and Nogales E (2020). Recent insights into the structure of TFIID, its assembly, and its binding to core promoter. Curr Opin Struct Biol 61, 17–24. 10.1016/j.sbi.2019.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Schier AC, and Taatjes DJ (2020). Structure and mechanism of the RNA polymerase II transcription machinery. Genes Dev 34, 465–488. 10.1101/gad.335679.119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Patel AB, Louder RK, Greber BJ, Grünberg S, Luo J, Fang J, Liu Y, Ranish J, Hahn S, and Nogales E (2018). Structure of human TFIID and mechanism of TBP loading onto promoter DNA. Science 362. 10.1126/science.aau8872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Antonova SV, Haffke M, Corradini E, Mikuciunas M, Low TY, Signor L, van Es RM, Gupta K, Scheer E, Vos HR, et al. (2018). Chaperonin CCT checkpoint function in basal transcription factor TFIID assembly. Nat Struct Mol Biol 25, 1119–1127. 10.1038/s41594-018-0156-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Anczuków O, Akerman M, Cléry A, Wu J, Shen C, Shirole NH, Raimer A, Sun S, Jensen MA, Hua Y, et al. (2015). SRSF1-Regulated Alternative Splicing in Breast Cancer. Mol Cell 60, 105–117. 10.1016/j.molcel.2015.09.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Li S, Li X, Xue W, Zhang L, Yang LZ, Cao SM, Lei YN, Liu CX, Guo SK, Shan L, et al. (2021). Screening for functional circular RNAs using the CRISPR-Cas13 system. Nat Methods 18, 51–59. 10.1038/s41592-020-01011-4. [DOI] [PubMed] [Google Scholar]
  • 81.Gabel AM, Belleville AE, Thomas JD, McKellar SA, Nicholas TR, Banjo T, Crosse EI, and Bradley RK (2024). Multiplexed screening reveals how cancer-specific alternative polyadenylation shapes tumor growth in vivo. Nat Commun 15, 959. 10.1038/s41467-024-44931-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, and Vakoc CR (2015). Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nature biotechnology advance on. 10.1038/nbt.3235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Bertomeu T, Coulombe-Huntington J, Chatr-aryamontri A, Bourdages KG, Coyaud E, Raught B, Xia Y, and Tyers M (2017). A High-Resolution Genome-Wide CRISPR/Cas9 Viability Screen Reveals Structural Features and Contextual Diversity of the Human Cell-Essential Proteome. Molecular and Cellular Biology 38, MCB.00302–00317. 10.1128/mcb.00302-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Malecki J, Aileni VK, Ho AYY, Schwarz J, Moen A, Sørensen V, Nilges BS, Jakobsson ME, Leidel SA, and Falnes P (2017). The novel lysine specific methyltransferase METTL21B affects mRNA translation through inducible and dynamic methylation of Lys-165 in human eukaryotic elongation factor 1 alpha (eEF1A). Nucleic Acids Res 45, 4370–4389. 10.1093/nar/gkx002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Raj A, Wang SH, Shim H, Harpak A, Li YI, Engelmann B, Stephens M, Gilad Y, and Pritchard JK (2016). Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife 5. 10.7554/eLife.13328. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Mills EW, Wangen J, Green R, and Ingolia NT (2016). Dynamic Regulation of a Ribosome Rescue Pathway in Erythroid Cells and Platelets. Cell Rep 17, 1–10. 10.1016/j.celrep.2016.08.088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Werner A, Iwasaki S, McGourty CA, Medina-Ruiz S, Teerikorpi N, Fedrigo I, Ingolia NT, and Rape M (2015). Cell-fate determination by ubiquitin-dependent regulation of translation. Nature 525, 523–527. 10.1038/nature14978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Ji Z, Song R, Huang H, Regev A, and Struhl K (2016). Transcriptome-scale RNase-footprinting of RNA-protein complexes. Nat Biotechnol 34, 410–413. 10.1038/nbt.3441. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Fijalkowska D, Verbruggen S, Ndah E, Jonckheere V, Menschaert G, and Van Damme P (2017). eIF1 modulates the recognition of suboptimal translation initiation sites and steers gene expression via uORFs. Nucleic Acids Res 45, 7997–8013. 10.1093/nar/gkx469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Oh S, Flynn RA, Floor SN, Purzner J, Martin L, Do BT, Schubert S, Vaka D, Morrissy S, Li Y, et al. (2016). Medulloblastoma-associated DDX3 variant selectively alters the translational response to stress. Oncotarget 7, 28169–28182. 10.18632/oncotarget.8612. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Jang C, Lahens NF, Hogenesch JB, and Sehgal A (2015). Ribosome profiling reveals an important role for translational control in circadian gene expression. Genome Res 25, 1836–1847. 10.1101/gr.191296.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Tanenbaum ME, Stern-Ginossar N, Weissman JS, and Vale RD (2015). Regulation of mRNA translation during mitosis. Elife 4. 10.7554/eLife.07957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Keren H, Lev-Maor G, and Ast G (2010). Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet 11, 345–355. 10.1038/nrg2776. [DOI] [PubMed] [Google Scholar]
  • 94.Climente-González H, Porta-Pardo E, Godzik A, and Eyras E (2017). The Functional Impact of Alternative Splicing in Cancer. Cell Rep 20, 2215–2226. 10.1016/j.celrep.2017.08.012. [DOI] [PubMed] [Google Scholar]
  • 95.Singh A, Rajeevan A, Gopalan V, Agrawal P, Day CP, and Hannenhalli S (2022). Broad misappropriation of developmental splicing profile by cancer in multiple organs. Nat Commun 13, 7664. 10.1038/s41467-022-35322-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Gabut M, Samavarchi-Tehrani P, Wang X, Slobodeniuc V, O’Hanlon D, Sung H-K, Alvarez M, Talukder S, Pan Q, Mazzoni Esteban O., et al. (2011). An Alternative Splicing Switch Regulates Embryonic Stem Cell Pluripotency and Reprogramming. Cell 147, 132–146. 10.1016/j.cell.2011.08.023. [DOI] [PubMed] [Google Scholar]
  • 97.Agosto LM, Mallory MJ, Ferretti MB, Blake D, Krick KS, Gazzara MR, Garcia BA, and Lynch KW (2023). Alternative splicing of HDAC7 regulates its interaction with 14-3-3 proteins to alter histone marks and target gene expression. Cell Rep 42, 112273. 10.1016/j.celrep.2023.112273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Fiszbein A, Giono LE, Quaglino A, Berardino BG, Sigaut L, von Bilderling C, Schor IE, Steinberg JH, Rossi M, Pietrasanta LI, et al. (2016). Alternative Splicing of G9a Regulates Neuronal Differentiation. Cell Rep 14, 2797–2808. 10.1016/j.celrep.2016.02.063. [DOI] [PubMed] [Google Scholar]
  • 99.Arecco N, Mocavini I, Blanco E, Ballaré C, Libman E, Bonnal S, Irimia M, and Di Croce L (2024). Alternative splicing decouples local from global PRC2 activity. Mol Cell 84, 1049–1061.e1048. 10.1016/j.molcel.2024.02.011. [DOI] [PubMed] [Google Scholar]
  • 100.Linares AJ, Lin CH, Damianov A, Adams KL, Novitch BG, and Black DL (2015). The splicing regulator PTBP1 controls the activity of the transcription factor Pbx1 during neuronal differentiation. Elife 4, e09268. 10.7554/eLife.09268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Havens MA, and Hastings ML (2016). Splice-switching antisense oligonucleotides as therapeutic drugs. Nucleic Acids Res 44, 6549–6563. 10.1093/nar/gkw533. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102. Bennett CF, Krainer AR, and Cleveland DW (2019). Antisense Oligonucleotide Therapies for Neurodegenerative Diseases. Annu Rev Neurosci 42, 385–406. 10.1146/annurev-neuro-070918-050501.
  • 103. Konermann S, Lotfy P, Brideau NJ, Oki J, Shokhirev MN, and Hsu PD (2018). Transcriptome Engineering with RNA-Targeting Type VI-D CRISPR Effectors. Cell 173, 665–676.e14. 10.1016/j.cell.2018.02.033.
  • 104. Du M, Jillette N, Zhu JJ, Li S, and Cheng AW (2020). CRISPR artificial splicing factors. Nat Commun 11, 2973. 10.1038/s41467-020-16806-4.
  • 105. Zimmermann M, Murina O, Reijns MAM, Agathanggelou A, Challis R, Tarnauskaite Ž, Muir M, Fluteau A, Aregger M, McEwan A, et al. (2018). CRISPR screens identify genomic ribonucleotides as a source of PARP-trapping lesions. Nature 559, 285–289. 10.1038/s41586-018-0291-z.
  • 106. Olivieri M, Cho T, Álvarez-Quilón A, Li K, Schellenberg MJ, Zimmermann M, Hustedt N, Rossi SE, Adam S, Melo H, et al. (2020). A Genetic Map of the Response to DNA Damage in Human Cells. Cell 182, 1–16. 10.1016/j.cell.2020.05.040.
  • 107. Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 3. 10.14806/ej.17.1.200.
  • 108. Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359. 10.1038/nmeth.1923.
  • 109. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. 10.1093/bioinformatics/btq033.
  • 110. Huang TP, Newby GA, and Liu DR (2021). Precision genome editing using cytosine and adenine base editors in mammalian cells. Nat Protoc 16, 1089–1128. 10.1038/s41596-020-00450-9.
  • 111. Ramlee MK, Yan T, Cheung AM, Chuah CT, and Li S (2015). High-throughput genotyping of CRISPR/Cas9-mediated mutants using fluorescent PCR-capillary gel electrophoresis. Sci Rep 5, 15587. 10.1038/srep15587.
  • 112. Lundin S, Stranneheim H, Pettersson E, Klevebring D, and Lundeberg J (2010). Increased throughput by parallelization of library preparation for massive sequencing. PLoS One 5, e10029. 10.1371/journal.pone.0010029.
  • 113. Clement K, Rees H, Canver MC, Gehrke JM, Farouni R, Hsu JY, Cole MA, Liu DR, Joung JK, Bauer DE, and Pinello L (2019). CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 37, 224–226. 10.1038/s41587-019-0032-3.
  • 114. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, and Ideker T (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498–2504. 10.1101/gr.1239303.
  • 115. Doncheva NT, Morris JH, Gorodkin J, and Jensen LJ (2019). Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data. J Proteome Res 18, 623–632. 10.1021/acs.jproteome.8b00702.
  • 116. Anders S, Pyl PT, and Huber W (2015). HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. 10.1093/bioinformatics/btu638.
  • 117. Sterne-Weiler T, Weatheritt RJ, Best AJ, Ha KCH, and Blencowe BJ (2018). Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop. Mol Cell 72, 187–200.e6. 10.1016/j.molcel.2018.08.018.
  • 118. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, and Peterson H (2023). g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res 51, W207–W212. 10.1093/nar/gkad347.
  • 119. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, and Durbin R (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. 10.1093/bioinformatics/btp352.
  • 120. Ramírez F, Dündar F, Diehl S, Grüning BA, and Manke T (2014). deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187–191. 10.1093/nar/gku365.
  • 121. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. 10.1038/s41586-021-03819-2.
  • 122. Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, et al. (2022). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50, D439–D444. 10.1093/nar/gkab1061.
  • 123. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, and Ferrin TE (2021). UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci 30, 70–82. 10.1002/pro.3943.
  • 124. Katz Y, Wang ET, Airoldi EM, and Burge CB (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7, 1009–1015. 10.1038/nmeth.1528.
  • 125. Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 34, 184–191. 10.1038/nbt.3437.
  • 126. Perez AR, Pritykin Y, Vidigal JA, Chhangawala S, Zamparo L, Leslie CS, and Ventura A (2017). GuideScan software for improved single and paired CRISPR guide RNA design. Nat Biotechnol 35, 347–349. 10.1038/nbt.3804.
  • 127. Perez AR, Sala L, Perez RK, and Vidigal JA (2021). CSC software corrects off-target mediated gRNA depletion in CRISPR-Cas9 essentiality screens. Nat Commun 12, 6461. 10.1038/s41467-021-26722-w.
  • 128. Langmead B, Trapnell C, Pop M, and Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25. 10.1186/gb-2009-10-3-r25.
  • 129. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, and Muller M (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77. 10.1186/1471-2105-12-77.
  • 130. Grau J, Grosse I, and Keilwagen J (2015). PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31, 2595–2597. 10.1093/bioinformatics/btv153.
  • 131. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. (2011). Scikit-learn: Machine Learning in Python. J Mach Learn Res 12, 2825–2830.
  • 132. Thölke P, Mantilla-Ramos YJ, Abdelhedi H, Maschke C, Dehgan A, Harel Y, Kemtur A, Mekki Berrada L, Sahraoui M, Young T, et al. (2023). Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data. Neuroimage 277, 120253. 10.1016/j.neuroimage.2023.120253.
  • 133. Dosztányi Z (2018). Prediction of protein disorder based on IUPred. Protein Sci 27, 331–340. 10.1002/pro.3334.
  • 134. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, and Skrzypek E (2015). PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 43, D512–520. 10.1093/nar/gku1267.
  • 135. Lu CT, Huang KY, Su MG, Lee TY, Bretaña NA, Chang WC, Chen YJ, Chen YJ, and Huang HD (2013). DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 41, D295–305. 10.1093/nar/gks1229.
  • 136. The UniProt Consortium (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51, D523–D531. 10.1093/nar/gkac1052.
  • 137. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, et al. (2019). The Pfam protein families database in 2019. Nucleic Acids Res 47, D427–D432. 10.1093/nar/gky995.
  • 138. Mosca R, Céol A, Stein A, Olivella R, and Aloy P (2014). 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res 42, D374–379. 10.1093/nar/gkt887.
  • 139. Kumar M, Michael S, Alvarado-Valverde J, Mészáros B, Sámano-Sánchez H, Zeke A, Dobson L, Lazar T, Örd M, Nagpal A, et al. (2022). The Eukaryotic Linear Motif resource: 2022 release. Nucleic Acids Res 50, D497–D508. 10.1093/nar/gkab975.
  • 140. Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, et al. (2011). Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford) 2011, bar030. 10.1093/database/bar030.
  • 141. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14, 708–715. 10.1101/gr.1933104.
  • 142. Gohr A, Mantica F, Hermoso-Pulido A, Tapial J, Márquez Y, and Irimia M (2022). Computational Analysis of Alternative Splicing Using VAST-TOOLS and the VastDB Framework. Methods Mol Biol 2537, 97–128. 10.1007/978-1-0716-2521-7_7.
  • 143. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, et al. (2018). ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46, D1062–D1067. 10.1093/nar/gkx1153.
  • 144. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, et al. (2019). COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 47, D941–D947. 10.1093/nar/gky1015.
  • 145. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, Rudolph JE, Yaeger R, Soumerai T, Nissan MH, et al. (2017). OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol 2017. 10.1200/po.17.00011.
  • 146. Gauthier NP, Reznik E, Gao J, Sumer SO, Schultz N, Sander C, and Miller ML (2016). MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer. Nucleic Acids Res 44, D986–991. 10.1093/nar/gkv1132.
  • 147. Deciphering Developmental Disorders Study (2017). Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438. 10.1038/nature21062.
  • 148. Yeo G, and Burge CB (2004). Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377–394. 10.1089/1066527041410418.
  • 149. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. 10.1093/bioinformatics/bts635.
  • 150. Ji Z (2018). RibORF: Identifying Genome-Wide Translated Open Reading Frames Using Ribosome Profiling. Curr Protoc Mol Biol 124, e67. 10.1002/cpmb.67.

Associated Data


Supplementary Materials


Table S1. CHyMErA optimization screening data, related to Figure 1. (Sheet 1) hgRNA sequence and annotation information for the CHyMErA optimization library, including targeted genes, guide sequences, cut sites, enumerations of on- and off-target scores, and predicted editing type outcomes.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA optimization screens with different Cas12a variants.

(Sheet 3) Statistical analysis of different guide categories across the Cas12a variants in HAP1 and RPE1 CHyMErA optimization screens. Wilcoxon rank-sum tests were applied.


Table S2. Exon deletion screening data, related to Figure 2. (Sheet 1) hgRNA sequence and annotation information for the CHyMErA exon deletion library.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA exon deletion screens.

(Sheet 3) Exon-level analysis of exon deletion screen in HAP1 and RPE1 cells using the MAGeCK algorithm.


Table S3. GUIDE-seq data, related to Figure 2. List of off- and on-target incorporation sites of the double-stranded oligonucleotide (dsODN) into the genome for the various tested hgRNAs (Sheet 1 = intergenic 2; Sheet 2 = intergenic 3; Sheet 3 = intergenic 4; Sheet 4 = TAF5 exon-8; Sheet 5 = VPS29 exon-2; Sheet 6 = BIN1 exon-13). Each sheet indicates the Cas nuclease (column A), the dsODN integration site (column B), the total number of reads corresponding to the integration site (column C), the coordinates (column D), nucleotide sequence (column E), and PAM sequence (column F) of potential off-target sites, the gRNA sequence (column G), and the number of mismatches between the gRNA sequence and the off- or on-target site.


Table S4. Oligos & Addgene plasmids used in this study, related to STAR Methods. (Sheet 1) Sequences of oligos used in this study, including gRNA library oligo structure, primers, gRNAs, and probes for northern blotting.

(Sheet 2) Newly cloned plasmids deposited to Addgene.

(Sheet 3) Newly cloned CHyMErA hgRNA libraries deposited to Addgene.


Table S5. Base editor screening data, related to Figure 3. (Sheet 1) sgRNA sequence and annotation information for the splice site mutation Cas9 base editor library.

(Sheet 2) Normalized read counts and log2-fold change (LFC) values of hgRNAs in HAP1 and RPE1 CHyMErA optimization screens with different Cas12a variants.

(Sheet 3) Exon-level analysis of HAP1 base editor screen. Exons that affect cell fitness by the adenine and/or cytosine base editor screens are indicated.


Table S6. Exon feature analysis, related to Figure 4. List of fitness-promoting and -suppressing exons in HAP1 and RPE1 cells, and features annotated to exons targeted in the exon deletion library.


Table S7. TAF5 AP-MS data, related to Figure 5. List of identified peptides and respective counts in TAF5 affinity purification mass spectrometry (AP-MS) analysis.

(Sheet 1) Unfiltered AP-MS data.

(Sheet 2) Filtered MS data, listing entries with a median of ≥ 5 peptide-spectrum matches (PSMs) in TAF5 full-length (FL) or exon-8 deleted (ΔE8) samples.


Table S8. TAF5 miniTurboID data, related to Figure 5. List of identified peptides and respective counts in TAF5 miniTurboID proximity labelling mass spectrometry analysis.

(Sheet 1) Unfiltered miniTurboID MS data.

(Sheet 2) Filtered MS data, listing entries with a median of ≥ 5 PSMs in TAF5 full-length (FL) or exon-8 deleted (ΔE8) samples.


Table S9. TAF5 RNA sequencing data, related to Figure 6. Transcriptomics analysis in HEK293 Flp-In cells depleted of endogenous TAF5 and rescued with full-length (FL) or exon-8 deleted (ΔE8) TAF5 isoforms.

(Sheet 1) Unfiltered DESeq2-analyzed data.

(Sheet 2) DESeq2-analyzed data filtered for genes rescued with the full-length (FL) TAF5 isoform (siNT vs siTAF5: |FC| > 1.5 & FDR ≤ 0.05 AND siNT vs siTAF5+TAF5-FL: |FC| < 1.25 & FDR > 0.05).

(Sheet 3) DESeq2-analyzed data filtered for genes rescued with the exon-8 deleted (ΔE8) TAF5 isoform (siNT vs siTAF5: |FC| > 1.5 & FDR ≤ 0.05 AND siNT vs siTAF5+TAF5-ΔE8: |FC| < 1.25 & FDR > 0.05).
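
The two-part rescue criterion used for Sheets 2 and 3 can be sketched as a small filter. This is only an illustration, not the authors' code (the paper reports no original code). It assumes that |FC| denotes the fold-change magnitude irrespective of direction, so that a log2 fold change of −1 counts as a 2-fold change, and that each comparison supplies a log2 fold change and an FDR value. The function names, gene names, and example rows are hypothetical.

```python
# Thresholds quoted in the Table S9 legend.
CHANGED_FC = 1.5    # siNT vs siTAF5: |FC| > 1.5
RESCUED_FC = 1.25   # siNT vs siTAF5 + rescue construct: |FC| < 1.25
FDR_CUTOFF = 0.05


def abs_fold(log2_fc):
    """Fold change as a magnitude >= 1, regardless of direction (assumed meaning of |FC|)."""
    return 2 ** abs(log2_fc)


def is_rescued(kd_log2fc, kd_fdr, rescue_log2fc, rescue_fdr):
    """True if a gene changes on TAF5 knockdown and is back at baseline with the rescue construct."""
    changed = abs_fold(kd_log2fc) > CHANGED_FC and kd_fdr <= FDR_CUTOFF
    restored = abs_fold(rescue_log2fc) < RESCUED_FC and rescue_fdr > FDR_CUTOFF
    return changed and restored


# Hypothetical rows: (gene, knockdown log2FC, knockdown FDR, rescue log2FC, rescue FDR)
rows = [
    ("GENE_A", -1.0, 0.001, 0.10, 0.80),    # changed on knockdown, restored by rescue
    ("GENE_B", -1.0, 0.001, -0.90, 0.002),  # changed on knockdown, still changed with rescue
    ("GENE_C", 0.2, 0.60, 0.05, 0.90),      # never changed
]
rescued = [gene for gene, *values in rows if is_rescued(*values)]
print(rescued)
```

Applying the same function once with the FL comparison and once with the ΔE8 comparison would give the two gene sets that Sheets 2 and 3 describe, under the assumptions stated above.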

Data Availability Statement

  • CRISPR screening, GUIDE-seq, RNA-seq, and ChIP-seq data generated by this study have been deposited at GEO and are publicly available. Accession numbers are listed in the key resources table. The TAF5 splice isoform mass spectrometry data associated with this study have been deposited to the ProteomeXchange consortium through partner MassIVE (massive.ucsd.edu). Microscopy data have been deposited to Mendeley. Accession numbers and DOI are listed in the key resources table.

  • This paper does not report original code.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Key Resources Table

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
TAF1 Cell Signaling Technology Cat#12781S; RRID:AB_2798025
TAF5 ThermoFisher Scientific Cat#MA3–076; RRID:AB_2633321
TAF6 ThermoFisher Scientific Cat#A301–276A-M; RRID:AB_2779789
TAF10 Sigma Cat#MABE1079; RRID:AB_10952566
TAF12 Proteintech Cat#12353–1-AP; RRID:AB_2271582
TBP Proteintech Cat#22006–1-AP; RRID:AB_10951514
CCT2 Proteintech Cat#24896–1-AP; RRID:AB_2879783
SpCas9 Diagenode Cat#C15200229; RRID:AB_2889848; Cat#C15310258; RRID:AB_2715516
FLAG M2 Sigma Cat#F3165; RRID:AB_259529
HA tag Sigma Cat#H3663; RRID:AB_262051
Myc tag Proteintech; Sigma-Aldrich Cat#60003–2-Ig; RRID:AB_2734122; Cat#M4439; RRID:AB_439694
GAPDH Proteintech Cat#10494–1-AP; RRID:AB_2263076
β-Tubulin Proteintech Cat#10094–1-AP; RRID:AB_2210695
CD46-BV421 BD Biosciences Cat#743776; RRID:AB_2741744
RNA polymerase II ThermoFisher Scientific Cat#A300653A; RRID:AB_519334
Rabbit IgG Proteintech Cat#30000–0-AP; RRID:AB_2819035
Anti-rabbit IgG, HRP-linked Antibody Cell Signaling Technology Cat#7074; RRID:AB_2099233
Anti-mouse IgG, HRP-linked Antibody Cell Signaling Technology Cat#7076; RRID:AB_330924
Goat anti-Rabbit IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor Plus 488 ThermoFisher Scientific Cat#A32731; RRID:AB_2633280
Goat anti-Mouse IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor Plus 647 ThermoFisher Scientific Cat#A32728; RRID:AB_2633277
Bacterial and virus strains
NEB Stable competent E. coli cells New England Biolabs Cat#C3040H
Endura electrocompetent cells LGC Biosearch Technologies Cat#60242–2
Chemicals, peptides, and recombinant proteins
TRIzol Sigma-Aldrich Cat#T3934
Paraformaldehyde ThermoFisher Scientific Cat#28908
Formaldehyde Pierce Cat#PI28908
Glycine Sigma-Aldrich Cat#G8898
Puromycin ThermoFisher Scientific Cat#A1113803
G418 Sulfate Gibco Cat#10131027
Blasticidin S Gibco Cat#A1113903
Accutase Sigma-Aldrich Cat#A6964
Hygromycin ThermoFisher Scientific Cat#10687010
Doxycycline Sigma Cat#D9891
D-sorbitol Sigma-Aldrich Cat#S1876
Tunicamycin Sigma-Aldrich Cat#SML1287
Hydrogen peroxide Sigma-Aldrich Cat#H1009
Mitoxantrone MedChemExpress Cat#HY-13502
Critical commercial assays
RNeasy Plus Mini Kit Qiagen Cat#74136
RNA loading dye ThermoFisher Scientific Cat#LC6876
10% TBE-Urea gel ThermoFisher Scientific Cat#EC68752BOX
Hybond-N+ membrane Amersham Cat#NV0796
ULTRAhyb-Oligo hybridization buffer ThermoFisher Scientific Cat#AM8663
North2South Chemiluminescent Hybridization and Detection reagents ThermoFisher Scientific Cat#17097
P3 Primary Cell 4D-Nucleofector X Kit S Lonza Cat#V4XP-3032
Wizard Genomic DNA Purification Kit Promega Cat#A1120
NEBNext Ultra II Q5 Master Mix New England Biolabs Cat#M0544X
QIAquick PCR Purification Kit Qiagen Cat#28104
GeneJET PCR purification column ThermoFisher Scientific Cat#K0701
GeneJET Gel Extraction Kit ThermoFisher Scientific Cat#K0692
KAPA HiFi HotStart DNA polymerase Roche Cat#KK2601
PrimeSTAR® Max DNA Polymerase Takara Bio Cat#R045B
SPRIselect beads Beckman Cat#B23318
D1000 ScreenTape Agilent Cat#5067–5582; Cat#5067–5583
High Sensitivity D1000 ScreenTape Agilent Cat#5067–5584
High Sensitivity D1000 Reagents Agilent Cat#5067–5585
Qubit dsDNA HS assay ThermoFisher Scientific Cat#Q32851
PhiX Illumina Cat#FC-110–3001
MiSeq Reagent Micro Kit v2 (300 cycles) Illumina Cat#MS-103–1002
NextSeq 2000 P2 Reagents (200 Cycles) v3 Illumina Cat#20046812
NovaSeq 6000 S1 platform (200 cycles kit) Illumina Cat#20028318
NovaSeq 6000 S1 platform (100 cycles kit) Illumina Cat#20028319
Illumina Stranded mRNA Prep kit Illumina Cat#15031047
In-Fusion Snap Assembly Master Mix Takara Bio Cat#638948
NEBuilder HiFi DNA Assembly Master Mix New England Biolabs Cat#E2621L
X-tremeGENE 9 DNA Transfection Reagent Sigma-Aldrich Cat#6365809001
Zombie NIR viability dye BioLegend Cat#423106
one-step SensiFAST real-time PCR kit Bioline Cat#BIO-72001
Maxima H Minus First Strand cDNA Synthesis Kit ThermoFisher Scientific Cat#K1652
SensiFAST SYBR No-ROX Kit Bioline Cat#BIO-98050
Accel-NGS 2S Plus DNA Library Kit Swift Biosciences Cat#21096
Maxi-prep plasmid purification kit Invitrogen Cat#K210016
BP clonase II ThermoFisher Scientific Cat#11789020
LR clonase II ThermoFisher Scientific Cat#11791020
BveI ThermoFisher Scientific Cat#FD1744
Esp3I ThermoFisher Scientific Cat#FD0454
XhoI New England Biolabs Cat#R0146L
CsiI ThermoFisher Scientific Cat#FD2114
MluI ThermoFisher Scientific Cat#FD0564
FastAP ThermoFisher Scientific Cat#EF0651
T4 DNA ligase New England Biolabs Cat#M0202
Lipofectamine RNAiMax ThermoFisher Scientific Cat#13778150
Streptavidin Sepharose Beads ThermoFisher Scientific Cat#20353
Dynabeads protein G ThermoFisher Scientific Cat#10004D
Proteinase-K ThermoFisher Scientific Cat#EO0492
RNaseA Invitrogen Cat#12091021
Bicinchoninic acid (BCA) assay Pierce Cat#23225
Bradford reagent BioRad Cat#5000006
NuPAGE LDS Sample Buffer (4x) ThermoFisher Scientific Cat#NP0007
cOmplete protease inhibitors Roche Cat#11836145001
4–12% Bis‐Tris gels Life Technologies Cat#NP0323BOX
Immobilon-P PVDF membrane Sigma-Aldrich Cat#IPVH00010
SuperSignal West Pico PLUS chemiluminescence reagent ThermoFisher Scientific Cat#34580
DMEM with high glucose and pyruvate Gibco Cat#11995073
heat-inactivated fetal bovine serum (HI-FBS) Gibco Cat#16140071
penicillin-streptomycin Gibco Cat#15140122
trypsin-EDTA Gibco Cat#25200056
Opti-MEM Gibco Cat#31985062
Deposited data
CRISPR screen sequencing This study GEO: GSE244337
GUIDE-seq data This study GEO: GSE262849
HAP1 and RPE1 RNA-Seq data This study GEO: GSE244340
TAF5 RNA-Seq This study GEO: GSE244357
ChIP-seq data This study GEO: GSE244373
All sequencing data produced in this study This study GEO: GSE244374
TAF5 AP-MS data This study MassIVE: MSV000092798
TAF5 miniTurboID data This study MassIVE: MSV000092798
Microscopy data This study DOI: 10.17632/3sdsc83vsn.2
HAP1 cells ribosome profiling data Malecki et al.84 GEO: GSE93133
Lymphoblastoid cell lines (GM19204 and GM19238) ribosome profiling data Raj et al.85 GEO: GSE75290
Primary human reticulocytes ribosome profiling data Mills et al.86 GEO: GSE85864
hES cells (H1) ribosome profiling data Werner et al.87 GEO: GSE62247 (SRR1610244 to SRR1610259)
BJ fibroblast cell lines (EH, EL and ELR) ribosome profiling data Ji et al.88 GEO: GSE65885 (SRR1802146 to SRR1802148; SRR1802152 to SRR1802154)
HCT116 cells ribosome profiling data Fijałkowska et al.89 GEO: GSE87328
HEK293 cells ribosome profiling data Oh et al.90 GEO: GSE70804
U2OS human osteosarcoma cell line ribosome profiling data Jang et al.91 GEO: GSE56924
RPE1 cells ribosome profiling data Tanenbaum et al.92 GEO: GSE67902
Proteomics data Sinitcyn et al.8 PMID: 36959352
Experimental models: Cell lines
HAP1 Horizon Discovery Cat#C631
HAP1 Cas9 Horizon Discovery Cat#Cas9-011
HAP1 Cas9/Cas12a This study N/A
RPE1 hTERT ATCC Cat#CRL-4000
RPE1 hTERT TP53 −/− Cas9 Hart et al.29 N/A
RPE1 hTERT TP53 −/− Cas9/Cas12a This study N/A
HEK293T ATCC Cat#ACS-4500
HEK293 Flp-In T-REx cells Invitrogen Cat#R78007
Oligonucleotides
See Table S4 This study N/A
Recombinant DNA
See Table S4 & Methods This study N/A
Software and algorithms
AlphaFold2 2.3.1 Jumper et al.121, Varadi et al.122 https://alphafold.ebi.ac.uk/download
BEDTools 2.30.0 Quinlan and Hall109 https://bedtools.readthedocs.io/en/latest/
Biomart Kinsella et al.140 http://useast.ensembl.org/biomart/martview/62cd2a1c43671bf181f1a4d18e66b210
BLAT UCSC Genome Browser https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
bowtie 1.3.1 Langmead et al.128 https://bowtie-bio.sourceforge.net/manual.shtml
bowtie2 2.5.1 Langmead and Salzberg108 https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
CRISPResso2 Clement et al.113 https://github.com/pinellolab/CRISPResso2
cutadapt 1.18 Martin107 https://cutadapt.readthedocs.io/en/stable/
deepTools 3.5.2 Ramírez et al.120 https://deeptools.readthedocs.io/en/develop/
g:Profiler Kolberg et al.118 https://biit.cs.ut.ee/gprofiler/gost
GuideScan Perez et al.126 https://guidescan.com
MAGeCK mle module Li et al.50 https://sourceforge.net/p/mageck/wiki/Home/
MaxEntScan Yeo and Burge148 http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
MISO 0.5.4 Katz et al.124 https://miso.readthedocs.io/en/fastmiso/
Python 2.7.15 Python Software Foundation https://www.python.org
Python 3.8.5 Python Software Foundation https://www.python.org
R version 4.3.1 R Foundation https://www.r-project.org
R package PRROC Grau et al.130 https://cran.r-project.org/web/packages/PRROC/index.html
RibORF Ji150 https://github.com/zhejilab/RibORF
Rule Set 2 Doench et al.125 https://portals.broadinstitute.org/gpp/public/software/sgrna-scoring-help#rs2
Samtools 1.16.1 Li et al.119 http://www.htslib.org/download/
scikit-learn 1.3.0 Pedregosa et al.131 https://scikit-learn.org/stable/install.html
STAR 2.7.11a Dobin et al.149 https://github.com/alexdobin/STAR/releases
UCSC liftOver UCSC Genome Browser https://genome.ucsc.edu
UCSC Multiz Refseq protein alignment Blanchette et al.141 https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/
UCSC table browser UCSC Genome Browser https://genome.ucsc.edu
UCSF ChimeraX 1.6.1 Pettersen et al.123 https://www.cgl.ucsf.edu/chimerax/
VastDB Tapial et al.6 https://vastdb.crg.eu/wiki/Main_Page
Whippet Sterne-Weiler et al.117 https://github.com/timbitz/Whippet.jl
FlowJo software version 10.8.1 BD Biosciences https://www.flowjo.com/solutions/flowjo
Cytoscape v3.9.1 Shannon et al.114 https://cytoscape.org/
STRING plugin v2.0.1 Doncheva et al.115 https://apps.cytoscape.org/apps/stringapp
Partek Flow software (version 10.0.23.0531) N/A https://www.partek.com/partek-flow/
GraphPad Prism Version 9.0 GraphPad Software Inc. https://www.graphpad.com/
Adobe Illustrator v28.4.1 Adobe https://www.adobe.com/products/illustrator.html
Affinity Designer 2 v2.3.0 Affinity Designer https://affinity.serif.com/en-us/designer/
BioRender BioRender https://www.biorender.com/
Other
Semi-dry transfer Bio-Rad Cat#1703940
UV Crosslinker VWR Cat#89131–484
Lonza 4D-Nucleofector Transfection System Lonza Cat#AAF-1003B, AAF-1003X
M220 Focused Ultrasonicator Covaris Cat#500295
4150 TapeStation System Agilent Cat#G2992AA
BTX Gemini Electroporator BTX Cat#452042
Bioruptor Plus sonicator Diagenode Cat#B01020002
Mini Gel Tank Life Technologies Cat#A25977
Mini Blot Module Life Technologies Cat#B1000
VeritiPro Thermal Cycler Applied Biosystems Cat#A48141
CFX96 Touch Real-Time PCR BioRad Cat#1855195
MiSeq Sequencing System Illumina N/A
NextSeq 2000 Sequencing System Illumina N/A
NovaSeq 6000 Sequencing System Illumina N/A
EVOS M5000 Imaging System Invitrogen Cat#AMF5000
iBright CL1500 Imaging System Invitrogen Cat#A44114
Vi-Cell BLU Cell Viability Analyzer Beckman Coulter Cat#C19196
Incucyte SX1 Sartorius Cat#4837
LSRFortessa cell analyzer BD Biosciences N/A
