Protocol to perform multiplexed assays of variant effect using curated loci prime editing

Carina G Biar; Nicholas Bodkin; Gemma L Carvill; Jeffrey D Calhoun

doi:10.1016/j.xpro.2025.103851

2025 May 25;6(2):103851. doi: 10.1016/j.xpro.2025.103851

PMCID: PMC12159904 PMID: 40418630

Summary

Multiplexed assays of variant effect (MAVEs) perform simultaneous characterization of many variants. Here, we present a protocol to perform MAVEs using curated loci prime editing (cliPE), an accessible experimental pipeline that enables prime editing of a target gene. We describe steps for designing prime editing reagents, screening for genome editing efficiency, selecting a pool of cells edited to harbor different genetic variants, and sequencing. Lastly, we detail procedures for performing enrichment analysis to identify variants with normal or aberrant activity.

Subject areas: Sequence analysis, Cell-based Assays, Genomics

Highlights

• Procedures to design prime editing libraries for any gene
• Instructions for cloning and validating prime editing libraries
• Guidance on designing the selection step for a new scalable assay
• Steps for targeted sequencing and enrichment analysis

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.

Before you begin

Clinical genetic testing has become one of the first-line diagnostic tests in modern medicine, spanning many specialties. As more and more variants causing genetic disorders are identified, our diagnostic yield continues to improve. However, there is a significant bottleneck which precludes precision genetic diagnostics: variants of uncertain significance (VUS). VUS occupy the gray area between pathogenic and benign variants and are often missense variants, or single amino acid substitutions, which may or may not impact protein function. The number of VUS reported per year is consistently increasing in the ClinVar database, the most comprehensive database of genetic testing results currently available to researchers, genetic counselors, clinicians, and the public. In fact, there are over 1 million VUS present in ClinVar as of 2024.¹ This steady increase in reported VUS has outpaced the scalability of classic, low-throughput functional assays of variant effect. High-throughput multiplexed assays of variant effect (MAVEs) are a relatively new class of technologies which leverage a library of genetic variants, a selection assay based on function of a particular gene, and next-generation sequencing (NGS) to collectively generate functional data on many variants simultaneously. MAVEs are an attractive option to address the VUS problem as they can scale to assess hundreds to thousands of variants, a much higher throughput than previous strategies.

Variant libraries in MAVEs can be made from expression plasmids or by genome engineering of endogenous loci. Recently, Erwood et al. developed saturation prime editing (SPE), which uses prime editing to generate a library of variants for the MAVE.² Prime editing is a modification of CRISPR-Cas9 genome editing that does not require a double-stranded DNA break.³ Instead, prime editing is performed by co-expressing at least two components in a cell: (1) a nicking Cas9 fused to a reverse transcriptase and (2) a prime editing guide RNA (pegRNA), which contains both a spacer for targeting and a reverse transcription template to generate a variant of interest. While typical prime editing experiments use a single pegRNA to generate a single variant, Erwood et al. substituted a pegRNA library to generate a cell pool where each individual cell expresses one variant from the library.² A number of modifications to basic prime editing have improved its efficiency and therefore its scalability. For instance, co-expression of a dominant negative peptide of the MLH1 gene has been shown to boost prime editing efficiency.⁴ In addition, it has been shown that a limiting factor for prime editing is the degradation of pegRNAs; this led to the development of engineered pegRNAs (epegRNAs), which contain a stabilizing structured RNA motif.⁵

The eventual goal for most MAVEs is to catalog the functional effect of every possible missense variant in a particular gene of interest. This provides saturation-level data on functional consequences not only for variants in ClinVar, but also for most possible new variants that will emerge as we continue to sequence more individuals. However, saturation-level MAVEs are expensive and require significant expertise. Herein we present curated loci prime editing (cliPE), a method that trades saturation-level coverage for a specific focus on resolving VUS present in ClinVar. CliPE is a modular protocol adapted from SPE that is relatively inexpensive and has a low barrier to entry.² The data generated by cliPE have utility for variant resolution on their own, as most of the variants tested are reported VUS. We posit that cliPE is further well-suited to proof-of-concept or feasibility studies which can be used in proposals seeking funding for a saturation-level MAVE. We provide a series of internal positive controls that can be used to gauge the success of the prime editing component (Tables S3 and S4). We also developed a user-friendly cliPE epegRNA Designer companion Shiny app (https://design.clipe-mave.org/) that can be used to import variant data from gnomAD and ClinVar and design the library of epegRNAs ready for order from the user’s oligonucleotide synthesis company of choice. It is important to note that the selection portion of cliPE, like any MAVE approach, will be context-dependent, with the gene of interest largely determining the correct selection paradigm. While this protocol was optimized with the HAP1 cell line, high transferability to other cell lines is expected, though optimization may be required. We have validated cliPE with a TSC2 MAVE and have reported that data elsewhere.⁶

Workflow overview

The workflow involves seven modules, each corresponding to a major step in this protocol. In the first module, a pre-programmed Shiny app is utilized to generate and download output design files including candidate epegRNA libraries, archetypal epegRNAs, and nicking gRNAs. The second module involves cloning archetypal epegRNAs into an expression plasmid, transfection into HEK 293T cells, and sequencing amplicons to estimate editing efficiency. For the third module, epegRNA libraries are cloned based on the top-performing archetypal epegRNAs from pre-screening in the second module. Matching nicking gRNAs are also cloned. In the fourth and fifth modules, haploid cells are co-transfected to induce genome editing. The resulting cell libraries are then selected based on the function of the gene of interest. In the final sixth and seventh modules, selected and unselected cells are sequenced using high-depth amplicon sequencing followed by enrichment analysis to identify variants enriched in selected cell pools relative to unselected control cell pools.

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Chemicals, peptides, and recombinant proteins

Luria broth (LB)	Fisher	#L2542
Ampicillin (amp)	Sigma	#A9518
LB + amp plates (agar)	Make with Addgene protocol	https://www.addgene.org/protocols/pouring-lb-agar-plates/
LB + amp broth	Dilute stock amp to 100 μg/mL in LB	N/A
TrypLE	Gibco	#25300062
iProof PCR mastermix	Bio-Rad	#1725310
Plasmid DNA extraction kit for minipreps	QIAGEN	#27106
Plasmid DNA extraction kit for midipreps	Zymo Research	#D4200
T4 DNA ligase	NEB	#M0202S
Golden Gate enzyme mix	NEB	#E1601S
BsaI-HFv2 enzyme	NEB	#R3733S
10X rCutSmart buffer	NEB	#B6004
BsmBI-v2 enzyme	NEB	#R0739S
NEBuffer r3.1	NEB	#B6003
Genomic DNA extraction kit	Invitrogen	#K1820-02
Gel DNA recovery kit	QIAGEN	#28104
Chemically competent One Shot TOP10	Thermo Fisher Scientific	#C404003
AMPure XP size selection beads	Beckman Coulter	#A63881
TurboFectin 8.0 transfection reagent	OriGene	#TF81005

Critical commercial assays

Qubit high-sensitivity dsDNA reagent	Thermo Fisher Scientific	#Q32854

Deposited data

cliPE GitHub repository	This paper	https://github.com/calhoujd/calhoujd.github.io

Experimental models: Cell lines

Human: HEK 293T cells	ATCC	CRL-3216
Human: HAP1 cells	Horizon	#C669

Oligonucleotides

See supplemental tables	This paper	N/A

Recombinant DNA

pCMV-PEmax-P2A-GFP	Chen et al.⁴	RRID:Addgene_180020
pEF1a-hMLH1dn	Chen et al.⁴	RRID:Addgene_174824
pU6-tevopreq1-GG-acceptor	Nelson et al.⁵	RRID:Addgene_174038
BPK1520	Kleinstiver et al.⁷	RRID:Addgene_65777
pU6-Sp-pegRNA-RNF2_+5GtoT	Anzalone et al.³	RRID:Addgene_135957
pU6-sp-sgRNA-RNF2_+41nick	Anzalone et al.³	RRID:Addgene_135958

Software and algorithms

Jellyfish	Marcais et al.⁸	https://github.com/gmarcais/Jellyfish
cliPE companion Shiny apps	This paper	http://home.clipe-mave.org

Other

Neon electroporator	Invitrogen	#NEON1S
Magnet for AMPure XP bead size selection of next-generation sequencing libraries	Sergi Lab Supplies	#1005a
Qubit fluorometer	Thermo Fisher Scientific	#Q33238
Standard laboratory molecular biology equipment (gel electrophoresis, thermocycler, water bath, centrifuge, etc)	N/A	N/A
Standard tissue culture equipment (incubator, biosafety cabinet, and light microscope)	N/A	N/A
Access to fluorescence-activated cell sorting (FACS) equipment at nearby core facility	N/A	N/A

Materials and equipment

HEK 293T cell growth media

Reagent	Final concentration	Amount
DMEM (Gibco #11995-073)	89%	44.5 mL
FBS (R&D Systems; 50-152-7067)	10%	5 mL
PenStrep (Gibco #15140122; 100X stock)	1%	0.5 mL
Total	N/A	50 mL

After preparation, growth media should be stored at 4°C for up to 4–6 weeks.

HAP1 cell growth media

Reagent	Final concentration	Amount
IMDM (Gibco #12440061)	89%	44.5 mL
FBS (R&D Systems; 50-152-7067)	10%	5 mL
PenStrep (Gibco #15140122; 100X stock)	1%	0.5 mL
Total	N/A	50 mL

After preparation, growth media should be stored at 4°C for up to 4–6 weeks.

Step-by-step method details

Design archetypal epegRNAs, epegRNA libraries, and nicking gRNAs

Timing: 1 h (10 min for step 1, 10 min for step 2, and 40 min for step 3)

All of the prime editing designs will be generated with a companion epegRNA Designer Shiny app.

1.
Download necessary ClinVar missense input file.
Note: This file will be used to incorporate important controls into your cliPE experiment, namely known benign or likely benign (BLB) and pathogenic or likely pathogenic (PLP) variants.
- a.
  Navigate to https://www.ncbi.nlm.nih.gov/clinvar/ and search for gene name.
- b.
  Toggle on the filter for ‘Missense’.
- c.
  Download as tsv.
2.
Download necessary gnomAD file.
Note: This file will be used to incorporate important controls into your cliPE experiment, especially synonymous variants present in the general population.
- a.
  Navigate to https://gnomad.broadinstitute.org/ and search for gene name.
- b.
  Use the checkboxes appearing underneath the gnomAD variants section to filter for ‘Missense/inframe indel’ and ‘Synonymous’ variants.
- c.
  Export variants to csv.
3.
Generate designs with the cliPE epegRNA Designer companion Shiny app.
- a.
  Navigate to the cliPE epegRNA Designer app (https://design.clipe-mave.org/) and input basic information (gene name, RefSeq transcript ID, etc.).
- b.
  Upload the ClinVar missense file and the gnomAD file.
  Note: In the “Include additional variants in editing windows:” section, it is generally recommended to check all four boxes. If any of those variant classes aren’t important for a particular use case, they can be excluded by leaving the box unchecked. Adjust Additional Options section as needed.
  
  CRITICAL: In the Additional Options section, adjust the gnomAD minimal allele count accordingly with your gene of interest to select for BLB (negative control) alleles. For genes with autosomal dominant genetic disorders, generally a relatively low allele count threshold of 3–5 is sufficient. However, in the case of autosomal recessive disorders, this threshold may need to be adjusted significantly. For example, the classic p.F508del variant in CFTR has an allele count of nearly 20,000 in gnomAD.
- c.
  Download output files containing prime editing designs.
  CRITICAL: It is recommended to run the cliPE epegRNA Designer app twice for each gene, once with the “Introduce missense VUS variants” Design Strategy option and once with the “Introduce missense PLP/BLB variants” Design Strategy option. The former prioritizes regions of the gene with the highest density of VUS to maximize the value of each epegRNA library. The latter introduces the maximal number of control ClinVar BLB and PLP missense variants. Please see the “Before you begin” section above and step 3d for an important discussion of the control variants necessary for a successful cliPE experiment.
- d.
  Confirm regions contain sufficient control variants before proceeding.
  CRITICAL: It is important that the regions targeted also include additional classes of variants, or a truth set, which will be key for validation of the MAVE during data analysis. For most MAVEs, two truth sets composed of positive and negative controls are used to assess assay validity; these will be referred to as the assay validation truth set and the clinical truth set (see Table 1). The assay validation truth set consists of (1) synonymous and missense variants found in the general population in databases such as gnomAD (negative controls) and (2) premature truncation codon (PTC) variants (positive controls). It is recommended to include at least 20 synonymous and 20 PTC assay validation truth set variants in a cliPE experiment. The clinical truth set similarly consists of negative and positive controls present in the ClinVar database. The negative controls in the clinical truth set are benign or likely benign (BLB) variants and the positive controls are pathogenic or likely pathogenic (PLP) missense variants. It is recommended to include at least 25–30 clinical truth set variants in a cliPE experiment. Further discussion of this topic is available within the “Designing initial set of epegRNA architectures to screen, epegRNA libraries, and nicking gRNAs” subsection of the cliPE homepage (https://home.clipe-mave.org/).
- e.
  Order oligonucleotides from preferred vendor for each archetypal epegRNA (top and bottom strand for each spacer, top and bottom strand of each extension), the epegRNA scaffold (this needs to be phosphorylated; sequence provided in Table S2; can be ordered once and used for many cloning reactions), and primers to screen for editing efficiency (see step 5a for advice on designing these primers).
  Note: It is recommended to screen a minimum of 12 archetypal epegRNAs, which will yield on average 3–6 epegRNA libraries that edit with high enough efficiency for the cliPE workflow. It may be desirable to screen more than 12 archetypal epegRNAs upfront to increase the probability of obtaining enough epegRNA designs to proceed with library cloning. Further discussion of this topic is available within the “Designing initial set of epegRNA architectures to screen, epegRNA libraries, and nicking gRNAs” subsection of the cliPE homepage (https://home.clipe-mave.org/). Optionally, it may be advantageous to assess whether any synonymous variants introduced by epegRNAs are predicted to impact splicing using tools such as spliceAI or by visually confirming the variant does not impact a canonical splice site using a browser such as the UCSC genome browser.⁹ We recommend avoiding any epegRNAs that meet this criterion, as these same synonymous variants will be used in downstream epegRNA libraries to prevent multiple edits.
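
The allele count threshold described in step 3b can be illustrated with a minimal sketch. This is not the Designer app's implementation; the column names (`Allele Count`, `VEP Annotation`, `HGVS Consequence`) and the three example rows are assumptions for illustration and should be checked against the header of your own gnomAD export.

```python
import csv
import io

# Assumed column names; verify against the header row of your gnomAD csv export.
AC_COLUMN = "Allele Count"
CONSEQUENCE_COLUMN = "VEP Annotation"
NEGATIVE_CONTROL_CLASSES = {"synonymous_variant", "missense_variant"}


def candidate_negative_controls(csv_text, min_allele_count=3):
    """Return rows for synonymous/missense variants at or above the allele count threshold."""
    keep = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[CONSEQUENCE_COLUMN] not in NEGATIVE_CONTROL_CLASSES:
            continue
        if int(row[AC_COLUMN]) >= min_allele_count:
            keep.append(row)
    return keep


# Made-up rows standing in for a gnomAD export.
example = """HGVS Consequence,VEP Annotation,Allele Count
p.Ala10Ala,synonymous_variant,12
p.Arg20Gln,missense_variant,2
p.Gly30Ser,missense_variant,7
"""

selected = candidate_negative_controls(example, min_allele_count=3)
print([row["HGVS Consequence"] for row in selected])
```

Raising `min_allele_count` mimics the stricter threshold that may be needed for genes associated with autosomal recessive disorders.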

Table 1.

Outlining truth sets for cliPE experiments, related to Step 3

Control type	Assay validation truth set	Clinical truth set
Negative control	Synonymous and missense variants present in a general population database such as gnomAD	Missense variants classified as benign or likely benign in ClinVar
Positive control	Premature truncation variants in the targeted regions of the gene	Missense variants classified as pathogenic or likely pathogenic in ClinVar

For most cliPE experiments, the truth sets above are critical to validate the MAVE. For some genes that lack a portion of the truth set, such as PTEN or CHEK2, which have a limited number of BLB variants listed in ClinVar, it is important to consider how the data analysis portion of the workflow may be impacted.
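
As a rough illustration of the check in step 3d, the sketch below tallies truth set variants in a candidate design and reports any shortfall against the recommended minimums. The class labels (`synonymous`, `ptc`, `blb`, `plp`, `vus`) and the example design are hypothetical, and the clinical minimum uses the lower bound of the 25–30 recommendation.

```python
# Minimums from step 3d; "clinical" uses the lower bound of the 25-30 recommendation.
MINIMUMS = {"synonymous": 20, "ptc": 20, "clinical": 25}


def truth_set_shortfalls(variant_classes):
    """Return {category: number of variants still needed} for categories below minimum."""
    counts = {"synonymous": 0, "ptc": 0, "clinical": 0}
    for cls in variant_classes:
        if cls in ("synonymous", "ptc"):
            counts[cls] += 1
        elif cls in ("blb", "plp"):
            # BLB and PLP missense variants together form the clinical truth set.
            counts["clinical"] += 1
    return {k: MINIMUMS[k] - n for k, n in counts.items() if n < MINIMUMS[k]}


# Hypothetical design: one class label per variant across all targeted regions.
design = ["synonymous"] * 22 + ["ptc"] * 15 + ["blb"] * 10 + ["plp"] * 12 + ["vus"] * 60
shortfalls = truth_set_shortfalls(design)
print(shortfalls)
```

An empty result would indicate that the design meets all three minimums; otherwise additional regions or a second design round may be worth considering.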

Screen archetypal epegRNAs representative of each candidate library

Timing: 3–4 weeks (cloning in step 4 is expected to take no more than 1 week; screening for prime editing efficiency in step 5 is expected to take 2–3 weeks)

Archetypal epegRNAs which are representative of candidate epegRNA libraries will be cloned and tested for prime editing efficiency. To streamline the cliPE workflow, we do not recommend using nicking gRNAs for this initial screen. This is based on guidance outlined in Figure 6 of Doman et al., where initial screens without nicking gRNAs are sufficient to identify active epegRNAs.¹⁰

4.
Clone individual prime editing constructs.
Note: The archetypal epegRNAs introduce synonymous variants which ablate a PAM site where possible. If such synonymous variants are not feasible, instead synonymous variants are introduced which modify the seed sequence. These strategies are used to minimize the risk of multiple edits being made in the same cell.
- a.
  Set up the restriction digest for the destination vector: (1) 1 μg pU6-tevopreq1-GG-acceptor plasmid, (2) 5 μL 10X rCutSmart Buffer, (3) 1 μL BsaI-HFv2 enzyme, (4) X μL DNA grade water to a final reaction volume of 50 μL.
  Note: It may be advisable to scale this reaction up to generate sufficient digested plasmid. Each GoldenGate reaction here requires 60 ng of digested vector. Downstream cloning of epegRNA libraries requires 50 ng of digested vector per epegRNA library. For example, cloning 10 archetypal epegRNAs and 5 epegRNA libraries will require a total of 850 ng of digested vector.
- b.
  Incubate for a minimum of 3–4 h at 37°C followed by inactivation of enzymatic activity at 80°C for 20 min.
  CRITICAL: While many vendors recommend short (15 min) incubation times for restriction enzymes, our experience is that longer restriction digests (> 2 h) lead to improved cloning efficiency.
- c.
  Gel purify the vector.
  Note: Reserve any leftover digested vector at −20°C for future ligations, including downstream cloning of epegRNA libraries.
- d.
  Pre-anneal components for GoldenGate cloning: (1) spacer duplex, (2) epegRNA extension duplex, and (3) phosphorylated scaffold. For each, add 1 μL of each oligo (100 μM stock) to 23 μL of DNA grade water. Then, heat on a thermocycler to 95°C, incubate for 3 min, and then cool slowly to 25°C (ramp speed: 5°C per min). Finally, add 75 μL of DNA grade water to each tube to dilute to the proper concentration for GoldenGate cloning.
- e.
  Set up GoldenGate cloning reaction: (1) 2 μL digested vector (30 ng/μL), (2) 2 μL annealed spacer oligos, (3) 2 μL annealed epegRNA extension oligos, (4) 2 μL annealed phosphorylated scaffold, (5) 1 μL GoldenGate enzyme mix, (6) 2 μL 10X T4 DNA ligase buffer, and (7) 9 μL DNA grade water. Incubate on thermocycler at 37°C for 1 h followed by 60°C for 5 min.
- f.
  Transform bacteria.
  - i.
    Thaw bacteria on ice for up to 20 min.
  - ii.
    Add 1 μL of the GoldenGate reaction to 25-50 μL of chemically competent bacteria. Flick gently to mix.
  - iii.
    Incubate on ice up to 20 min.
  - iv.
    Perform heat shock by rapidly transferring to 42°C water bath for 30 s followed by rapid transfer back to ice for at least 2 min.
  - v.
    Add 300 μL SOC media and incubate at 37°C for 30-60 min with shaking at 200-250 rpm.
  - vi.
    Pre-warm 1–2 LB+Amp plates per cloning reaction for 30 min at 20°C–22°C.
  - vii.
    Plate bacteria onto pre-warmed plates.
    Note: It is recommended to plate low (25–50 μL) and high (100–200 μL) volumes of each cloning reaction to increase likelihood of easily picking single colonies across a range of cloning efficiencies.
  - viii.
    Incubate plates overnight (or about 16 h) at 37°C.
  - ix.
    Pick 3–4 colonies per archetypal epegRNA and sequence by Sanger to QC correct cloning of each construct.
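
The digested-vector budget from the note in step 4a can be sketched as follows. The per-reaction amounts (60 ng and 50 ng) come from the protocol; the gel purification recovery fraction is a hypothetical placeholder that should be replaced with your own measured yield.

```python
import math


def digested_vector_needed_ng(n_archetypal, n_libraries,
                              ng_per_archetypal=60, ng_per_library=50):
    """Total digested vector (ng) for archetypal epegRNA and library cloning."""
    return n_archetypal * ng_per_archetypal + n_libraries * ng_per_library


def digests_required(total_ng, input_ng_per_digest=1000, recovery_fraction=0.5):
    """Number of 1 ug digests needed; recovery_fraction is an assumed gel-purification yield."""
    return math.ceil(total_ng / (input_ng_per_digest * recovery_fraction))


needed = digested_vector_needed_ng(n_archetypal=10, n_libraries=5)
print(needed)                    # 850 ng, matching the worked example in step 4a
print(digests_required(needed))  # 2 digests at an assumed 50% recovery
```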

5.
Perform prime editing in HEK 293T cells.

a.
Design and order primers for sequencing of regions targeted for genome editing.
Note: We recommend a first round Sanger sequencing screen of PCR amplicons from DNA isolated from cells treated with each archetypal epegRNA followed by a secondary long read (LR) sequencing screen of candidate active epegRNAs to estimate editing efficiency. This requires designing and ordering four primer pairs for each archetypal epegRNA. Optionally, it is possible to omit the Sanger sequencing and proceed directly to LR sequencing to save time at the expense of the cost of LR sequencing of amplicons from both active and inactive epegRNAs. It is important to ensure that the primers do not bind too close to the site of editing, particularly for primer designs for Sanger sequencing, due to the extra noise in the first 25–35 bases of sequencing data. An amplicon size of 500–700 bp is optimal for Sanger sequencing, while amplicons of 800–1200 bp are optimal for LR sequencing. We recommend using software such as Primer3 (https://primer3.ut.ee/) to aid with primer design.
b.
Seed 100,000 cells 16–24 h before transfection in 24-well plates.
c.
Transiently co-transfect cells using either TurboFectin 8.0 or another suitable transfection reagent (see Table 2 for amounts of each vector).
d.
24–48 h post-transfection, sort GFP+ cells at a flow cytometry core facility.
Note: It is recommended to always plate an extra well of control untransfected cells for the purpose of setting the gate for GFP+ cells and accounting for autofluorescence. It is recommended to sort at least 10,000-50,000 cells per condition.
e.
After sorting, re-plate GFP+ cells under standard culture conditions for at least 48 h.
f.
Collect cell pellets by either scraping or trypsinization with TrypLE, followed by centrifugation in 1.5 mL Eppendorf tubes at 300 × g for 5 min at 20–22°C.
Note: Pellets can be either used immediately for genomic DNA (gDNA) extraction or frozen at −20°C prior to gDNA extraction.
g.
Extract gDNA using standard gDNA miniprep kit.

h.
Perform screening by PCR.

i.
Set up the PCR reaction. After aliquoting 22 μL of mastermix into each tube of the PCR strips, add 20–60 ng template gDNA (3 μL of 6.67–20 ng/μL).

CRITICAL: It is recommended for each primer pair to include an untransfected control and a no template control in addition to the cells co-transfected for prime editing.

Optional: Run a small aliquot (∼5 μL) on a 1–2% agarose gel to confirm amplification.

PCR Reaction Mastermix

Reagent	Amount
2X iProof Mastermix	12.5 μL
forward primer (10 μM)	1.25 μL
reverse primer (10 μM)	1.25 μL
ddH₂O	To 22 μL total volume
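
As a minimal sketch, the mastermix table above can be scaled to a batch of reactions as follows. The per-reaction volumes are taken from the table; the 10% pipetting overage is an assumption and not part of the protocol.

```python
# Per-reaction volumes (uL) from the PCR Reaction Mastermix table.
PER_REACTION_UL = {
    "2X iProof Mastermix": 12.5,
    "forward primer (10 uM)": 1.25,
    "reverse primer (10 uM)": 1.25,
}
MASTERMIX_TOTAL_UL = 22.0  # template (3 uL) is added separately to each tube


def mastermix_volumes(n_reactions, overage=0.10):
    """Scale the mastermix for n reactions plus a fractional overage (assumed 10%)."""
    per_reaction = dict(PER_REACTION_UL)
    # Water makes up the remainder of the 22 uL mastermix.
    per_reaction["ddH2O"] = MASTERMIX_TOTAL_UL - sum(PER_REACTION_UL.values())
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}


volumes = mastermix_volumes(n_reactions=8)
for name, vol in volumes.items():
    print(f"{name}: {vol} uL")
```

Remember to count the untransfected control and no-template control for each primer pair when choosing `n_reactions`.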

PCR Cycling Conditions

Steps	Temperature	Time	Cycles
Initial Denaturation	98 °C	3 min	1
Denaturation	98 °C	20 s	30–35 cycles
Annealing	62 °C	20 s
Extension	72 °C	30 s
Final extension	72 °C	7 min	1
Hold	4 °C	forever

ii.
PCR cleanup amplicons to prepare for sequencing.
Note: Quick cleanup with a column-based kit is usually sufficient. If primers produce multiple amplicons which interfere with downstream sequencing, it may be necessary to perform gel electrophoresis and gel purify the specific amplicon for sequencing.
iii.
Submit amplicons for either Sanger or LR sequencing (vendor such as Plasmidsaurus).
CRITICAL: Carefully follow recommendations provided by the sequencing vendor to properly submit samples.

Note: It may be possible to use the Synthego ICE software package (https://www.synthego.com/guide/how-to-use-crispr/ice-analysis-guide) to estimate editing efficiency from Sanger chromatograms. This software package is specifically designed for CRISPR/Cas9 editing analysis, but it may be possible to use for prime editing in the ‘Knock-in’ mode.

i.
Select epegRNAs with sufficient editing (>15%) for downstream cliPE experiment.
Note: Optionally, the user may estimate editing efficiency from Sanger chromatogram peak height but this is not recommended.

CRITICAL: It is optimal to confirm editing rate by LR sequencing. Ensure the LR sequencing provider provides either raw fastq sequencing reads or per-base frequencies across an assembled consensus sequence (Plasmidsaurus provides both, for example). The number of reads matching variant sequence can be divided by the total number of reads to estimate editing rate.
j.
Confirm regions contain sufficient control variants before proceeding.
CRITICAL: Once the low-performing epegRNAs are filtered out, it is important to reassess the presence of control set variants as discussed above in step 3d. If the libraries do not contain enough control set variants, it is recommended to revisit the epegRNA Designer app, design a second round of prime editing constructs, and do one more round of archetypal epegRNA screening to ensure the final dataset will contain a minimal number of control variants. It is important that a minimum of 20 positive and 20 negative truth set variants are present in total across all of the cliPE libraries which will be used for further experiments. This is a minimum, and exceeding this minimal truth set is preferable when possible.
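
The editing rate estimate described in step 5i (reads matching the variant sequence divided by total reads) can be sketched as below. This is an illustrative exact-match approach, not the authors' analysis code: it assumes every read spans the edited site, and sequencing errors inside the matched window will cause undercounting. The sequences are made up.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def editing_rate(reads, edited_window):
    """Fraction of reads containing the edited sequence window on either strand."""
    window = edited_window.upper()
    window_rc = reverse_complement(window)
    total = 0
    edited = 0
    for read in reads:
        read = read.upper()
        total += 1
        if window in read or window_rc in read:
            edited += 1
    return edited / total if total else 0.0


# Made-up amplicon: the edit changes TGCAAGGCT to TGCATGGCT.
wild_type = "ACGTTGCAAGGCTTAC"
edited = "ACGTTGCATGGCTTAC"
reads = [wild_type] * 8 + [edited, reverse_complement(edited)]

rate = editing_rate(reads, edited_window="TGCATGGCT")
print(f"editing rate: {rate:.0%}; passes >15% threshold: {rate > 0.15}")
```

In practice the reads would be parsed from the fastq file returned by the sequencing provider, and the window should be long enough to be unique within the amplicon.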

Table 2.

Plasmid DNA amounts for co-transfection in epegRNA architecture screen, related to Step 5

Plasmid	DNA amount per well of 24-well plate
epegRNA-containing pU6-tevopreq1-GG-acceptor	75 ng
pCMV-PEmax-P2A-GFP	263 ng
pEF1a-hMLH1dn	132 ng

We typically use the TurboFectin 8.0 transfection reagent (OriGene #TF81005), but an alternative such as Lipofectamine 3000 is suitable if it efficiently transfects HEK 293T cells.

Clone nicking gRNAs and prime editing libraries

Timing: 4–5 weeks (much of this time accounts for shipping time for primers and oligo pools and turnaround time for sequencing QC; hands-on time is 1 week for step 6 and 1 week for step 7)

EpegRNA libraries and nicking gRNA vectors will be cloned and validated.

6.
Clone nicking gRNA constructs.
- a.
  Order oligonucleotides with appropriate sticky ends.
  Note: These are provided in the output from the cliPE epegRNA Designer Shiny app.
  
  Troubleshooting: Cloning of nicking gRNAs into BPK1520 follows a similar principle to standard pX458/pX459 cloning. See problem 2 for how to avoid a common pitfall in gRNA cloning.
- b.
  Generate linearized vector BPK1520.
  - i.
    Set up the restriction digest: (1) 1 μg BPK1520, (2) 2 μL 10X NEBuffer r3.1, (3) 1 μL BsmBI-v2 enzyme, (4) 16 μL DNA grade water. Incubate for a minimum of 3–4 h at 55°C followed by inactivation of enzymatic activity at 80°C for 20 min.
  - ii.
    Purify by gel or column purification.
    Note: It is recommended to examine a small aliquot by gel electrophoresis (0.8–1% agarose) to confirm complete digestion of the vector. It is further recommended to consider scaling this reaction up in anticipation of additional nicking gRNA cloning reactions. Leftover linearized vector can be stored at −20°C when not in use.
- c.
  Set up the phosphorylation and annealing reaction: (1) 1 μL oligo 1 (100 μM), (2) 1 μL oligo 2 (100 μM), (3) 1 μL 10X T4 Ligation Buffer (NEB), (4) 6.5 μL DNA grade water, (5) 0.5 μL T4 PNK (NEB). Please note it is important to use T4 Ligation Buffer rather than PNK Buffer. Incubate in a thermocycler at 37°C for 30 min to phosphorylate the oligos. Then, to anneal them, heat to 95°C and cool slowly to 25°C (ramp speed: 5°C per min).
- d.
  Set up the ligation reaction: (1) 50 ng BsmBI-linearized expression vector BPK1520, (2) 1 μL phosphorylated and annealed oligo duplex (1:200 dilution), (3) 1 μL 10X T4 DNA ligase buffer (NEB), (4) X μL ddH₂O (to a total of 10 μL), (5) 1 μL T4 DNA Ligase (NEB). Incubate for 15 min at 20–22°C.
- e.
  Transform bacteria.
  - i.
    Thaw bacteria on ice for up to 20 min.
  - ii.
    Add 1–2 μL of ligated plasmid to 25–50 μL of chemically competent bacteria. Flick gently to mix.
  - iii.
    Incubate on ice up to 20 min.
  - iv.
    Perform heat shock by rapidly transferring to 42°C water bath for 30 s followed by rapid transfer back to ice for at least 2 min.
  - v.
    Add 300 μL SOC media and incubate at 37°C for 30-60 min with shaking at 200–250 rpm.
  - vi.
    Pre-warm 1-2 LB+Amp plates per cloning reaction for 30 min at 20°C–22°C.
  - vii.
    Plate bacteria onto pre-warmed plates.
    Note: It is recommended to plate low (25-50 μL) and high (100-200 μL) volumes of each cloning reaction to increase likelihood of easily picking single colonies across a range of cloning efficiencies.
  - viii.
    Incubate plates overnight (or about 16 h) at 37°C.
  - ix.
    Pick 3-4 colonies per nicking gRNA and sequence by Sanger to QC correct cloning of each construct.
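
For orientation, the amount of oligo duplex entering the ligation in step 6d can be worked out from the volumes in step 6c. The sketch below assumes complete annealing of the two oligos, which is an idealization.

```python
def duplex_conc_uM(oligo_stock_uM=100.0, oligo_volume_uL=1.0, reaction_volume_uL=10.0):
    """Duplex concentration after annealing, assuming both oligos anneal completely."""
    return oligo_stock_uM * oligo_volume_uL / reaction_volume_uL


annealed_uM = duplex_conc_uM()         # 1 uL of 100 uM oligo in a 10 uL reaction
diluted_nM = annealed_uM * 1000 / 200  # 1:200 dilution, converted from uM to nM
duplex_fmol = diluted_nM * 1.0         # nM x uL = fmol; 1 uL goes into the ligation

print(annealed_uM, diluted_nM, duplex_fmol)
```

Under these assumptions the annealed stock is 10 μM, the 1:200 dilution is 50 nM, and each ligation receives about 50 fmol of duplex.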

7.
Clone epegRNA libraries.

a.
Order single-stranded DNA oligo pools and primers to generate double-stranded oligo pool and append BsaI restriction enzyme recognition sites.
Note: These are provided in the output from the cliPE epegRNA Designer Shiny app.
b.
Resuspend pools of single-stranded DNA oligos (IDT oPools) encoding epegRNAs with 50 μL of DNA grade water or 1X TE. Make a working stock by diluting to 20 ng/μL.
Note: For IDT oPools, the stock concentration will be equivalent to the number of oligos in the pool in μM (i.e., a 50x oligo pool will be 50 μM).

c.
Set up PCR reactions to generate double-stranded oligo pool and append BsaI restriction enzyme recognition sites (10 cycles).

PCR Reaction Mastermix

Reagent	Amount
Oligo pool (20 ng/μL)	3 μL
2X iProof Mastermix	12.5 μL
forward primer (10 μM)	1.25 μL
reverse primer (10 μM)	1.25 μL
ddH₂O	7 μL (to 25 μL total volume)

PCR Cycling Conditions

Steps	Temperature	Time	Cycles
Initial Denaturation	98 °C	2 min	1
Denaturation	98 °C	30 s	10 cycles
Annealing	60 °C	10 s
Extension	72 °C	30 s
Final extension	72 °C	7 min	1
Hold	4 °C	forever

Note: It may be necessary to adjust the annealing temperature to optimize for some primer pairs.
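
The dilution of a resuspended oligo pool to the 20 ng/μL working stock (step 7b) requires converting μM to ng/μL. The sketch below shows one way to estimate this. It assumes 50 pmol of each oligo (inferred from the note that a 50-oligo pool resuspended in 50 μL is 50 μM), an approximate average mass of 330 g/mol per single-stranded DNA nucleotide, and a hypothetical oligo length of 150 nt.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # approximate average mass of one ssDNA nucleotide


def opool_dilution(n_oligos, oligo_length_nt, pmol_per_oligo=50.0,
                   resuspension_uL=50.0, target_ng_per_uL=20.0):
    """Return (total uM, stock ng/uL, fold dilution to reach the target working stock)."""
    total_uM = n_oligos * pmol_per_oligo / resuspension_uL  # pmol/uL equals uM
    # uM x g/mol = ug/L, and 1 ug/L = 0.001 ng/uL.
    stock_ng_per_uL = total_uM * oligo_length_nt * AVG_NT_MASS_G_PER_MOL / 1000.0
    return total_uM, stock_ng_per_uL, stock_ng_per_uL / target_ng_per_uL


# Hypothetical pool: 50 oligos of 150 nt each.
total_uM, stock_ng_per_uL, fold = opool_dilution(n_oligos=50, oligo_length_nt=150)
print(f"{total_uM} uM total, ~{stock_ng_per_uL} ng/uL, dilute ~{fold}-fold")
```

Measuring the resuspended stock directly is a more reliable basis for the dilution than this estimate, which is intended only as a sanity check.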

d.
Use a PCR cleanup kit such as QIAGEN’s QIAquick PCR and gel cleanup kit (#28104) to quickly prepare the amplified DNA for restriction digest. Elute in 30 μL of elution buffer.
e.
Set up the restriction digest to generate sticky ends for cloning: (1) 30 μL PCR-amplified oligo pool, (2) 5 μL 10X NEB rCutSmart Buffer, (3) 1 μL BsaI-HFv2 enzyme (NEB; #R3733S), (4) 14 μL DNA grade water. Incubate for at least 3–4 h at 37°C followed by inactivation of enzymatic activity at 80°C for 20 min. Purify by column purification.
f.
If necessary, set up the restriction digest for the destination vector as outlined above in step 4a-c.
g.
Set up the ligation reaction: (1) 37.5 ng digested and purified epegRNA pool, (2) 50 ng digested and purified destination vector, (3) 2 μL 10X T4 DNA ligase buffer (NEB), (4) X μL DNA grade water (to a total of 20 μL), (5) 1 μL T4 DNA Ligase. Incubate for 15 min at 20–22°C.
h.
Transform 1-2 μL ligated plasmid pool into 50 μL of competent bacteria and plate onto 10 cm LB+Amp plates.
Note: As cliPE libraries are relatively small, typically 40-70 epegRNAs per library, it is not necessary to make significant modifications to bacterial transformation protocols. It is important to note that if larger libraries are attempted, it may be necessary to scale up both the volume of competent bacteria and the size and number of LB+Amp plates used for plating.
i.
Culture plates overnight (about 16 h). Carefully add 1 mL of LB and scrape colonies into a 1.5 mL tube and pellet (8000 × g for 3 min at 20-22°C). Use a midiprep kit to extract plasmid DNA from the pooled colonies.
j.
Determine library concentrations with the Qubit high-sensitivity dsDNA reagent.
k.
Perform a first round of QC using a plasmid LR sequencing service such as Plasmidsaurus. Determine whether epegRNA architecture is correct and if different variants are present as expected.
Optional: Perform a second round of QC using targeted amplicon sequencing of each epegRNA to estimate the frequency of each epegRNA in each plasmid pool. LR sequencing services like Plasmidsaurus are useful for initial QC of an epegRNA library, but the limited number of reads they routinely return makes it difficult to accurately estimate the frequency of each epegRNA within the pool. Illumina targeted amplicon sequencing provides sufficient read depth to quantify the frequency of each epegRNA. epegRNAs present at low concentrations relative to other epegRNAs are less likely to produce edits in enough cells, and the variants encoded by these epegRNAs may need to be excluded from the final analysis. It is especially recommended to check the relative frequency of epegRNAs encoding truth set variants to ensure that as many truth set variants as possible are likely to be edited into cells. The following primers are recommended for submitting as part of a large, multiplexed batch of amplicon sequencing libraries with either single or double barcoding (the Illumina adaptor is the 5′ portion of each primer, ending with AGATGTGTATAAGAGACAG):

tevopreqPCR1_ampSeqF: (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATATATCTTGTGGAAAGGACGAAAC-3′)

tevopreqPCR1_ampSeqR: (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACCTCGAGCGGCCCA-3′)

Please note, if sending a small batch of sequencing to a service such as GENEWIZ from Azenta Amplicon-EZ or the MGH CCIB DNA Core’s Complete Amplicon Sequencing, it is advised to adjust the primers to be compatible. Please carefully read the sample submission instructions for the particular service.
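The ligation reaction in step g above fixes the DNA masses (37.5 ng epegRNA pool, 50 ng destination vector) and the total volume (20 μL), so the pipetted volumes depend on the measured concentrations. The following is a minimal sketch of that arithmetic; the function name and the example concentrations are hypothetical and should be replaced with values measured for your own preparations.

```python
def ligation_volumes(insert_ng_per_ul, vector_ng_per_ul,
                     insert_ng=37.5, vector_ng=50.0,
                     buffer_ul=2.0, ligase_ul=1.0, total_ul=20.0):
    """Return the volume (in μL) of each ligation component.

    Water fills the reaction to total_ul, mirroring the 'X μL DNA grade
    water (to a total of 20 μL)' line of the protocol.
    """
    insert_ul = insert_ng / insert_ng_per_ul
    vector_ul = vector_ng / vector_ng_per_ul
    water_ul = total_ul - (insert_ul + vector_ul + buffer_ul + ligase_ul)
    if water_ul < 0:
        raise ValueError("DNA is too dilute to fit in the reaction volume")
    return {
        "epegRNA pool": insert_ul,
        "destination vector": vector_ul,
        "10X T4 DNA ligase buffer": buffer_ul,
        "T4 DNA ligase": ligase_ul,
        "DNA grade water": water_ul,
    }


# Hypothetical concentrations: pool at 12.5 ng/μL, vector at 25 ng/μL.
for component, volume in ligation_volumes(12.5, 25.0).items():
    print(f"{component}: {volume:.2f} μL")
```

If the function raises an error, the purified DNA is too dilute for a 20 μL reaction and would need to be concentrated or re-eluted in a smaller volume.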

Prime edit haploid HAP1s

Timing: 1 h (1 h for step 8)

Haploid cells are co-transfected to generate cell pools with each cell containing a single variant in the gene of interest.

8.
Generate cell library containing individual variants with prime editing.
- a.
  Culture enough dishes of HAP1 cells for the number of transfections needed. We recommend using 1 million cells for each epegRNA library transfection.
  Note: Library complexity for cliPE libraries is relatively low (40–70 unique sequences) compared to saturation libraries which often contain thousands of unique sequences. Electroporating 1 million cells typically results in cell pools where each unique variant is present in 0.5% of GFP+ cells, on average. Assuming a successful sort of at least 50,000 GFP+ cells (see below), each variant should remain present in about 250 cells, with some variance. This is somewhat in excess of the minimum number of variant-containing cells necessary to maintain library complexity (at least 10–100 cells per variant). Please see McDade et al. for a detailed discussion of library complexity in the context of pooled screening.¹¹
- b.
  Transiently co-transfect cells using the Invitrogen Neon electroporator using the 100 μL kit (#MPK10096). Please see Table 3 for suggested amounts of each vector.
  Note: If access to a flow cytometry core facility is not possible or practical, it is recommended to substitute a PEmax expression plasmid containing a mammalian cell antibiotic selection cassette (such as PB-PEmax; Addgene: #187893) in the place of the pCMV-PEmax-P2A-GFP plasmid. A short pulse of puromycin for 48 h will likely produce similar selection for transfected cells as the GFP sort outlined herein.
- c.
  24 h post-transfection, sort GFP+ cells at a flow cytometry core facility. It is recommended to sort 50,000-200,000 cells per co-transfected cliPE library.
- d.
  After sorting, re-plate GFP+ cells under standard culture conditions for at least 48 h.
  Optional: Perform a QC check by performing amplicon sequencing of cells to confirm genome editing of target region. In our experience, we have found it useful to validate cell libraries generated from any newly cloned epegRNA libraries prior to proceeding with selection.
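The library complexity estimate in the note to step 8a can be reproduced with a short calculation. The sketch below assumes, as stated in that note, that each variant is present in about 0.5% of GFP+ cells on average; the function names are hypothetical, and real pools will vary around these averages.

```python
def cells_per_variant(sorted_cells, variant_fraction=0.005):
    """Average number of sorted GFP+ cells expected to carry one given variant."""
    return sorted_cells * variant_fraction


def meets_minimum(sorted_cells, min_cells_per_variant=100, variant_fraction=0.005):
    """True if the average coverage reaches the chosen minimum per variant."""
    return cells_per_variant(sorted_cells, variant_fraction) >= min_cells_per_variant


# The recommended sort range of 50,000 to 200,000 GFP+ cells.
for n_sorted in (50_000, 200_000):
    print(n_sorted, round(cells_per_variant(n_sorted)), meets_minimum(n_sorted))
```

Sorting 50,000 cells gives about 250 cells per variant on average, which is the figure quoted in the note and sits above the suggested minimum of 10 to 100 cells per variant.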

Table 3.

Plasmid DNA amounts for co-transfection in epegRNA library selection experiment, related to Step 8

Plasmid	DNA amount per 1 million HAP1 cells
epegRNA library	451 ng
Nicking gRNA (in BPK1520)	180 ng
pCMV-PEmax-P2A-GFP	1579 ng
pEF1a-hMLH1dn	789 ng


We use the Neon electroporation system which routinely yields 15–30% GFP+ cells. If substituting transfection methods, we recommend optimizing to achieve a minimum of 15% GFP+ cells. Others have reported successful transfection of HAP1s with TurboFectin 8.0 (OriGene #TF81005), although we typically observe significantly reduced efficiency compared to electroporation. Anecdotally, Xfect (Takara; Cat #631317) works well for transfecting small plasmids into HAP1s, but we have not tested this transfection reagent with larger plasmids like the pCMV-PEmax-P2A-GFP.
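When several transfections of 1 million cells each are prepared at once, the amounts in Table 3 can be converted to pipetting volumes. The sketch below is one way to do this; the stock concentrations are hypothetical placeholders, and the function name is not part of the protocol.

```python
# ng of each plasmid per transfection of 1 million HAP1 cells (from Table 3)
TABLE3_NG = {
    "epegRNA library": 451,
    "Nicking gRNA (in BPK1520)": 180,
    "pCMV-PEmax-P2A-GFP": 1579,
    "pEF1a-hMLH1dn": 789,
}


def transfection_mix(stock_ng_per_ul, n_transfections=1):
    """Volume (μL) of each plasmid stock for n transfections of 1 million cells each."""
    return {
        plasmid: ng * n_transfections / stock_ng_per_ul[plasmid]
        for plasmid, ng in TABLE3_NG.items()
    }


# Hypothetical stock concentrations (ng/μL); replace with measured values.
stocks = {plasmid: 1000.0 for plasmid in TABLE3_NG}
for plasmid, volume in transfection_mix(stocks, n_transfections=3).items():
    print(f"{plasmid}: {volume:.2f} μL")
print("Total DNA per transfection:", sum(TABLE3_NG.values()), "ng")
```

The four plasmids in Table 3 sum to 2999 ng, or approximately 3 μg of total DNA per 1 million cells.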

Select cells

Timing: Variable (step 9 can be as short as 4–7 days for cell sorting methods but can take up to 3–4 weeks for cell growth selection)

Cell pools generated in Step 8 are selected by a chosen method that enriches for cells containing a certain class of variants, such as those causing loss of function of the gene of interest. Selection of cell pools for MAVEs is highly context-dependent and, as such, it is not possible to provide a single protocol that will work for any gene of interest; instead, we have provided below a number of resources that can guide the choice of an appropriate selection method.

9.
Select cells by growth assay, cell sorting, or other method.

CRITICAL: There are some generalizable methods that are worth considering first as a potential selection strategy for a MAVE. If the gene of interest is an essential gene in HAP1s or another cell type, this can be exploited to perform a MAVE using genome editing followed by culturing the cells for different amounts of time.¹² After a certain amount of time in culture, loss-of-function alleles will be depleted as cells expressing these variants fail to thrive. This is a very popular selection strategy and there are many examples in the literature to use to guide your experimental design.¹³^,¹⁴ If the gene is not an essential gene, it still might be possible to find a cell line where growth is partially dependent on the gene of interest; DepMap is a useful resource with genome-wide data on many cell lines using both gene knockout and mRNA knockdown approaches (https://depmap.org/portal/). If pathogenicity of the gene of interest correlates with reduced protein abundance, VAMPseq is a powerful method to identify pathogenic variants, as demonstrated in previous studies on PTEN and others.¹⁵ Additional generalizable methods include response to drug treatment, single cell transcriptomics, cell surface trafficking of membrane proteins, and cell morphology.¹⁶^,¹⁷^,¹⁸^,¹⁹^,²⁰^,²¹ If none of these more generalizable methods is fitting, there are certainly other gene-specific alternatives you can employ. Sorting cells based on cell signaling activity enabled us to distinguish pathogenic TSC2 missense variants.⁶ The BioGrid ORCS database (https://orcs.thebiogrid.org/) of high-throughput CRISPR screens can identify potential selection approaches to facilitate a MAVE study. We also recommend reading through reviews such as Starita L et al. and Tabet D et al. which provide excellent guidance on selection strategies for high-throughput functional studies.²²^,²³

Note: For new assays, it is recommended to perform pilot experiments with a single variant or small number of variants, using individual epegRNAs designed to generate control variants such as missense BLB and PLP variants, synonymous variants present in the general population, or loss-of-function variants produced by PTC codons.

Sequence regions of genes targeted for genome editing

Timing: Variable (for step 10, library preparation can be performed in 1–2 days; sequencing time varies by sequencing facility and is expected to be 1–3 weeks)

Next-generation targeted amplicon sequencing of each region selected for prime editing is performed in control unselected cells as well as cells which have undergone selection. For example, if a growth assay was performed, libraries might be constructed at an early and a late time point after prime editing.

10.

Prepare single- or dual-barcoded amplicon sequencing libraries.

Note: It is necessary to design primers to amplify each locus targeted for genome editing. Constraining the total amplicon size to less than 250 bp is recommended, though not strictly required, because it maximizes read depth of the target region. Alternative designs or sequencing platforms can be used, provided that they give high-depth sequencing of the part of the amplicon containing the full region of interest (all possible edits from a single cliPE epegRNA library). Also, as above, it is important to ensure the primers flank but do not directly overlap the region of interest so that any small insertions and deletions (indels) can be detected. If sequencing with a vendor such as MGH CCIB DNA Core’s Complete Amplicon Sequencing service or GENEWIZ from Azenta Amplicon-EZ, it is important to review the sample submission guidelines specific to the respective service.

CRITICAL: This protocol is used for larger batches of many libraries sequenced in multiplex. These libraries are produced by two rounds of PCR, first to amplify the region of interest and append Illumina adaptors, and second to barcode the library with single or dual index sequences. We recommend single barcoding when 24 or fewer libraries are being pooled and sequenced. For the most cost-effective sequencing, we recommend double barcoding to enable pooling more than 24 libraries for a single sequencing run. These libraries are also compatible with multiplexing with many other library types, which can reduce cost. If performing small batch library QC or pilot experiments, we recommend a simpler, single PCR protocol. This protocol and a cost comparison between single library sequencing and multiplexed sequencing are available at the cliPE homepage (https://home.clipe-mave.org/).

a.
Design primers to amplify the region spanning the reverse transcription template of each epegRNA architecture. See step 10A for specific design details. Append Illumina adaptors prior to ordering. If necessary, order primers for the second PCR (barcoding PCR) reaction described below (see Table S1 for primer sequences to order from IDT or preferred vendor).
Note: To sequence in multiplex at a core facility or outside vendor, it is necessary to append the appropriate Illumina adaptors to enable barcoding:

Primer 1: adapter + forward target primer (5′- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG -forward_primer-3′).

Primer 2: adapter + reverse target primer (5′- GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG -reverse_primer-3′).
b.
Extract gDNA from unselected and selected cell pellets.

c.
Set up PCR reaction to generate initial amplicons for targeted amplicon sequencing. After aliquoting 22 μL of mastermix in each tube of PCR strips, add 20–60 ng template gDNA (3 μL of 6.67–20 ng/μL).

PCR Reaction Mastermix

Reagent	Amount
DNA template	60 ng
2X iProof Mastermix	12.5 μL
ampSeq_FwdPrimer (10 μM)	1.25 μL
ampSeq_RevPrimer (10 μM)	1.25 μL
ddH₂O	To 25 μL total volume


PCR Cycling Conditions

Steps	Temperature	Time	Cycles
Initial Denaturation	98 °C	3 min	1
Denaturation	98 °C	20 s	30–35 cycles
Annealing	60 °C–62 °C	20 s
Extension	72 °C	30 s
Final extension	72 °C	7 min	1
Hold	4 °C	forever


d.
Use size selection beads to remove primer dimer and prepare libraries for barcoding PCR.

i.
Allow aliquots of AmpureXP beads to sit at 20°C–22°C for at least 30 min prior to use.
ii.
Add 25 μL of DNA grade water to each 25 μL PCR reaction to bring the total volume to 50 μL. Add an equal volume (50 μL) of AmpureXP beads to bring the total volume to 100 μL and mix well by pipetting.
iii.
After a 10 min incubation at 20°C–22°C, place on magnet for 5 min.
iv.
Remove and discard supernatant, followed by two washes with 100–200 μL of 70% ethanol.
v.
Allow beads to air dry for about 5 min.
vi.
Add 21 μL of 1X TE and incubate for 5 min prior to placing tubes back on magnet.

vii.

After 5 min of separation, pipette 20 μL of eluted DNA into a fresh tube.

Optional: Perform a qPCR reaction to optimize the number of cycles for barcoding PCR. Add 2 μL of PCR1 product (1:20 dilution) directly to each well. Select a number of cycles early in the linear phase. Include a step to read SYBR Green fluorescence after each round of PCR. Optimal cycle numbers typically vary between 7 and 15. If omitting this step, it is recommended to use a standard 12 cycles which is typically sufficient for barcoding amplicon sequencing libraries.

PCR Reaction Mastermix

Reagent	Amount
2X iProof Mastermix	12.5 μL
ampSeqPCR2univFwdPrimer (2.5 μM)	3 μL
ampSeqPCR2univRevPrimer (2.5 μM)	3 μL
SybrGreen (100X)	0.25 μL
ddH₂O	To 23 μL total volume


PCR Cycling Conditions

Steps	Temperature	Time	Cycles
Initial Denaturation	98 °C	2 min	1
Denaturation	98 °C	30 s	40 cycles
Annealing	60 °C	10 s
Extension	72 °C	30 s
Final extension	72 °C	7 min	1
Hold	4 °C	forever


e.
Set up PCR2 barcoding reaction.

CRITICAL: It is important that each sample has a unique combination of i5 and i7 indexes for proper demultiplexing. It is imperative to make a spreadsheet of which samples are receiving which barcodes both for correct pipetting as well as downstream demultiplexing. For single-index multiplexing, up to 24 amplicon sequencing libraries can be pooled together. Select a constant i5 index (one of P5_[…] in Table S1) and up to 24 of the i7 indexes (P7_[…] in Table S1). For dual-barcoded libraries, up to 384 samples can be pooled together. Make separate PCR mastermixes for each unique i5 index. Aliquot 20 μL of mastermix into PCR strip tubes, and directly add 2 μL of PCR1 product (1:20 dilution) and 3 μL of i7 index primer (2.5 μM) to each tube, for a final reaction volume of 25 μL.

PCR Reaction Mastermix

Reagent	Amount
2X iProof Mastermix	12.5 μL
i5 index primer (2.5 μM)	3 μL
ddH₂O	To 20 μL total volume


PCR Cycling Conditions

Steps	Temperature	Time	Cycles
Initial Denaturation	98 °C	2 min	1
Denaturation	98 °C	30 s	12 cycles or number determined during optional qPCR
Annealing	60 °C	10 s
Extension	72 °C	30 s
Final extension	72 °C	7 min	1
Hold	4 °C	forever


f.
Pool 5–10 μL of each library and use AmpureXP bead purification to clean up amplicon pool prior to sequencing.
- i.
  Allow aliquots of AmpureXP beads to sit at 20°C–22°C for at least 30 min prior to use.
- ii.
  Add 0.9X volume of AmpureXP beads to pooled library. For example, if pooling 10 libraries and using 10 μL per library, add 90 μL of AmpureXP beads. Mix well by pipetting.
- iii.
  After a 10 min incubation at 20°C–22°C, place on magnet for 5 min.
- iv.
  Remove and discard supernatant, followed by two washes with 100–500 μL 70% ethanol. Use enough volume of 70% ethanol to cover beads.
- v.
  Allow beads to air dry for about 5 min.
- vi.
  Add 21 μL of 1X TE and incubate for 5 min prior to placing tubes back on magnet.
- vii.
  After 5 min of separation, pipette 20 μL of eluted DNA into a fresh tube.
g.
Use Qubit to quantify DNA concentration for final amplicon pool, dilute to appropriate concentration, and submit for Illumina short-read sequencing.
Note: Follow the provided sample submission instructions for custom short-read Illumina libraries from sequencing vendor. Request 2x 150 bp (i.e., paired-end) sequencing on an appropriate platform to get a minimum of 200,000 reads per individual amplicon sequencing library in the pool. No custom sequencing primers are necessary, although you will need to provide the indexes used for barcoding (see Table S1). It is critical to let the sequencing facility know that these libraries likely qualify as low sequencing diversity libraries and may require higher than normal spike-in of PhiX, unless they will be multiplexed with other libraries from users with diverse libraries. Please see the cliPE GitHub repository (https://github.com/calhoujd/calhoujd.github.io) for example sample sheets, library kits, and demultiplexing scripts using bcl2fastq.
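The barcode spreadsheet called for in the barcoding PCR step can be generated programmatically so that no i5/i7 combination is used twice. The sketch below is one possible approach; the index names are hypothetical placeholders (the actual primer names are listed in Table S1), and the function name is not part of the protocol.

```python
from itertools import product


def assign_indexes(samples, i5_names, i7_names):
    """Give each sample a unique (i5, i7) pair.

    Pairs are generated with i5 as the outer loop, so consecutive samples
    share an i5 index and can be set up from the same i5 mastermix.
    """
    combos = list(product(i5_names, i7_names))
    if len(samples) > len(combos):
        raise ValueError("more samples than unique i5/i7 combinations")
    return [
        {"sample": sample, "i5": i5, "i7": i7}
        for sample, (i5, i7) in zip(samples, combos)
    ]


# Hypothetical names: 2 i5 indexes and 24 i7 indexes cover up to 48 libraries.
i5_names = ["i5_A", "i5_B"]
i7_names = [f"i7_{n:02d}" for n in range(1, 25)]
samples = [f"library_{n:02d}" for n in range(1, 31)]

for row in assign_indexes(samples, i5_names, i7_names)[:3]:
    print(row)
```

For single-index multiplexing, pass a single i5 name; the function then allows at most as many samples as there are i7 indexes.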

Analysis of data using random effects modeling

Timing: 1–2 days (1–2 days for step 11)

Allele frequencies are computed from unselected and selected cell pools. The resulting enrichment score, if successful, will distinguish pathogenic from benign variants in the gene of interest.

11.
Perform k-mer counting and enrichment analysis.
Note: This pipeline uses the Jellyfish k-mer counting software to count the number of reads containing each variant using k-mers unique to each variant. We recommend a k-mer length equal to the reverse transcription template length of the epegRNA library. The counts are converted to allele frequencies relative to the total number of reads. Similarly to Enrich2, a software package widely used for analysis of MAVE datasets, random effects modeling is used to generate an enrichment score and standard error using replicates of selected and unselected cell pools.²⁴ Random effects modeling is performed using the metafor R package.²⁵
- a.
  If not installed previously, install the Jellyfish k-mer counting software (https://github.com/gmarcais/Jellyfish).⁸
- b.
  Unzip read1 fastqs:
  >gunzip read1.fastq.gz
  Note: It is recommended to write a script which combines this and the following several steps to run with one command using either sh, msub, or sbatch, depending on whether you are running this locally or on a high-performance compute cluster. Please see the cliPE GitHub repository (https://github.com/calhoujd/calhoujd.github.io) for an example script written for sbatch for Slurm scheduler (https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/jellyfish_script_e37_v1.sh).
- c.
  Run the Jellyfish bc command on read1 fastqs:
  >jellyfish bc -m <k-mer length> -s 1G -t <threads> -o read1.bc -C read1.fastq
- d.
  Run Jellyfish count command:
  >jellyfish count -m <k-mer length> -s 1G -t <threads> -o read1_MERcounts.jf --bc read1.bc --if kmerLibrary.fasta -C read1.fastq
  Note: The fasta library is generated previously by the Shiny cliPE epegRNA Designer app.
- e.
  Run the Jellyfish dump command:
  >jellyfish dump read1_MERcounts.jf > mer_counts_dump_read1.fa
- f.
  Repeat steps b through e for read 2 for the same sample, and then repeat this process for each pair of fastq files.
  Note: It is important to note this can be streamlined using standard file naming conventions and running all jobs simultaneously using programmatic job submission such as job arrays. Please see demo data fastq files provided on the cliPE github repository (such as https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/CB1_S1_R1_001.fastq).
- g.
  Run the cliPEr_app1_fasta2csv app (https://calhoujd12.shinyapps.io/cliPEr_app1_fasta2csv/) to convert fasta file to csv file. Repeat for all read1 and read2 fasta files.
  Note: Please see demo data fasta files provided on the cliPE github repository (such as https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/CB1_R1_kmerCount.fa ).
- h.
  Annotate the k-mer count file with the variant name by uploading the output from cliPEr_app1_fasta2csv as well as the dictionary generated previously by the epegRNA Designer Shiny app (Step 3) to the Shiny cliPEr_app2_kmers2variants app (https://calhoujd12.shinyapps.io/cliPEr_app2_kmers2variants/).
  CRITICAL: If the Jellyfish output contains the reverse complement k-mer relative to the dictionary generated by the epegRNA Designer app, there is an option to reverse complement within the cliPEr_app2_kmers2variants Shiny app to ensure successful annotation.
  
  Note: Please see demo data files provided on the cliPE github repository (demo csv: https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/example_input_CB1_R1.csv; demo dictionary: https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/TSC2_e37_kmer_dictionary.csv).
- i.
  Generate a final csv for input into the third R-based Shiny app.
  CRITICAL: Each epegRNA architecture will have a separate file. Column1 will be the name of each variant. Column2 will be ‘counts_control_rep1’, the counts for the variant in cells without selection. Column3 will be ‘counts_selected_rep1’, the counts for the variant in cells after selection. Additional replicates will each be represented by additional pairs of columns. It is recommended to include 3-4 biological replicates for each epegRNA library.
  
  Note: Examples are provided for both 3 (https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/Book2_e17_3xreps.csv) and 4 (https://github.com/calhoujd/calhoujd.github.io/blob/gh-pages/docs/Book2_e17_4xreps.csv) biological replicates.
- j.
  Upload csv to the cliPEr_app3_random_effects_modeling companion Shiny app (https://calhoujd12.shinyapps.io/cliPEr_app3_random_effects_modeling/).
  Note: It is necessary to specify the number of biological replicates. For each variant, the allele frequency will be determined relative to the depth of sequencing. After the analysis is complete, download the output csv file which will contain two additional columns to the original input: (1) the beta, or functional enrichment score, and (2) the standard error for each variant. Alternatively, other tools are available which can perform random effects modeling for MAVE data, such as Enrich2 (https://github.com/FowlerLab/Enrich2) or CountESS (https://github.com/CountESS-Project/CountESS).²⁴
- k.
  It is recommended to filter out variants which have a low count in unselected cells due to either low editing efficiency or low abundance of certain epegRNAs in the final epegRNA pool.
  Note: In our experience, filtering out variants in unselected cells below 0.1% allele frequency is necessary as variants below this cutoff tend to have enrichment scores outside of the ranges for variants in the same class, as well as high variability between replicates. A further discussion of this phenomenon can be found in Rubin et al.²⁴ This threshold may need to be adjusted somewhat for each gene:selection pair and can be done by assessing the enrichment scores and SE for internal assay validation variants with varying cutoff thresholds. Optionally, it may be of interest to normalize functional scores across epegRNA libraries, similar to the normalization in Buckley et al.²⁶
  
  CRITICAL: It is important at this step to critically assess whether the functional scores are performing as intended on included internal assay controls. Namely, are synonymous and PTC variants well-separated? How do the functional scores for ClinVar BLB variants compare to those of ClinVar PLP variants? At what level of evidence can this data be used in a variant classification framework? We recommend using guidelines outlined in Brnich et al. for calibrating cliPE MAVE data.²⁷
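To make the scoring in steps i through k concrete, the sketch below walks through one variant: allele frequencies relative to read depth, a per-replicate log ratio, the 0.1% filter on unselected cells, and a random effects combination across replicates. It is not the code behind the companion Shiny app, which uses the metafor R package. As labeled assumptions, this sketch substitutes the DerSimonian-Laird estimator for the between-replicate variance, uses a pseudocount of 0.5, and approximates the per-replicate variance from counts alone; all function names and example counts are hypothetical.

```python
import math


def passes_filter(count_ctrl, depth_ctrl, min_af=0.001):
    """Keep a variant only if its unselected allele frequency is at least 0.1%."""
    return count_ctrl / depth_ctrl >= min_af


def log_ratio(count_ctrl, count_sel, depth_ctrl, depth_sel, pseudo=0.5):
    """Per-replicate log enrichment and an approximate sampling variance.

    Assumption: the variance treats counts as Poisson and ignores depth noise.
    """
    f_ctrl = (count_ctrl + pseudo) / depth_ctrl
    f_sel = (count_sel + pseudo) / depth_sel
    score = math.log(f_sel / f_ctrl)
    variance = 1.0 / (count_ctrl + pseudo) + 1.0 / (count_sel + pseudo)
    return score, variance


def random_effects(scores, variances):
    """Combine replicate scores with a DerSimonian-Laird random effects model."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, scores)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, scores))
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Between-replicate variance, truncated at zero.
    tau2 = max(0.0, (q - (len(scores) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    beta = sum(w * y for w, y in zip(w_star, scores)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return beta, se


# Hypothetical counts for one variant in three replicates:
# (control count, selected count, control depth, selected depth)
replicates = [(900, 300, 200_000, 210_000),
              (850, 340, 190_000, 205_000),
              (1000, 280, 220_000, 200_000)]

if all(passes_filter(c, d) for c, _, d, _ in replicates):
    per_rep = [log_ratio(*rep) for rep in replicates]
    beta, se = random_effects([s for s, _ in per_rep], [v for _, v in per_rep])
    print(f"beta = {beta:.3f}, SE = {se:.3f}")
```

A negative beta indicates depletion after selection. Whether depletion corresponds to loss of function depends on the selection used, which is why the scores should be checked against the synonymous, PTC, and ClinVar control variants described above.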

Expected outcomes

The epegRNA Designer Shiny app enables users to quickly generate cliPE designs for a gene of interest. Archetypal epegRNAs are screened to filter out epegRNAs which do not produce robust genome editing. It is recommended to screen a minimum of 12 archetypal epegRNAs, as the dropout rate at this stage is estimated to be 50–75%. We provide an example archetypal epegRNA screen in Figure 1. As machine learning predictions of prime editing efficiency improve, it may become possible to reduce the dropout rate at this stage. Before proceeding to cloning epegRNA libraries and matching nicking gRNAs, we encourage users to confirm that the epegRNA libraries corresponding to the validated archetypal epegRNAs contain sufficient variants, including the different classes of truth set variants discussed above (with additional context provided on https://home.clipe-mave.org/). It may be necessary to screen an additional set of archetypal epegRNAs to fill in gaps in the truth set. The epegRNA Designer Shiny app allows prioritization for either VUS or truth set variants, which may be useful if the regions of the gene with highest VUS density do not contain sufficient truth set variants.
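The dropout estimate above translates directly into how many archetypal epegRNAs are likely to pass the screen. The short sketch below works through that arithmetic using the 50–75% dropout range quoted in the text; the function names are hypothetical.

```python
import math


def expected_validated(n_screened, dropout):
    """Expected number of archetypal epegRNAs that pass the screen."""
    return n_screened * (1 - dropout)


def n_to_screen(n_wanted, dropout):
    """Number to screen so that, on average, n_wanted pass."""
    return math.ceil(n_wanted / (1 - dropout))


for dropout in (0.50, 0.75):
    print(f"dropout {dropout:.0%}: "
          f"{expected_validated(12, dropout):.0f} of 12 expected to pass; "
          f"screen {n_to_screen(4, dropout)} to obtain about 4")
```

Screening the recommended minimum of 12 is therefore expected to yield roughly 3 to 6 validated architectures.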

Example of archetypal epegRNA screen

(A) Schematic representation of the entire *TSC2* gene. Arrowheads indicate targeted regions for screen. (B) Excerpt of screen highlighting representative Sanger chromatograms in co-transfected and control untransfected HEK 293T cells. Arrowhead color indicates screening result as follows: light or dark green = efficient editing, red = low/no editing, white = data not shown. Exons in Rap-GAP domain depicted in purple.

After completing cloning Steps 6–7, it is anticipated that users will obtain epegRNA libraries and nicking gRNAs and can proceed directly to co-transfection of haploid cells (Step 8). The optional QC step of targeted amplicon sequencing of epegRNA libraries is useful to check for potential variant dropout due to particular reverse transcription template oligos cloning inefficiently. Library composition is usually not a major confounding factor for cliPE as each library is relatively small (on the order of 40-60 unique epegRNAs). Optionally, the QC step outlined in Step 8 is the most useful step to check for variant dropout by amplicon sequencing of the target locus in GFP+, co-transfected cells. Based on our previous work using cliPE on the TSC2 gene, it is anticipated that overall variant dropout rate will be ∼33%. We provide an example of prime editing with an epegRNA library in Figure 2.

Example of cliPE editing in HAP1 cells

Amplicon sequencing was used to validate editing of HAP1 cells in the targeted region of *TSC2* exon 17. Fastq files were aligned to the human hg38 reference and bam files were viewed in IGV software.

The selection portion of the cliPE workflow is highly context-dependent, and we anticipate cliPE will be compatible with a wide variety of selection paradigms. The most important consideration is to ensure that the selection method is compatible with collection of sufficient gDNA from selected and unselected cell pools. The yield of gDNA from selected cell pools needs to be compatible with PCR amplification of the target locus for downstream amplicon sequencing. This can be successful even with relatively low yields in the 0.2–1.0 ng/μL range.

In Step 10, targeted amplicon sequencing libraries are generated in order to compare allele frequencies of variants in selected and unselected cell pools. One multiplexed sequencing run can generate data for a full cliPE experiment with many libraries and biological replicates, as each individual library requires a read depth of only 200,000 and many instruments are available which produce >100 million reads at reasonable cost. We have provided detailed protocols for Illumina 150 bp paired-end reads, but other sequencing platforms should be compatible with cliPE. The analysis in Step 11 is streamlined and requires installation of only a single package, the Jellyfish software for k-mer counting. The Jellyfish outputs are then fed into a series of Shiny apps to generate enrichment scores by random effects modeling, similar to the Enrich2 software package.²⁴ The code underlying these Shiny apps is shared on the cliPE GitHub page and users are welcome to modify as necessary to suit the needs of their experiment.
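The read budget described here can be checked before submitting a run. The sketch below uses the 200,000 reads per library stated in the protocol; the PhiX percentage is a hypothetical placeholder (the appropriate spike-in should be agreed with the sequencing facility), and the assumption of two conditions per replicate reflects one unselected and one selected pool.

```python
def libraries_per_run(run_reads, reads_per_library=200_000, phix_percent=0):
    """Number of amplicon libraries one run can hold after the PhiX spike-in."""
    usable_reads = run_reads * (100 - phix_percent) // 100
    return usable_reads // reads_per_library


def libraries_needed(n_epegrna_libraries, n_replicates, n_conditions=2):
    """Amplicon libraries in an experiment (unselected and selected by default)."""
    return n_epegrna_libraries * n_replicates * n_conditions


capacity = libraries_per_run(100_000_000, phix_percent=20)  # hypothetical 20% PhiX
needed = libraries_needed(n_epegrna_libraries=10, n_replicates=4)
print(f"capacity: {capacity} libraries; needed: {needed}")
```

With these example numbers, a 100 million read run has room for 400 libraries, well above the 80 required for 10 epegRNA libraries with 4 biological replicates each.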

Estimating the cost of cliPE

With a small number of epegRNA libraries (5-10), cliPE is anticipated to cost about $1300 per library. Among the biggest expenses are the oligo pools themselves, at around $300 per pool. Another significant expense is the final amplicon sequencing of all replicates, estimated to be less than $2,000. As sequencing is performed using a 150 bp paired-end protocol with standard Illumina adaptors, it does not require custom primers and can be multiplexed with other libraries where appropriate. Another cost worth briefly mentioning is the initial investment in low-passage HAP1 cells from Horizon and HEK 293T from ATCC.

Quantification and statistical analysis

Much of the data that validates this protocol is available in the Supplementary Information of a bioRxiv preprint.⁶ We have made examples available in the docs subfolder on the cliPE GitHub repository (https://github.com/calhoujd/calhoujd.github.io) to help users understand the expected output of Shiny apps and the correct data structure of any necessary input files. The .fastq files can be used to test the Jellyfish commands in Step 11. […]_kmerCount.fa fasta files can be used as example input into the cliPEr_app1_fasta2csv Shiny app. The example_input_CB1_R1.csv and TSC2_e37_kmer_dictionary.csv files can be used as example inputs into the cliPEr_app2_kmers2variants Shiny app. No reverse complementation is required to merge these dataframes and annotate the k-mers with variant names. The Book2_e17_3xreps.csv and Book2_e17_4xreps.csv files can be used as example inputs into the cliPEr_app3_random_effects_modeling Shiny app. The cliPE epegRNA Designer tool has a built-in default example based on the gene TSC2 that can be run so that users can inspect the different output files.
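For readers who want to see what the fasta-to-csv and annotation steps involve, the sketch below parses `jellyfish dump` output (FASTA records whose header is the count and whose sequence is the k-mer) and maps k-mers to variant names. It is an illustration, not the code of the Shiny apps. The dictionary here is a plain in-memory mapping with made-up 4-mers and variant names; the real dictionary file produced by the epegRNA Designer app may be laid out differently. Because the Jellyfish commands in Step 11 use `-C` (canonical k-mers), a reported k-mer can be the reverse complement of the dictionary entry, so both orientations are tried.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_jellyfish_dump(text):
    """Parse FASTA-style dump text into a list of (kmer, count) tuples."""
    records, count = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            count = int(line[1:])          # header line holds the count
        else:
            records.append((line.upper(), count))
    return records


def annotate(records, kmer_to_variant):
    """Sum counts per variant, matching k-mers in either orientation."""
    counts = {}
    for kmer, count in records:
        variant = kmer_to_variant.get(kmer) or kmer_to_variant.get(revcomp(kmer))
        if variant is not None:
            counts[variant] = counts.get(variant, 0) + count
    return counts


# Toy example with hypothetical 4-mers and variant names.
dump_text = ">5\nAACC\n>7\nACGA\n"
dictionary = {"GGTT": "variant_A", "ACGA": "variant_B"}
print(annotate(parse_jellyfish_dump(dump_text), dictionary))
```

In the toy example, `AACC` is matched through its reverse complement `GGTT`, which is the situation the reverse complement option in cliPEr_app2_kmers2variants is meant to handle.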

Limitations

CliPE is retrospective in nature

CliPE generates variants in a region of interest based on variants present in databases of variation, particularly gnomAD and ClinVar. As such, cliPE provides retrospective data based on what variants are present in these databases at the time of designing prime editing reagents. Saturation data such as that generated by other MAVE platforms like deep mutational scanning or saturation genome editing are both retrospective and prospective, at the cost of additional expense, required expertise, and increased hands-on time.

Variability in prime editing efficiency

It is important to note that while archetypal epegRNA screening (Step 5) will filter out poorly performing epegRNA architectures, there is still some variability observed within an epegRNA library. There are numerous factors at play, most notably (1) the distance between the PAM site and the edit and (2) the relative proportion of a particular epegRNA within the epegRNA library. The optional QC steps are useful to assess library composition and editing efficiency of each variant. Our prior experience with TSC2 suggests dropout of ∼33% of targeted variants due to either low cloning efficiency or low editing efficiency.⁶ Design of independent epegRNAs which target overlapping genomic regions is one possibility for alleviating variant dropout, though this would result in an increase in cost per variant overall.

cliPE requires endogenous expression of the gene of interest in a haploid context

The method herein is based around prime editing of endogenous loci and therefore requires the following: (1) a cell line which can be co-transfected reasonably efficiently with multiple plasmids, (2) a haploid genome or haploidized locus, and (3) endogenous expression of the gene of interest. Genes endogenously expressed in HAP1 cells are the primary candidates for cliPE and the most likely use case. Genes expressed endogenously in other cell lines may be candidates for cliPE after haploidization of the locus of interest.² This can be accomplished using genome engineering tools capable of making large chromosomal deletions.²⁸ Modifications such as using viral vectors in place of plasmids may be useful for targeting cell lines that are difficult to transfect.

Troubleshooting

Problem 1: Incomplete vector digestion (related to multiple cloning steps)

Incomplete DNA digestion is a common issue that causes either cloning failure or reduced cloning efficiency.

Potential solution

Any time linearized vector is prepared, it is recommended to run a small (∼3 μL) aliquot of vector on a 0.8–1% agarose gel to check for linearization and lack of supercoiled species. It is also recommended to perform an additional ligation reaction with vector only to check for the presence of undigested supercoiled DNA which may confound the subsequent transformation. Minimal outgrowth of vector will be observed in the vector-only ligation transformation if the initial digestion reaction was complete. Significant outgrowth usually suggests incomplete restriction digest of the destination vector. If this is observed, repeat the restriction digest of the backbone vector.

Problem 2: Failed cloning of nicking gRNAs due to improper design of spacer oligos (related to step 6a)

Cloning of nicking gRNAs follows a classic method widely utilized in genome engineering with CRISPR/Cas9. When done properly, this method is very efficient and yields hundreds to thousands of clones with the correct plasmid. However, a few common mistakes can cause the cloning to fail. One of the most common is incorrect design of the spacer oligonucleotides.

Potential solution

Cloning of nicking gRNAs into BPK1520 follows a similar principle as standard px458/px459 cloning. It is important to order oligonucleotides that conform to the following sticky ends:

ngRNAP_TS: 5’ – CACCG – (20 bp spacer) – 3’

ngRNAP_BS: 3’ – C – (20 bp spacer) – CAAA – 5’

Remember to use the reverse complement of the spacer sequence for the bottom strand oligo. If the gRNA natively begins with a guanine nucleotide, it is not necessary to add the extra guanine preceding the spacer:

ngRNAP2_TS: 5’ – CACC – (20 bp spacer) – 3’

ngRNAP2_BS: 3’ – (20 bp spacer) – CAAA – 5’
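The oligo layout above can be checked programmatically before ordering. The following is a minimal Python sketch, not part of the published cliPE tools: the function names `reverse_complement` and `design_ngrna_oligos` and the example spacer are hypothetical. It assumes a 20-nt spacer given 5’ to 3’ and returns both oligos written 5’ to 3’, so the bottom strand reads AAAC, then the reverse complement of the spacer, then a trailing C only when an extra G was added to the top strand.

```python
# Watson-Crick complement table for unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_ngrna_oligos(spacer):
    """Build top- and bottom-strand spacer oligos, both written 5' to 3'.

    Hypothetical helper: follows the CACC / AAAC sticky-end layout described
    in the text and prepends a G only if the spacer does not start with one.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt spacer containing only A, C, G, T")

    add_g = not spacer.startswith("G")
    top = "CACC" + ("G" if add_g else "") + spacer
    # The bottom strand carries the reverse complement of the spacer; the
    # trailing C pairs with the extra G on the top strand when one is added.
    bottom = "AAAC" + reverse_complement(spacer) + ("C" if add_g else "")
    return top, bottom


if __name__ == "__main__":
    # Example spacer is illustrative only, not a validated guide sequence
    top_oligo, bottom_oligo = design_ngrna_oligos("ATGCATGCATGCATGCATGC")
    print("top   :", top_oligo)
    print("bottom:", bottom_oligo)
```

A quick sanity check on any designed pair is that the portion of the bottom oligo after its 4-nt overhang is the reverse complement of the portion of the top oligo after its 4-nt overhang, which is what allows the two oligos to anneal.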

Problem 3: Failed cloning of nicking gRNAs due to not diluting preannealed oligos (related to step 6d)

The step in which preannealed spacer oligos are diluted before the ligation reaction is often skipped, which frequently causes failure of the ligation and the downstream bacterial transformation.

Potential solution

Repeat the cloning protocol with dilution of preannealed spacers.

Problem 4: Amplification of DNA in the no-template control when preparing targeted amplicon sequencing libraries (related to step 10c, 11c, or 11e)

The amplicon sequencing protocol is robust and typically requires minimal optimization or troubleshooting. Rarely, amplicons are observed in no-template control conditions, suggesting contamination of one of the PCR master mix components with a trace amount of amplicon.

Potential solution

In most cases, discarding DNA-grade water aliquots and preparing fresh working stocks (10 μM) of forward and reverse primers from 100 μM master stocks corrects this issue.

Problem 5: HAP1 cells spontaneously become diploid in culture (related to step 5)

Spontaneous diploidization of HAP1 cells has been reported previously.²⁹ We have observed this phenomenon, especially during the generation of clonal cell lines derived from parental HAP1 cells.

Potential solution

There are several strategies that can be employed to alleviate this issue. First, it is advised to work with low-passage cells to minimize the number of cells with diploid genomes. Second, it may be necessary to utilize cell sorting to enrich for cells with haploid genomes.²⁹ Finally, another strategy involves a small molecule, 10-Deacetyl-baccatin-III (DAB), that emerged from an unbiased screen for compounds which selected for haploid cells within mixed cultures.³⁰ Anecdotally, some labs have reported success with DAB treatment, while others have found little to no effect of drug treatment. Pilot studies with DAB treatment may be useful to determine if treatment enriches for haploid cells.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Jeffrey Calhoun (jeffrey.calhoun@northwestern.edu).

Technical contact

Technical questions on executing this protocol should be directed to and will be answered by the technical contact, Jeffrey Calhoun (jeffrey.calhoun@northwestern.edu).

Materials availability

Backbone plasmids are available on Addgene (https://www.addgene.org/).

Data and code availability

Full datasets will be made available upon publication of the TSC2 MAVE research manuscript. Sufficient excerpts of these data are provided to act as positive controls for cloning steps, prime editing, and data analysis. All Shiny apps, code, and examples are available via the cliPE landing page (http://home.clipe-mave.org) and the cliPE GitHub repository (https://github.com/calhoujd/calhoujd.github.io). Zenodo versions of record are available for the cliPE GitHub repository (https://doi.org/10.5281/zenodo.15324535) and the GitHub repository for the epegRNA Designer Shiny app (https://doi.org/10.5281/zenodo.15328525).

Acknowledgments

This work was sponsored by an American Epilepsy Society Junior Investigator Award (J.D.C.). This work was supported by the Northwestern University – Flow Cytometry Core Facility supported by a Cancer Center Support Grant (NCI CA060553). Flow cytometry cell sorting was performed on a BD FACSMelody, purchased through the support of NIH 1S10OD011996-01 and 1S10OD026814-01. We thank David Liu and the Liu lab for sharing their prime editing reagents. The authors would like to acknowledge Addgene for its invaluable service, which facilitated this study. We further thank the Plasmidsaurus sequencing team for making quality control and pilot experiments feasible and accessible. The graphical abstract was created in BioRender: Calhoun, J. (2025) https://BioRender.com/c57w266.

Author contributions

J.D.C., C.G.B., and G.L.C. wrote the manuscript. J.D.C. and C.G.B. developed the wet lab methods. J.D.C., C.G.B., and N.B. developed the dry lab methods. N.B. developed the cliPE epegRNA Designer Shiny app. All authors read and approved the manuscript.

Declaration of interests

The authors declare no competing interests.

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xpro.2025.103851.

Supplemental information

Table S1. Nextera ampSeq barcoding primer information to multiplex samples

It is possible to single barcode up to 24 samples using AMPSEQ1-24 with any P5 primer. For double barcoding, use different combinations of P5 primers (AMPSEQ25-40) with AMPSEQ1-24.

mmc1.xlsx (12.1 KB, xlsx)
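The barcoding scheme described for Table S1 can be enumerated as a quick planning aid. This is a minimal Python sketch under stated assumptions: the primer names AMPSEQ1-24 and AMPSEQ25-40 come from the table description, while the function name `assign_barcodes` and the sample names are hypothetical. Pairing 16 P5 primers with 24 AMPSEQ1-24 primers gives 24 × 16 = 384 unique combinations.

```python
from itertools import product

# Primer names follow the Table S1 description: AMPSEQ1-24 are used for
# single barcoding and AMPSEQ25-40 are the P5 primers for double barcoding.
INDEX_PRIMERS = [f"AMPSEQ{i}" for i in range(1, 25)]
P5_PRIMERS = [f"AMPSEQ{i}" for i in range(25, 41)]

# Every (P5 primer, AMPSEQ1-24 primer) combination; the first 24 pairs share
# one P5 primer, which corresponds to the single-barcoding case.
ALL_PAIRS = list(product(P5_PRIMERS, INDEX_PRIMERS))


def assign_barcodes(samples):
    """Give each sample a unique primer pair (hypothetical helper)."""
    if len(samples) > len(ALL_PAIRS):
        raise ValueError(
            f"{len(samples)} samples exceed the {len(ALL_PAIRS)} available pairs"
        )
    return dict(zip(samples, ALL_PAIRS))


if __name__ == "__main__":
    # Sample names are placeholders for illustration
    plan = assign_barcodes([f"sample_{i}" for i in range(30)])
    for name, (p5, index) in plan.items():
        print(name, p5, index)
```

With 24 or fewer samples, every assignment uses the same P5 primer, so only the AMPSEQ1-24 primer distinguishes samples; a second P5 primer is drawn on only from the 25th sample onward.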

Table S2. Sequence of oligonucleotides used during cliPE, general primers related to all steps

mmc2.xlsx (9.9 KB, xlsx)

Table S3. Positive controls from TSC2 study for archetypal epegRNA screen and nicking gRNA for co-transfection with epegRNA library, related to Step 8

mmc3.xlsx (9.4 KB, xlsx)

Table S4. Oligo pool from TSC2 study which can be used to generate a positive control epegRNA library, related to Step 8

mmc4.xlsx (10.7 KB, xlsx)

References

1. Landrum M.J., Chitipiralla S., Brown G.R., Chen C., Gu B., Hart J., Hoffman D., Jang W., Kaur K., Liu C., et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 2020;48:D835–D844. doi: 10.1093/nar/gkz972.
2. Erwood S., Bily T.M.I., Lequyer J., Yan J., Gulati N., Brewer R.A., Zhou L., Pelletier L., Ivakine E.A., Cohn R.D. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 2022;40:885–895. doi: 10.1038/s41587-021-01201-1.
3. Anzalone A.V., Randolph P.B., Davis J.R., Sousa A.A., Koblan L.W., Levy J.M., Chen P.J., Wilson C., Newby G.A., Raguram A., Liu D.R. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature. 2019;576:149–157. doi: 10.1038/s41586-019-1711-4.
4. Chen P.J., Hussmann J.A., Yan J., Knipping F., Ravisankar P., Chen P.F., Chen C., Nelson J.W., Newby G.A., Sahin M., et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell. 2021;184:5635–5652.e29. doi: 10.1016/j.cell.2021.09.018.
5. Nelson J.W., Randolph P.B., Shen S.P., Everette K.A., Chen P.J., Anzalone A.V., An M., Newby G.A., Chen J.C., Hsu A., Liu D.R. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 2022;40:402–410. doi: 10.1038/s41587-021-01039-7.
6. Biar C.G., Pfeifer C., Carvill G.L., Calhoun J.D. Multimodal framework to resolve variants of uncertain significance in TSC2. Preprint at bioRxiv. 2024. doi: 10.1101/2024.06.07.597916.
7. Kleinstiver B.P., Prew M.S., Tsai S.Q., Topkar V.V., Nguyen N.T., Zheng Z., Gonzales A.P.W., Li Z., Peterson R.T., Yeh J.R.J., et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015;523:481–485. doi: 10.1038/nature14592.
8. Marcais G., Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011.
9. Jaganathan K., Kyriazopoulou Panagiotopoulou S., McRae J.F., Darbandi S.F., Knowles D., Li Y.I., Kosmicki J.A., Arbelaez J., Cui W., Schwartz G.B., et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176:535–548.e24. doi: 10.1016/j.cell.2018.12.015.
10. Doman J.L., Sousa A.A., Randolph P.B., Chen P.J., Liu D.R. Designing and executing prime editing experiments in mammalian cells. Nat. Protoc. 2022;17:2431–2468. doi: 10.1038/s41596-022-00724-4.
11. McDade J.R., Waxmonsky N.C., Swanson L.E., Fan M. Practical Considerations for Using Pooled Lentiviral CRISPR Libraries. Curr. Protoc. Mol. Biol. 2016;115:31.5.1–31.5.13. doi: 10.1002/cpmb.8.
12. Blomen V.A., Májek P., Jae L.T., Bigenzahn J.W., Nieuwenhuis J., Staring J., Sacco R., van Diemen F.R., Olk N., Stukalov A., et al. Gene essentiality and synthetic lethality in haploid human cells. Science. 2015;350:1092–1096. doi: 10.1126/science.aac7557.
13. Findlay G.M., Daza R.M., Martin B., Zhang M.D., Leith A.P., Gasperini M., Janizek J.D., Huang X., Starita L.M., Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222. doi: 10.1038/s41586-018-0461-z.
14. Radford E.J., Tan H.K., Andersson M.H.L., Stephenson J.D., Gardner E.J., Ironfield H., Waters A.J., Gitterman D., Lindsay S., Abascal F., et al. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat. Commun. 2023;14:7702. doi: 10.1038/s41467-023-43041-4.
15. Matreyek K.A., Starita L.M., Stephany J.J., Martin B., Chiasson M.A., Gray V.E., Kircher M., Khechaduri A., Dines J.N., Hause R.J., et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 2018;50:874–882. doi: 10.1038/s41588-018-0122-z.
16. Baglaenko Y., Curtis M., Suqri M.A., Agnew R., Nathan A., Mire H.M., Mah-Som A.Y., Liu D.R., Newby G.A., Raychaudhuri S. Defining the function of disease variants with CRISPR editing and multimodal single cell sequencing. Preprint at bioRxiv. 2024. doi: 10.1101/2024.03.28.587175.
17. Muhammad A., Calandranis M.E., Li B., Yang T., Blackwell D.J., Harvey M.L., Smith J.E., Daniel Z.A., Chew A.E., Capra J.A., et al. High-throughput functional mapping of variants in an arrhythmia gene, KCNE1, reveals novel biology. Genome Med. 2024;16:73. doi: 10.1186/s13073-024-01340-5.
18. Ursu O., Neal J.T., Shea E., Thakore P.I., Jerby-Arnon L., Nguyen L., Dionne D., Diaz C., Bauman J., Mosaad M.M., et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nat. Biotechnol. 2022;40:896–905. doi: 10.1038/s41587-021-01160-7.
19. Giacomelli A.O., Yang X., Lintner R.E., McFarland J.M., Duby M., Kim J., Howard T.P., Takeda D.Y., Ly S.H., Kim E., et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 2018;50:1381–1387. doi: 10.1038/s41588-018-0204-y.
20. Simon J.J., Fowler D.M., Maly D.J. Multiplexed profiling of intracellular protein abundance, activity, interactions and druggability with LABEL-seq. Nat. Methods. 2024;21:2094–2106. doi: 10.1038/s41592-024-02456-7.
21. Lacoste J., Haghighi M., Haider S., Reno C., Lin Z.Y., Segal D., Qian W.W., Xiong X., Teelucksingh T., Miglietta E., et al. Pervasive mislocalization of pathogenic coding variants underlying human disorders. Cell. 2024;187:6725–6741.e13. doi: 10.1016/j.cell.2024.09.003.
22. Starita L.M., Ahituv N., Dunham M.J., Kitzman J.O., Roth F.P., Seelig G., Shendure J., Fowler D.M. Variant Interpretation: Functional Assays to the Rescue. Am. J. Hum. Genet. 2017;101:315–325. doi: 10.1016/j.ajhg.2017.07.014.
23. Tabet D., Parikh V., Mali P., Roth F.P., Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu. Rev. Genet. 2022;56:441–465. doi: 10.1146/annurev-genet-072920-032107.
24. Rubin A.F., Gelman H., Lucas N., Bajjalieh S.M., Papenfuss A.T., Speed T.P., Fowler D.M. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 2017;18:150. doi: 10.1186/s13059-017-1272-5.
25. Viechtbauer W. Conducting Meta-Analyses in R with the metafor Package. J. Stat. Softw. 2010;36:1–48. doi: 10.18637/jss.v036.i03.
26. Buckley M., Kajba C.M., Forrester N., Terwagne C., Sawyer C., Shepherd S.T.C., Jonghe J.D., Dace P., Turajlic S., Findlay G.M. Saturation Genome Editing Resolves the Functional Spectrum of Pathogenic VHL Alleles. Preprint at bioRxiv. 2023. doi: 10.1101/2023.06.10.542698.
27. Brnich S.E., Abou Tayoun A.N., Couch F.J., Cutting G.R., Greenblatt M.S., Heinen C.D., Kanavy D.M., Luo X., McNulty S.M., Starita L.M., et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 2019;12:3. doi: 10.1186/s13073-019-0690-2.
28. Li J., Zhao D., Zhang T., Xiong H., Hu M., Liu H., Zhao F., Sun X., Fan P., Qian Y., et al. Precise large-fragment deletions in mammalian cells and mice generated by dCas9-controlled CRISPR/Cas3. Sci. Adv. 2024;10. doi: 10.1126/sciadv.adk8052.
29. Beigl T.B., Kjosas I., Seljeseth E., Glomnes N., Aksnes H. Efficient and crucial quality control of HAP1 cell ploidy status. Biol. Open. 2020;9. doi: 10.1242/bio.057174.
30. Olbrich T., Vega-Sendino M., Murga M., de Carcer G., Malumbres M., Ortega S., Ruiz S., Fernandez-Capetillo O. A Chemical Screen Identifies Compounds Capable of Selecting for Haploidy in Mammalian Cells. Cell Rep. 2019;28:597–604.e4. doi: 10.1016/j.celrep.2019.06.060.
