Summary
Protein mutagenesis is essential for unveiling the molecular mechanisms underlying protein function in health, disease, and evolution. In the past decade, deep mutational scanning methods have evolved to support the functional analysis of nearly all possible single-amino acid changes in a protein of interest. While historically these methods were developed in lower organisms such as E. coli and yeast, recent technological advancements have resulted in the increased use of mammalian cells, particularly for studying proteins involved in human disease. These advancements will aid significantly in the classification and interpretation of variants of unknown significance, which are being discovered at large scale due to the current surge in the use of whole-genome sequencing in clinical contexts. Here, we explore the experimental aspects of deep mutational scanning studies in mammalian cells and report the different methods used in each step of the workflow, ultimately providing a useful guide toward the design of such studies.
Keywords: deep mutational scanning, mammalian, variants of unknown significance, structure/function analysis, saturation mutagenesis, genome editing
Deep mutational scanning approaches in mammalian cells are rapidly evolving and provide important insights into structure-function relationships and disease variants of human proteins. In this review, Maes et al. give an overview of basic concepts and recent technological advancements, providing a guide toward the design of mammalian deep mutational scanning studies.
Introduction
As the sequence of a protein dictates its structure and function, mutagenesis can aid in understanding different functional properties such as stability, interactions, and enzymatic activity, as well as mechanisms underlying disease and evolution. Mutagenesis studies of proteins have, for example, aided in the identification of protein-protein interaction (PPI) interfaces1,2,3,4 and contributed to the study of ion channel functioning.5 Technologies collectively referred to as deep mutational scanning (DMS) have advanced this field by allowing the simultaneous and unbiased evaluation of large protein variant libraries,6 reflected in the recent expansion of this research field.
While variant screening has also been performed for regulatory DNA sequences,7 or in targeted studies for a limited number of protein variants discovered through genome sequencing,8 DMS studies mainly focus on the comprehensive evaluation of single-site protein variants (also called single-site saturation analysis). More specifically, they analyze all possible non-synonymous substitutions of each individual amino acid within the region of interest. Of note, some studies evaluate the effects of nonsense and/or synonymous substitutions as additional controls. A select number of studies have evaluated the effect of multiple amino acid substitutions per variant in combinatorial DMS studies,9,10,11,12 where comprehensive screening can only be achieved for very small regions.13
DMS studies have been performed for identifying the determinants underlying different functional properties,14,15 for protein and antibody engineering,16,17,18 for interaction interface and epitope mapping,19,20 and for researching antibiotic resistance.21 Furthermore, DMS has proven its utility in understanding viral infections, such as in the ongoing COVID-19 pandemic.17,22,23,24 The technology can provide insights into the evolutionary arms races of host antiviral proteins,25 expose the genetic variation in host infectivity genes underlying susceptibility,26 reveal the mutational landscape of viral proteins toward increased infectivity and antibody escape,19,24 and assess the role of viral proteins in replication fitness.27,28
The past years have seen a vast increase in the use of next-generation sequencing (NGS) for genetic testing in inherited diseases and cancer.29 While patient benefit is abundantly clear, NGS also reveals an increasing number of variants with unknown phenotypic consequences, referred to as variants of unknown significance (VUSs). DMS can provide unbiased and comprehensive variant evaluation, potentially enabling the classification of observed VUSs.30,31,32 Such an assessment will be instrumental in unlocking the full translational potential of genomics, creating new opportunities for diagnosis and therapy.
Although DMS is widely performed in lower organisms such as E. coli or yeast, there has been a clear increase in DMS studies of human proteins in mammalian cells, as they provide the normal physiological environment with correct protein spatiotemporal organization, folding, post-translational modifications, and co-factors. The processing of endogenous protein substrates by the chaperones TAPBPR and Tapasin is a fitting example of a DMS study that requires a mammalian context.33 The study of therapeutic proteins can also benefit from the use of mammalian cells by directly providing the final therapeutic format and context.17,34
DMS studies apply a general workflow35 (Figure 1) of pooled in vitro mutagenesis and the introduction of the resulting plasmid variant library into the expression system (referred to here as exogenous DMS). Alternatively, large-scale CRISPR-based genome editing can directly generate variants at the endogenous locus (referred to here as CRISPR-based DMS). The expression system allows the screening of a selectable phenotype that is linked to variant function either with a proliferation assay or by fluorescence-activated cell sorting (FACS). NGS (e.g., Illumina sequencing) of DNA isolated from the population(s) allows the quantification of variant abundances, which are used to calculate a functional score for each variant as a measure of variant effect, typically visualized on a heatmap.
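The last step of this workflow, turning variant abundances into a functional score, can be illustrated with a short sketch. The review does not prescribe a formula, so the scheme below (a log2 enrichment ratio normalized to wild type) is an assumption, and the function name, pseudocount, variant labels, and read counts are hypothetical.

```python
import math

def functional_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Return a log2 enrichment score per variant, normalized to wild type.

    Assumed scheme (not taken from the review):
    score = log2(post/pre frequency of variant) - log2(post/pre frequency of wild type).
    The pseudocount avoids division by zero for variants that drop out.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        pre_freq = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post_freq / pre_freq)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}

# Hypothetical read counts before and after selection.
pre = {"WT": 1000, "A23G": 1000, "L45P": 1000}
post = {"WT": 2000, "A23G": 2000, "L45P": 100}
scores = functional_scores(pre, post)
```

In this toy example, a variant that behaves like wild type scores close to zero, while a depleted variant receives a negative score. Published analyses often add replicate handling and error models, which this sketch omits.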
In this review, we focus on the application of DMS to target proteins in mammalian cells. In Tables 1 and 2, we provide an extensive but non-exhaustive overview of recent DMS studies, making a distinction between the exogenous approaches (Table 1) and the CRISPR-based approaches (Table 2). As one can appreciate, the focus of DMS studies has been on disease-related proteins.36,37,38,39 Exogenous DMS studies additionally focus on proteins involved in host-virus interactions20,21 and therapeutic proteins.34,40,41 Of note, DMS studies of viral proteins to understand replication fitness, where viruses expressing protein variants are selected through passaging in tissue culture,42 are not covered. Within Tables 1 and 2, the studies are broken down into the main steps of the DMS workflow, allowing a detailed comparison between the different reported approaches. Furthermore, we provide an overview of recent technological advances in these different steps of the DMS workflow, particularly emphasizing single-site saturation analysis given the focus on it within the field. We also discuss the current challenges in the field and provide a perspective on future developments.
Table 1.
Target | Search space (aa) | Library size: Coverage of designed variants | Scored protein variants (% total possible) | Oligonucleotide-directed mutagenesis | Assay | Sequencing |
---|---|---|---|---|---|---|
Transfection of Epstein-Barr virus-derived episomal plasmids: Dilution with a large excess of carrier DNA so that on average no more than one variant is acquired per cell | ||||||
CXCR420 | 351 | NR | 6,995 (99.6) | overlap extension PCR | PPI with intrinsic surface display | Illumina with PCR amplicons |
PDGFRα40 | 189 | NR | 3,780 (100) | overlap extension PCR | PPI with intrinsic surface display | Illumina with PCR amplicons |
ACE217 | 117 | NR | 2,340 (100) | overlap extension PCR | PPI with intrinsic surface display | Illumina with PCR amplicons |
HIV gp16043 | 826 | NR | 16,332 (98.9) | overlap extension PCR | PPI with intrinsic surface display | Illumina with PCR amplicons |
HLA-A44 | 180 | NR | 3,524 (97.9) | overlap extension PCR | PPI with BiFC | Illumina with PCR amplicons |
T1R245 | 815 | >10× coverage of 26,080 codon variants | 16,177 (99.2) | overlap extension PCR | co-trafficking of heterodimeric partner | Illumina with PCR amplicons |
TAPBPR33 | 12 | NR | 240 (100) | overlap extension PCR | surface trafficking of interaction partner | Illumina |
Retroviral transduction: Low multiplicity of infection so that on average no more than one variant is acquired per cell | ||||||
p5346 | 393 | ≥1,000× coverage of 8,274 protein variants (8,000,000) | 8,258 (99.8) | cassette ligation by in vitro recombination | drug-induced loss of anti-proliferative activity | Illumina with Nextera amplicons |
ADAR216 | 261 | ≥400× coverage of 4,959 protein variants | 4,931 (99.4) | cassette ligation by in vitro recombination | RNA editing activity and specificity | Illumina |
ZNF1014 | 71 | ≥30× coverage of 5,731 protein variantsa | NR | cassette ligation by IIs restriction | transcription repression | Illumina |
CD8647 | 27 | ≥200× coverage of 1,728 codon variants | 542 (95.6), 519 (91.5) | overlap extension PCR | PPI with intrinsic surface display | Illumina |
TRIM5α25 | 11 | ≥100× coverage of 352 codon variants (≥35,200) | 213 (96.8) | overlap extension PCR | antiviral activity | Illumina |
Thrombopoietin receptor48 | 29 | NR | 580 (100) | overlap extension PCR | activity through survival in the absence of growth factors | Illumina |
Murine PDPN epitope18 | 32 | >100× coverage of 1,024 codon variants (>100,000) | 640 (100) | overlap extension PCR | PPI with intrinsic surface display | Illumina |
p5349 | 191 | ∼40× coverage of 9,833 DNA variants (385,300) | 9,516b (96.8) | inverse PCR | anti-proliferative activity | Illumina |
MSH250 | 934 | ≥10× coverage of 59,776 codon variants (840,000) | 16,749 (94.4) | megaprimer extension | survival upon DNA damage | Illumina |
SHOC251 | 581 | ≥1,000× coverage of 11,952 protein variants (20,000,000) | 11,952 (100) | commercial synthesis | drug sensitivity/resistance | Illumina with Nextera amplicons |
VHL52 | 75 | 135× coverage of 1,500 protein variants | 1,442 (96.1) | commercial synthesis | resistance to small molecule degrader cytotoxicity | Illumina with Nextera amplicons |
SARS-CoV-2 nucleocapsid protein53 | 418 | 19× coverage of 7,942 protein variants (150,000) | 7,893 (99.4), 7,901 (99.5) | commercial synthesis | PPI with mammalian surface display | PacBio CCSc |
EGFR38 | 188 | NR | 3,914a (99.3) | cassette ligation by in vitro recombination | drug sensitivity/resistance | Illumina with PCR amplicons |
Site-specific recombination: Single copy of the recombination site stably integrated into an endogenous locus so that only one variant is acquired per cell | ||||||
p53 AD1 and AD215 | 61 | 270× coverage of 14,998 barcodesd (4,000,000) | 2,990e (∼100) | cassette ligation by in vitro recombination | transcription activation | Illuminac |
β-2 adrenergic receptor54 | 412 | >100× coverage of 7,828 protein variants | 7,800 (99.6) | cassette ligation with IIs restriction | Signal transduction | Subassemblyc |
Murine Kir2.132 | 391 | >300× coverage of 7,429 protein variants (>2,400,000) | 6,898 (92.9), 6,944 (93.5) | cassette ligation with IIs restriction | cell surface expression and cell depolarization | Illumina with Nextera amplicons |
SARS-CoV-2 spike protein (receptor-binding domain)55 | 288 | ≥54× coverage of 9,216 codon variants (≥500,000) | 3,999 (69.4) | overlap extension PCR | PPI with intrinsic surface display | Illumina with PCR amplicons |
BRCA156 | 191 | 8× coverage of 6,112 codon variants (50,000) | 1,056 (27.6) | inverse PCR | homology-directed repair | PacBio CCSc |
PTEN57 | 402 | 3× coverage of 12,864 codon variants (35,200) | 4,112 (53.8) | inverse PCR | intracellular protein abundance | PacBio CCSc |
CYP2C939 | 489 | NR | 6,821 (66.4) | inverse PCR | intracellular protein abundance | PacBio CCSc |
PTEN58 | 192 | 3× coverage of 6,144 codon variants (∼17,500) | 4,186e (68.1) | inverse PCR | intracellular protein abundance | PacBio CCSc |
VKOR59 | 162 | NR | 2,695 (87.6) | inverse PCR | intracellular protein abundance | Subassemblyc |
SCN5A60 | 12 | NR | 249 (98.8) | inverse PCR | triple-drug treatment sensitivity | Subassemblyc |
KCNH237 | 77 | NR | 1,603 (99.1) | inverse PCR | trafficking to the cell surface | Subassemblyc |
KCNH261 | 11 | NR | 220 (95.2) | inverse PCR | trafficking to the cell surface | Subassemblyc |
Rhodopsin62 | 47 | NR | 808 (90.5) | inverse PCR | cell surface expression | Illumina |
Rhodopsin63 | 47 | NR | 700 (78.4) | inverse PCR | cell surface expression | Illumina |
Sindbis virus structural polyprotein64 | 122 | NR | 4,530f (58.9) | nicking mutagenesis | −1 programmed ribosomal frameshifting | Illumina |
Mitochondrial ATP synthase65 | 21 | NR | 404 (96.2) | nicking mutagenesis | drug resistance | Illumina with Nextera amplicons |
NUDT1566 | 163 | NR | 2,923 (94.4), 2,935 (94.8) | commercial synthesis | intracellular protein abundance and drug sensitivity | PacBio CCSc |
aa, amino acid; NR, not reported; PPI, protein-protein interaction; BiFC, bimolecular fluorescence complementation; PacBio CCS, Pacific Biosciences circular consensus sequencing.
a: Additional single or multiple consecutive substitutions or deletions.
b: DNA variants.
c: Barcoded library.
d: 5 barcodes for each of the 2,991 designed protein variants, 30 barcodes for wild-type AD1, and 25 barcodes for wild-type AD2 → 14,998 barcodes.
e: Additional rational mutation.
f: Codon variants.
Table 2.
Target | Search space | Variants | Assay | Sequencing |
---|---|---|---|---|
Saturation genome editing: Homology-directed repair with repair template library (repair template sublibraries based on gRNA; haploid target locusa) | ||||
Murine anti-HEL antibodies HEL23/HEL24 CDRH1/2/334 | 6–10 aa | 108–189 | PPI with mammalian surface display | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
Anti-HER2 CAR CDRH341 | 10 aa | 189 | PPI with intrinsic surface display and CAR T cell signaling | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
BRCA1 RING and BRCT domains67 | exons 2–5, 15–23 (1,342 bp) | 3,893 | survival | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
CARD11 CARD, LATCH, and N-terminal coiled-coil domains68 | exons 3–5 (143 aa) | 2,542 | survival and drug resistance | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
Saturation prime editing: Prime editing with pegRNA library (pegRNA sublibraries based on target region; haploid target locusb) | ||||
BRCA269 | 10 sites (140 bp) | 426 | survival | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
NPC169 | 16 sites (523 bp) | 978 | cholesterol accumulation | Illumina sequencing of short variable region in endogenous locus (per sublibrary) |
Base editing with tiling gRNA library: lentiviral transduction gRNA library at low multiplicity of infection | ||||
BRCA170 | endogenous locus | 660 | survival (drug-induced) | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
BRCA1, BRCA271 | endogenous locus | NR | survival | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
MCL1, BCL2L171 | endogenous locus | NR | drug sensitivity/resistance | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
19 drug target genes71 | endogenous locus | NR | drug resistance | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
3,584 genes71 | endogenous locus | 52,034 | survival | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
86 DNA damage response genes72 | endogenous locus | NR | survival upon DNA damage | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
BRCA1, BRCA236 | endogenous locus | NR | survival | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
MAP2K1, KRAS, NRAS73 | endogenous locus | NR | drug resistance | Illumina sequencing of genome-integrated gRNA representative for mutation introduced in endogenous gene |
Library generation through pooled in vitro mutagenesis
Oligonucleotide-directed mutagenesis
Variant libraries generated through error-prone PCR are largely inefficient in generating codon substitutions involving more than one nucleotide change. In addition, only a fraction of the library members contain a single codon substitution.7 Recent DMS studies17,32,58 therefore exclusively use oligonucleotide-directed mutagenesis (examples in Table 1), where variants result from the incorporation of variant oligonucleotides into the wild-type sequence. Single-site saturation variant libraries are achieved through randomization of the corresponding single codons in degenerate oligonucleotides17,25,50 or through specific single-codon substitutions in a designed oligonucleotide library synthesized on an array.32,49,65 In degenerate oligonucleotides, different modes of codon randomization influence the representation of each amino acid substitution because of the degeneracy of the genetic code.74,75 In oligonucleotide arrays, each amino acid substitution is usually equally represented through one codon substitution, and codon usage can additionally guide the choice of this codon.32 However, it could be more informative and comprehensive to look at all codon substitutions in studies related to translation, folding, and stability.76 Another advantage of oligonucleotide arrays is the option to introduce insertions and deletions.15,36,49 Because oligonucleotide synthesis becomes more error prone with increasing length (100–200 bp), longer variable regions need to be subdivided into multiple oligonucleotide arrays that are usually separately synthesized and integrated.32 This subdivision also allows for the generation of sublibraries covering partial variable regions compatible with Illumina sequencing read length.
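The effect of the randomization mode on amino acid representation can be made concrete with a small sketch. It compares full randomization (NNN) with the commonly used NNK scheme (K = G or T) under the standard genetic code. The choice of NNK as the example and all function names are assumptions for illustration; the review itself does not single out a scheme.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate bases used here: N = any base, K = G or T.
IUPAC = {"N": "ACGT", "K": "GT"}

def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon such as NNK."""
    choices = (IUPAC.get(base, base) for base in degenerate_codon)
    return ["".join(codon) for codon in product(*choices)]

def amino_acid_representation(degenerate_codon):
    """Count how many codons encode each amino acid (or stop) at one position."""
    return Counter(CODON_TABLE[codon] for codon in expand(degenerate_codon))

nnk = amino_acid_representation("NNK")
nnn = amino_acid_representation("NNN")
```

NNK yields 32 codons per position that still cover all 20 amino acids but only one stop codon, whereas NNN yields 64 codons with three stops. In both schemes the amino acids are unequally represented (for example, leucine is encoded by three NNK codons and methionine by one). Several library sizes in Table 1 appear consistent with these per-position counts, for example 815 × 32 = 26,080 codon variants for T1R2 and 934 × 64 = 59,776 for MSH2.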
Degenerate oligonucleotides and oligonucleotide arrays require extensive experimental design and are typically commercially synthesized, making oligonucleotide-directed mutagenesis currently still relatively time consuming and expensive. These hurdles will likely diminish given ongoing advances in DNA synthesis.
PCR-based methods such as overlap extension PCR,17,55 megaprimer PCR,50 inverse PCR,49,57 nicking mutagenesis,64 and POPCode77 rely on the oligonucleotide library as mutagenic primers to amplify variants from the wild-type template, which is usually removed afterward using gel purification or an enzymatic treatment that is able to distinguish between mutant PCR products and wild-type templates. PCR amplification can result in representational biases in the final library linked to the difference in the annealing efficiency of codon substitutions with one, two, or three nucleotide changes. In addition, amplification errors can introduce secondary mutations. Therefore, some DMS studies introduce oligonucleotide libraries through cloning, referred to as cassette assembly, by type IIs restriction enzymes32 or by homology-directed in vitro recombination,15,16 avoiding interruption of the remaining wild-type sequence. Mutagenesis by integrated tiles (MITE) allows the pooled introduction of oligonucleotides covering non-overlapping regions through multiplexed in vitro recombination.46,78
Library size
After the introduction of the oligonucleotide library into the wild-type sequence and cloning the generated variants into the appropriate backbone, a transformation step in E. coli is required to generate the final plasmid variant library ready for delivery into the mammalian expression system. The number of colony-forming units retrieved after transformation dictates the eventual number of library members, referred to as the library size. Library sizes in oligonucleotide-directed mutagenesis are chosen based on the prerequisite to cover all possible (degenerate oligonucleotides) or designed (oligonucleotide array) variants a predefined number of times. This selected coverage influences whether all variants are present and represented sufficiently in the library to detect statistically significant effects after selection and sequencing. Data from exogenous DMS studies (Table 1) indicate that coverages below 10× can lead to the absence or underrepresentation of a portion of the variants, as reflected by the need to generate complementary fill-in libraries.57,58 Coverages above 10× allow the analysis of (nearly) all variants, depending on the lowest accepted variant representation (here referred to as representation threshold) in the library. A higher coverage leads to the inclusion of more variants when a high representation threshold is strictly applied.55 However, a very high coverage can lead to redundancy, unnecessarily increasing the cost, time, and labor of downstream steps. This underscores the importance of careful experimental design in DMS studies.
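The relationship between coverage, library size, and variant dropout can be sketched with a simple sampling model. The sketch assumes an idealized library in which every variant is equally likely to be picked up by each transformant, so that the copy number of a variant is approximately Poisson distributed with mean equal to the coverage. Real libraries are less uniform, so these numbers should be read as a best case. The function names are hypothetical.

```python
import math

def required_transformants(n_variants, coverage):
    """Library size (colony-forming units) needed for a given average coverage."""
    return n_variants * coverage

def fraction_below_threshold(coverage, threshold=1):
    """Expected fraction of variants with fewer than `threshold` copies.

    Assumes copies per variant follow a Poisson distribution with mean `coverage`
    (idealized, perfectly uniform library). threshold=1 gives the fraction absent.
    """
    return sum(
        math.exp(-coverage) * coverage**k / math.factorial(k)
        for k in range(threshold)
    )

# 352 codon variants at 100x coverage, as listed for TRIM5α in Table 1.
size_100x = required_transformants(352, 100)

absent_at_3x = fraction_below_threshold(3)         # variants missing at 3x
absent_at_10x = fraction_below_threshold(10)       # variants missing at 10x
below_5_at_10x = fraction_below_threshold(10, 5)   # fewer than 5 copies at 10x
```

Under this model, roughly 5% of variants are expected to be absent at 3× coverage, whereas almost none are absent at 10×. With a representation threshold of five copies, about 3% of variants would still fall below the threshold at 10×, which illustrates why a stricter threshold calls for higher coverage.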
Library introduction of a single variant per cell
Exogenous DMS studies rely on the pooled screening of plasmid variant libraries introduced in mammalian cells, where the variant function is linked to a selectable phenotype. Confidently discerning the functional consequence of each variant requires a tight genotype-phenotype link, meaning that the phenotype within a cell must be caused by only one particular variant. In addition, the variants need to be stably expressed over the course of the experiment, and variants in different cells should have similar variant-independent (e.g., genomic context) expression profiles to allow proper comparison. Lastly, the introduction efficiency needs to be high to ensure cost-effective and practically feasible evaluation while ensuring sufficient coverage of the entire variant library. In general, the absence of the wild-type protein is required to properly evaluate the functional effect of variants. As the use of expression systems that endogenously express the protein of interest increases biological relevance, the wild-type protein needs to be inactivated or removed in knockout mammalian cell lines.36 Evidently, this strategy is not compatible with DMS studies of essential proteins. In contrast, when studying dominant-negative variant effects, co-expression of the wild-type protein is required. This can be achieved either through endogenous expression46 or through co-transfection79 of the wild-type protein. For mammalian cells, obtaining a single, stable variant per cell is not straightforward, and different approaches have been employed (Table 1). We discuss these below.
Epstein-Barr virus (EBV)-derived episomal plasmid transfection and retroviral transduction
Plasmid transfection leads to the transient introduction of large numbers of plasmids in each cell. The earliest DMS studies in mammalian cells cloned the variant library in EBV-derived episomal plasmids, which are maintained in the cell.80,81 Plasmid diversity was lost over time through antibiotic selection and passaging after transfection. This rather cumbersome approach was later adapted, where the variant library was diluted with a large excess of empty (EBV-derived episomal) plasmids, to acquire on average no more than one variant per cell20,43 (Figure 2A). Selection is typically performed 24 h post-transfection by FACS. This approach is easy, time and cost effective, and avoids the potential disruption of essential genomic sequences due to integration. However, the presence of one variant per cell is not guaranteed, as the transfection conditions are very hard to optimize, leading to noisy results or library underrepresentation.20
A better choice would be retroviral (lentiviral or ecotropic retroviral) transduction at a low multiplicity of infection (MOI), resulting in the stable integration of maximally one variant into each cell25,47,48,51 (Figure 2B). Transduced cells can be positively selected with antibiotics51 or sorted based on a co-expressed fluorescent protein.47 The pseudo-random nature of the integration82 leads to heterogeneous expression due to locus-specific effects,83 hampering proper comparison between variants. In addition, insertional mutagenesis can disrupt endogenous gene activity.84 The pseudo-diploid nature of retroviruses, where two RNA genomes are packaged, of which only one is integrated, implies that two variants can be co-packaged, leading to significant recombination and rendering retroviral transduction less compatible with barcoded libraries.85
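The rationale for a low MOI can be illustrated with the Poisson model that is commonly used to describe viral transduction. This is a sketch under the assumption that the number of integrations per cell is Poisson distributed with mean equal to the MOI; the function names and the example MOI values are hypothetical.

```python
import math

def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k integrations at a given MOI."""
    return math.exp(-moi) * moi**k / math.factorial(k)

def transduction_summary(moi):
    """Summarize expected outcomes of transduction under the Poisson assumption."""
    p_none = poisson_pmf(0, moi)
    p_single = poisson_pmf(1, moi)
    transduced = 1.0 - p_none
    return {
        "transduced": transduced,
        "single_among_transduced": p_single / transduced,
        "multiple_among_transduced": 1.0 - p_single / transduced,
    }

low = transduction_summary(0.1)   # low MOI
high = transduction_summary(1.0)  # high MOI, for comparison
```

Under this model, an MOI of 0.1 transduces about 10% of the cells, and about 95% of those carry a single integration. At an MOI of 1, fewer than 60% of transduced cells carry a single integration. The price of a low MOI is that most cells remain untransduced, which is why the transduced cells are subsequently selected or sorted.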
Site-specific recombination
Heterogeneous variant expression and insertional mutagenesis can be prevented with targeted integration into a predefined locus through site-specific recombination, which is highly compatible with DMS platforms because of its high efficiency, specificity, and accuracy.86 A cassette containing the site-specific recombination site and, in most cases, other components such as a selection marker, referred to as the landing pad, first needs to be stably integrated into the host genome. Variant library integration is then accomplished by recombination between the recombination site in the landing pad and the recombination site on a donor plasmid carrying the variant library (Figure 2C). The one-variant-per-cell requirement is ensured by the presence of only one copy of the landing pad so that only one recombination event can occur. Although Cre and Flp are able to catalyze the site-specific integration of exogenous DNA into the mammalian genome, re-excision is kinetically favored,87 requiring antibiotic selection for stable integrative recombination.
Re-excision can be entirely avoided by recombinase-mediated cassette exchange (RMCE), where the presence of two recombination site variants in a genomic locus allows the replacement of the intermediate sequence with a cassette flanked by the same sites while avoiding re-excision of this cassette due to incompatibility between these sites.88 RMCE suffers from lower efficiencies, as two recombination events are required for successful integration. Alternatively, the commercial Flp-In system uses a Flp mutant whose activity is limited in time to avoid re-excision and multiple integrations, but it also suffers from low recombination efficiencies due to thermolability.89 The low recombination efficiencies of both Cre-mediated RMCE and the Flp-In system necessitate antibiotic selection to recover the rare recombinants and require the transfection of high cell numbers for the introduction of large variant libraries in DMS studies to ensure that all variants are present and sufficiently represented.15,56
Serine phage integrases have relatively simple recombination sites, are independent of host-encoded co-factors, and mediate irreversible integration in the absence of their recombination directionality factor.87 It has been shown that Bxb1 is the most accurate and efficient integrase for the genome engineering of mammalian cells.90 Multiple groups have generated landing pad cell lines for Bxb1-mediated integration of large variant libraries in DMS studies.54,91,92 Recombinants are usually recovered through antibiotic selection, although the high recombination efficiencies eliminate the absolute requirement.93 In the absence of antibiotic selection, recombinants can be enriched before selection through sorting based on the loss of the expression of a first fluorescent protein in the landing pad and the gain of the expression of a second fluorescent protein in the recombination plasmid.91,93
Where to land?
The generation of a landing pad cell line for site-specific recombination of variant libraries in DMS studies raises two important questions: where to stably integrate the landing pad into the genome and how to ensure the presence of only one copy of the landing pad. The landing pad locus influences both variant expression and risk for insertional mutagenesis.94 Landing pad cell lines can be generated through random integration with a limiting amount of DNA56 or through low-MOI lentiviral transduction93,95 to favor one-copy integration. Antibiotic selection of the integrated landing pad via a selection marker is followed by clonal expansion. The clones are first tested for the presence of one copy and subsequently for high expression of a reporter gene from the landing pad.
An alternative approach is the more cumbersome targeted integration of the landing pad via homologous recombination, either spontaneous or induced via genome-editing strategies such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and CRISPR-Cas9 in so-called safe harbor loci. These intra- or intergenic regions in the human genome allow high, robust, and stable (no silencing) transgene expression without negative effects on endogenous gene activity.94 A comparative study of the three widely used safe harbor loci—AAVS1, CCR5, and Rosa26—showed that the AAVS1 locus allows the highest transgene expression with the highest homogeneity,96 justifying the generation of landing pad cell lines in the AAVS1 locus for DMS studies.91 Despite its established use as a safe harbor locus, studies have revealed several shortcomings of the AAVS1 locus, including silencing.94 The H11 locus, a more recently discovered and validated safe harbor locus, has also been used.54
Regardless of the method used to stably integrate the landing pad, ensuring one-copy integration is always required, as multiple copies are possible due to multiple integrations at one locus, integration at multiple alleles, or random integration. Southern blotting has been used to validate single-copy insertion, but because it typically requires radioactively labeled probes and is labor intensive, other validation assays have been developed, including inverse PCR,95 genotyping PCR,91 flow cytometry after the recombination of a 1:1 mixture of two fluorescent proteins,91 and Sanger sequencing of a degenerate barcode in the landing pad.91
Library generation through large-scale genome editing of the endogenous locus
Saturation genome editing
Interrogating variants at the endogenous locus is accomplished through large-scale CRISPR-Cas9-based genome editing in CRISPR-based DMS studies67,69,71 (Table 2; Figure 2D). This provides the native genomic context for physiological protein levels and associated subcellular locations. Although this increases the biological relevance of the measured functional effects of the variants, one must take into account that functional assays are not readily compatible with endogenous expression when higher expression levels are required for a detectable readout or when the gene of interest needs to be fused to other proteins. In addition, barcoding the variants is not possible. Saturation genome editing (SGE) leverages CRISPR-Cas9-mediated homology-directed repair (HDR) to introduce every possible single-nucleotide or single-codon variant (SNV/SCV) in the endogenous gene. This is achieved by providing a library of repair templates containing all possible SNVs/SCVs in combination with the Cas9 protein and a matching tiling guide RNA (gRNA) collection.67,68
Challenges with SGE
The targeting of multiple regions along the sequence of the endogenous gene can be avoided by introducing each gRNA separately with its corresponding repair template library, which ensures that each cell receives only one gRNA. Since, in most cases, at least two alleles of the endogenous gene are present in commonly used cell lines, different mutations can be introduced in these alleles when a cell receives multiple repair templates or when non-homologous end joining (NHEJ), an alternative repair pathway, leads to insertion or deletion (indel) formation. Therefore, haploid cells, either entirely haploid (e.g., HAP1)67 or haploid specifically for the endogenous gene,69 can be used. However, approaches using haploid cell lines are not suitable when co-expression of the wild-type protein is required, as in the case of haploinsufficiency, or when evaluating dominant-negative effects. This can be resolved by producing both high biallelic and silent HDR rates in a diploid cell line so that only one allele is edited.68
Editing efficiencies need to be high to ensure the presence of all possible variants multiple times. SGE is complicated by low HDR efficiency in mammalian cells, mainly due to competition with NHEJ.97 Other unwanted events include genomic rearrangements,98 head-to-tail concatemerization,99 and off-target editing.100 Cas9 can be mutated into a nicking enzyme (nCas9) that only cuts the complementary or the non-complementary/edited strand, increasing HDR efficiency and decreasing the presence of unwanted indels from NHEJ.101,102
Alternative editing methods
Base editing
Fusion of nCas9 to a cytidine or adenosine deaminase enzyme allows efficient transitions (C:G to T:A or A:T to G:C) in an approach called base editing.103,104 Base editing can be used in DMS studies through the introduction of a tiling gRNA library that is able to generate all compatible single transitions71 (Figure 2D). Each cell receives one gRNA through low-MOI lentiviral transduction of the gRNA library. Combined with the lower possibility of indel formation through NHEJ, this eliminates the need for haploid target loci to ensure the introduction of one mutation per cell. Efficient editing of all alleles is required when the presence of a wild-type allele leads to functional complementation, whereas low editing efficiencies are needed in the case of haploinsufficiency or when evaluating dominant-negative effects to ensure the presence of a wild-type allele. Two important limitations of base editing are linked to the restrictive deamination of target bases that fall within the editing window at a specific distance from an NGG PAM site.105 As a result, not all transitions within the gene of interest can be generated. In addition, careful gRNA design is critical to avoid the presence of multiple target bases within the editing window and thus ensure a single mutation in each cell. This excludes certain transitions and further reduces the mutational space that can be covered. In line with these limitations, a recent DMS study was only able to evaluate 13% of all SNVs in ClinVar using a cytidine base editor.71 The variant space can be expanded by the combined use of cytidine and adenosine base editors. It is estimated that such a combined approach can correct 27% of all pathogenic transition SNVs in ClinVar.105 The use of base editors that recognize non-NGG PAM sites or that have alternative editing windows can further increase this fraction by 2.5-fold.105,106,107 As opposed to the correction of pathogenic SNVs by base editors,105 estimates on the fraction of all transition SNVs in ClinVar that can be generated with base editors for CRISPR-based DMS studies have yet to be published.
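The two design constraints described above (an NGG PAM at a fixed distance and a single target base in the editing window) can be sketched as a simple scan. This is a minimal illustration only: the window of protospacer positions 4 to 8 is an assumed, illustrative value that differs between base editors, only the forward strand is scanned, and the function name is hypothetical.

```python
def cbe_targets(seq: str, window: tuple[int, int] = (4, 8)) -> list[dict]:
    """List forward-strand 20-nt protospacers followed by an NGG PAM.

    For each site, report the 0-based sequence indices of cytosines that fall
    inside the assumed editing window (1-based protospacer positions).
    """
    seq = seq.upper()
    lo, hi = window
    sites = []
    for start in range(len(seq) - 22):
        protospacer = seq[start:start + 20]
        pam = seq[start + 20:start + 23]
        if pam[1:] != "GG":
            continue  # no NGG PAM directly 3' of this protospacer
        editable = [
            start + p - 1
            for p in range(lo, hi + 1)
            if protospacer[p - 1] == "C"
        ]
        sites.append({
            "start": start,
            "protospacer": protospacer,
            "pam": pam,
            "editable_c": editable,
            # Exactly one target base in the window gives one programmed edit.
            "single_target": len(editable) == 1,
        })
    return sites

if __name__ == "__main__":
    example = "AAAAC" + "A" * 15 + "TGG"  # hypothetical sequence
    for site in cbe_targets(example):
        print(site)
```

Sites with zero or several cytosines in the window would be excluded from a library designed this way, which illustrates why only a fraction of all transitions is reachable.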
Saturation prime editing
The introduction of all possible variants in a double-strand break-free manner is possible with prime editing, where nCas9 fused to a reverse transcriptase is guided by a prime editing gRNA (pegRNA) that specifies the target site and encodes the desired edit.108 Saturation prime editing (SPE) for DMS, similar to SGE, introduces every possible SNV/SCV in the target DNA by providing a tiling pegRNA library69 (Figure 2D). The targeting of multiple regions along the sequence of the endogenous gene within one cell is avoided by introducing pegRNA sublibraries based on the target region. Due to the presence of multiple pegRNAs in one cell, different mutations can be introduced in each allele of the endogenous gene, and SPE, therefore, relies on haploid target loci. An alternative approach could encompass low-MOI lentiviral transduction of the entire pegRNA library, eliminating the need for haploid cells. The same considerations regarding the required absence/presence of a wild-type allele as in base editing apply. Current challenges in prime editing are related to the observed editing efficiencies, which vary greatly between target loci and cell types.108,109,110,111 Both base editing and prime editing show lower off-target editing and NHEJ events than CRISPR-Cas9-mediated HDR.103,108
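The rationale for low-MOI transduction can be made concrete with a small calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution with the MOI as its mean, which is a common simplification and not a claim made in the studies cited here. The function names are hypothetical.

```python
import math

def poisson_pmf(k: int, moi: float) -> float:
    """Probability of exactly k integrations per cell under a Poisson model."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def single_integrant_fraction(moi: float) -> float:
    """Fraction of transduced cells (at least one integration) with exactly one.

    Requires moi > 0.
    """
    transduced = 1.0 - poisson_pmf(0, moi)
    return poisson_pmf(1, moi) / transduced

if __name__ == "__main__":
    for moi in (0.05, 0.1, 0.3, 1.0):
        print(f"MOI {moi}: {single_integrant_fraction(moi):.3f} single integrants")
```

Under this model, lowering the MOI increases the fraction of transduced cells that carry a single gRNA or pegRNA, at the cost of transducing fewer cells overall.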
Library screening through functional selection
The library in the expression system is subsequently screened using a functional assay that couples the variant function to a selectable phenotype. This is usually either a fitness effect or a fluorescent signal, allowing selection through a proliferation assay or FACS, respectively.35 Most of the functional assays fall under one of three main categories: protein expression/stability, PPIs, and cell survival (see Tables 1 and 2 for examples). The study of expression/stability or PPIs of cell surface proteins typically relies on flow cytometry of labeled detection reagents (e.g., antibodies).17,32 The investigation of secreted and intracellular proteins is more challenging, either requiring display strategies for cell surface analysis53 through labeled detection reagents or fusions to fluorescent proteins for intracellular analysis.44,57
In proliferation assays, the frequency of each variant changes over time in the presence of selection pressure, either intrinsic67 or induced in the form of drugs or environmental conditions36,63 (Figure 1). The frequency change reflects the variant effect, as it directly correlates with the fitness linked to the variant function. This generates one or more selected populations depending on the experimental setup. For example, populations can be taken at different time points,49 or different types or strengths of induced selection pressure can be applied.36 FACS divides the library into two or more subpopulations based on the intensity of the fluorescent signal linked to the variant function. The resulting distribution of each variant over these subpopulations is consequently a reflection of the variant effect.
Quantification of variant function by sequencing after selection
Sequencing challenges
In the DMS workflow, screening through selection is followed by quantification of the variants within different populations through DNA sequencing to obtain variant frequencies and calculate variant functional scores as a measure for variant effect.112,113,114,115 Depending on the study, variants are sequenced from the population(s) after selection (the selected library) alone or from the population before selection (the input/naive library) as well. Genomic DNA20 or cDNA reverse transcribed from RNA54 is isolated from the cells, followed by PCR amplification of the variable region and sample preparation for NGS (e.g., Illumina sequencing). An additional sequencing step on the plasmid variant library before the introduction into the expression system can provide an overview of library complexity and quality. An important challenge is to obtain high-quality sequence data to reliably identify true mutations. Illumina sequencing has an average error rate between 0.1% and 1%.116,117 Paired-end sequencing can be used to generate double coverage, which reduces error rates. Because Illumina read lengths in paired-end sequencing can go up to 300 bp for each of the two reads, this limits the length of the variable region that is compatible with the sequencing platform to 300 bp (100 amino acids) for double coverage and 600 bp (200 amino acids) for single coverage. A further challenge stems from amplification errors and bias in PCR steps during library preparation for NGS.118 Moreover, PCR of long, highly similar sequences can also cause the formation of chimeras, resulting in the scrambling of variants that are lost for quantification. The potential introduction of amplification errors and bias can be minimized by using high-fidelity polymerases and minimizing the number of PCR steps and the number of amplification cycles within each PCR step. Amplification and sequencing errors can be estimated by the introduction of wild-type sequences before these steps or via the use of unique molecular identifiers (UMIs).119,120
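The read-length arithmetic and the impact of the quoted error rates can be sketched as follows. The error calculation assumes independent, uniform per-base errors, which is a simplification, and the length calculation ignores primers, adapters, and any required read overlap. Function names are hypothetical.

```python
def max_variable_region(read_length: int) -> dict:
    """Maximum variable region covered by paired-end reads of a given length."""
    double_bp = read_length      # both reads span the full region
    single_bp = 2 * read_length  # the two reads cover adjacent halves
    return {
        "double_coverage_bp": double_bp,
        "double_coverage_aa": double_bp // 3,
        "single_coverage_bp": single_bp,
        "single_coverage_aa": single_bp // 3,
    }

def fraction_reads_with_error(per_base_error: float, read_length: int) -> float:
    """Fraction of reads with at least one erroneous base call.

    Assumes independent errors with the same probability at every position.
    """
    return 1.0 - (1.0 - per_base_error) ** read_length

if __name__ == "__main__":
    print(max_variable_region(300))
    for error_rate in (0.001, 0.01):
        frac = fraction_reads_with_error(error_rate, 300)
        print(f"error rate {error_rate}: {frac:.2f} of 300-bp reads affected")
```

Even at the lower end of the quoted error range, a substantial fraction of full-length reads is expected to contain at least one miscalled base under this model, which motivates double coverage and error-estimation controls.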
Sequencing larger variable regions
Several strategies make large variable regions compatible with short-read Illumina sequencing. An option is to individually generate and evaluate multiple sublibraries, with each covering variants within a smaller region compatible with Illumina sequencing,49,121 although this significantly expands the experimental setup in all stages of the DMS workflow. Alternatively, parallel amplicon sequencing (Figure 3A) subdivides the variable region isolated from the input/selected library into shorter fragments, which are subsequently sequenced in parallel. Assuming that each variant contains only one mutation, the sequencing read obtained from the fragment covering the mutation is sufficient for variant identification and counting. Sequencing reads that are completely wild type are disregarded, as they could originate from variants with mutations outside of the read. Amplicons can be generated either through PCR amplification17,43 or through random fragmentation,32,51,65 for example by Tn5 transposase tagmentation (Nextera). A downside to this is that the occasional variant with multiple mutations that are located in different fragments will be identified and counted as multiple single-mutant variants. An additional limitation relates to the inability to quantify wild-type sequences, commonly used as an internal control and for normalization during downstream analysis after sequencing. Another approach is mainly applied in CRISPR-based DMS studies using base editing, where variant sequencing relies on sequencing the short genome-integrated gRNAs, which are by design representative of the introduced mutation.36,72 This gRNA counting is a straightforward and commonly used readout in pooled CRISPR screens that exploit selectable phenotypes.122 Although this avoids the more complex amplicon sequencing of the long endogenous gene, non-programmed mutations cannot be detected. In addition, differences in gRNA editing efficiencies can mask the variant effect, and multiple target bases within the editing window can be edited but at different frequencies, leading to different mutations that are represented by the same gRNA.36
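The counting logic of parallel amplicon sequencing can be sketched as follows. This is a minimal illustration that assumes in-frame tiles, reads that span a full tile, and no indels. Fully wild-type reads are discarded as described above, and reads with more than one changed codon are skipped as an additional assumption. The function names and variant labels are hypothetical.

```python
from collections import Counter

def split_codons(seq: str) -> list[str]:
    """Split an in-frame sequence into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def count_single_codon_variants(reads, wt_tile: str, first_codon: int = 1) -> Counter:
    """Count reads of one tile that differ from wild type in exactly one codon.

    first_codon is the 1-based protein position of the first codon in the tile.
    """
    wt_codons = split_codons(wt_tile)
    counts = Counter()
    for read in reads:
        if len(read) != len(wt_tile):
            continue  # truncated reads and indels are not handled in this sketch
        diffs = [
            (i, wt, alt)
            for i, (wt, alt) in enumerate(zip(wt_codons, split_codons(read)))
            if wt != alt
        ]
        if len(diffs) != 1:
            continue  # wild-type reads (0 differences) and multi-codon reads
        i, wt, alt = diffs[0]
        counts[f"{first_codon + i}:{wt}>{alt}"] += 1
    return counts

if __name__ == "__main__":
    wt = "ATGGCCAAA"  # hypothetical tile
    reads = ["ATGGCCAAA", "ATGGCTAAA", "ATGGCTAAA", "TTGGCTAAA"]
    print(count_single_codon_variants(reads, wt))
```

Because wild-type reads are dropped per tile, this approach cannot quantify the wild-type sequence itself, which mirrors the limitation noted in the text.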
Barcoded libraries
In some DMS studies, each variant is coupled to a short unique sequence, referred to as a barcode. This allows each variant to be linked to multiple distinct barcodes, which serve as internal replicates for that particular variant.35,118 Multiple replicates of the wild-type or known variants can provide an estimation of variability in the final functional score. In addition, they allow the determination of representation thresholds, where replicates represented below a certain number are considered unreliable because their functional score is unable to reflect the expected variant function.
A sequencing step on the plasmid variant library before the introduction into the expression system links each barcode to its corresponding variant, generating a so-called barcode lookup table. Barcodes can be introduced in different ways. Most approaches use full degenerate sequences, where the length is chosen so that the number of possible sequences greatly exceeds the number of variants, ensuring that each variant is coupled to multiple unique sequences.53,54,57 Alternatively, barcodes can also be predefined into an oligonucleotide array by providing multiple spots for each variant, where the variant is coupled to a distinct short non-degenerate sequence in each spot.15 Sequencing of the appropriate populations after selection is reduced to the short barcodes only. Additional advantages relate to the less complicated sequencing and decreased chimera formation of the highly diverse barcodes.
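How a barcode lookup table is used after selection can be sketched in a few lines. The lookup table, barcodes, and variant names below are hypothetical, barcodes are matched exactly (no error correction), and the helper for the barcode space simply evaluates 4^n for a fully degenerate barcode of length n.

```python
from collections import Counter, defaultdict

def barcode_space(length: int) -> int:
    """Number of distinct fully degenerate barcodes of a given length."""
    return 4 ** length

def count_variants_by_barcode(barcode_reads, lookup: dict) -> dict:
    """Group barcode read counts by variant using a barcode lookup table.

    Barcodes absent from the lookup table are ignored. The result maps each
    variant to {barcode: count}, so barcodes act as internal replicates.
    """
    per_barcode = Counter(bc for bc in barcode_reads if bc in lookup)
    per_variant = defaultdict(dict)
    for barcode, n in per_barcode.items():
        per_variant[lookup[barcode]][barcode] = n
    return dict(per_variant)

if __name__ == "__main__":
    lookup = {"AAAA": "A2V", "CCCC": "A2V", "GGGG": "WT"}  # hypothetical table
    reads = ["AAAA", "AAAA", "CCCC", "TTTT", "GGGG"]
    print(count_variants_by_barcode(reads, lookup))
    print(barcode_space(15), "possible 15-nt barcodes")
```

Keeping the counts per barcode, rather than summing them immediately, preserves the internal replicates needed to estimate variability and to apply representation thresholds.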
Subassembly of barcoded libraries
DMS studies can benefit from the use of barcoded libraries when the variable region is longer than the Illumina read length.35,118 Different strategies collectively referred to as subassembly (Figure 3B) allow the coupling of long variable regions beyond Illumina sequencing length to their barcode in the first sequencing step by dividing the variable region into nested fragments that maintain their link to the barcode.118,123 Paired-end sequencing of these fragments generates one read for the barcode and one read for a part of the variable region. All partial variable region reads associated with one barcode are subsequently combined and assembled to reconstitute the full-length sequence of the variant. Nested barcoded fragments can be generated through tiling PCR37,124 or treatment with a double-strand exonuclease for different time intervals.125,126 In contrast to parallel amplicon sequencing (Figure 3A), the entire sequence of each variant is determined, providing complete information on each variant and allowing the quantification of wild-type sequences. Subassembly protocols are, however, complicated and labor intensive, requiring a lot of manipulations that can result in additional mutations and both inter- and intramolecular representational biases that influence the ability to perform comprehensive barcode-variant coupling.
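The assembly step of subassembly can be illustrated with a simplified sketch: partial reads are grouped by barcode and combined by a per-position majority vote. It assumes that the start position of each fragment within the variable region is already known (in practice this would come from alignment) and that reads contain no indels. The function name and example data are hypothetical.

```python
from collections import Counter, defaultdict

def subassemble(tagged_fragments, region_length: int) -> dict:
    """Reconstruct one full-length sequence per barcode from partial reads.

    tagged_fragments: iterable of (barcode, 0-based start, fragment sequence).
    Barcodes whose fragments do not cover every position are left out.
    """
    votes = defaultdict(lambda: [Counter() for _ in range(region_length)])
    for barcode, start, fragment in tagged_fragments:
        for offset, base in enumerate(fragment):
            pos = start + offset
            if 0 <= pos < region_length:
                votes[barcode][pos][base] += 1
    assembled = {}
    for barcode, columns in votes.items():
        if all(columns):  # an empty Counter marks an uncovered position
            assembled[barcode] = "".join(
                column.most_common(1)[0][0] for column in columns
            )
    return assembled

if __name__ == "__main__":
    fragments = [
        ("BC1", 0, "ATGG"), ("BC1", 3, "GCCA"), ("BC1", 6, "AAA"),
        ("BC2", 0, "ATG"),  # incomplete coverage, not assembled
    ]
    print(subassemble(fragments, region_length=9))
```

Barcodes with incomplete coverage are dropped here, which corresponds to the incomplete barcode-variant coupling that representational biases can cause in practice.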
A potentially exciting alternative approach is clonal sequencing, where the DNA molecules are physically separated into individual microfluidic droplets for fragmentation and subsequent tagging such that all fragments within one reaction chamber get the same unique tag before short-read sequencing.127 Combining reads based on these tags allows the reconstitution of the original DNA molecule.
Due to the advantages mentioned above regarding internal replication, the use of barcoded libraries is also beneficial for short variable regions. However, barcode-variant coupling is complicated when the distance between the short variable region within a large full-length sequence and the barcode cannot be bridged by Illumina reads. This can be resolved by bringing the barcode and variable region close together either by removing the intermediate sequences through restriction/ligation60,61 or by applying a cloning strategy that introduces the intermediate sequences after barcode-variant coupling.54
Long-read sequencing of barcoded libraries
With the advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing128 and Oxford Nanopore Technologies (ONT) nanopore sequencing,129 longer read lengths are now possible, albeit at the expense of accuracy. The intrinsically high error rate in SMRT sequencing130 can be resolved by increasing the coverage through multiple passes along the sense and antisense strands of a circularized template in circular consensus sequencing (CCS).131 Because of this, SMRT CCS has been widely applied for variant-barcode coupling in DMS studies, providing a more straightforward approach compared to subassembly.22,38,53,57 Despite improvements in nanopore sequencing,132 to our knowledge, this platform has not been used in DMS studies.
Data analysis
Functional score calculation
Fitness-based selection changes the number of cells representing each variant within a population, where this change is reflective of the effect of variant function on cell survival. Sorting-based selection subdivides the cells within a population into bins according to fluorescence intensity and exploits the effect of variant function on the distribution of their respective cells between these subpopulations. In both cases, this quantitative information is obtained through NGS of the variant DNA (or corresponding barcodes) from these cells and is used to calculate a functional score for each variant.
As a typical first step, a normalized enrichment score is calculated for each variant, which is essentially the ratio of observed counts after and before fitness-based selection for that variant relative to the ratio of observed counts after and before selection for the wild type. For sorting-based selection, the ratio of observed counts between the two bins (high and low fluorescence) is used.133 Consequently, values below and above 1 reflect depletion and enrichment compared to wild type, respectively. In the earlier methods, enrichment scores were evaluated using a two-sided Student’s t test (EMPIRIC112) or a Poisson t test corrected for multiple testing (Enrich113). When more than two time points (fitness) or bins (sorting) are sampled to improve the scoring of variants with more subtle effects, a regression-based approach can be used instead.134 The more recent approaches add improved estimates for variant error through their more advanced statistical frameworks applied to log-transformed enrichment scores (Enrich2,114 DiMSum115).
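The normalized enrichment score described above can be written out as a short calculation. This is a minimal sketch, not the implementation of any of the cited tools: the pseudocount used to avoid division by zero is an assumption, and the function names and counts are hypothetical.

```python
import math

def enrichment_score(var_pre: float, var_post: float,
                     wt_pre: float, wt_post: float,
                     pseudocount: float = 0.5) -> float:
    """Ratio of variant counts (after/before) relative to the wild-type ratio.

    Values below 1 indicate depletion and values above 1 indicate enrichment
    compared to wild type. The pseudocount is an assumption of this sketch.
    """
    var_ratio = (var_post + pseudocount) / (var_pre + pseudocount)
    wt_ratio = (wt_post + pseudocount) / (wt_pre + pseudocount)
    return var_ratio / wt_ratio

def log2_enrichment(*counts: float, pseudocount: float = 0.5) -> float:
    """Log2-transformed enrichment score, symmetric around 0 for wild type."""
    return math.log2(enrichment_score(*counts, pseudocount=pseudocount))

if __name__ == "__main__":
    # Hypothetical counts: the variant halves while the wild type stays constant.
    print(enrichment_score(100, 50, 1000, 1000, pseudocount=0))
    print(log2_enrichment(100, 50, 1000, 1000, pseudocount=0))
```

For sorting-based selection with two bins, the same function can be applied by passing the low-fluorescence counts as the "before" values and the high-fluorescence counts as the "after" values.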
The detailed estimates of the uncertainty surrounding each enrichment score in Enrich2 and DiMSum support the accurate evaluation of variant effects, an important aspect in the context of VUS classification. The informative power of functional scores is highly dependent on their quality, where the pooled nature of DMS studies contributes to the presence and complicates the identification of noise in the data.115 An important source of noise in DMS studies is the inaccurate scoring of variants with low total counts due to cell-to-cell variability and experimental errors.114,115,133 Furthermore, several steps within the DMS workflow linked to sequencing, e.g., inefficient library construction, transfection, and DNA extraction, can restrict the library size or complexity, which can lead to the introduction of random variability.115
Data quality can be estimated through the evaluation of the functional scores for wild-type, synonymous, and/or nonsense variants.46,51,67 Performing replicate experiments to assess reproducibility and calculate mean functional scores is highly warranted.57 Internal replicates through multiple barcodes for each variant15 and/or codon variants that result in the same amino acid change48 allow the assessment of variability and provide information on the minimal total counts required for accurate functional scoring. The field will benefit significantly from the introduction of consistent standards on assessing and reporting data quality in DMS studies. This is also an absolute requirement for clinical use of DMS data.
Data interpretation
Functional scores are commonly visualized using a color gradient in a 2D heatmap, where one axis represents the position within the protein sequence and the other axis the different amino acid substitutions, providing a descriptive overview of the mutational landscape (Figure 1). DMS data can be deposited in the public repository MaveDB (https://www.mavedb.org/), which also provides the MaveVis tool for the generation of heatmaps from one’s own or published DMS data and their integration with information on secondary structure, surface accessibility, interaction interfaces, and evolutionary conservation.135 Other tools such as dms-view allow the interpretation of DMS data in the context of the protein 3D structure.136 More recently, a neural-network-like approach called Mave-NN was introduced to model genotype-phenotype maps.137
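The data structure behind such a heatmap is a matrix of amino acid substitutions by sequence positions. The sketch below builds that matrix from a dictionary of scores. The input format and function name are hypothetical, and plotting is left out (the matrix could be passed to a plotting function such as matplotlib's `imshow` after replacing missing values).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score_matrix(scores: dict, protein_length: int) -> list[list]:
    """Arrange functional scores as rows of substitutions by columns of positions.

    scores maps (1-based position, substituted amino acid) to a functional
    score. Variants without a score are represented by None.
    """
    matrix = [[None] * protein_length for _ in AMINO_ACIDS]
    for (position, amino_acid), score in scores.items():
        row = AMINO_ACIDS.index(amino_acid)
        matrix[row][position - 1] = score
    return matrix

if __name__ == "__main__":
    scores = {(1, "A"): 0.9, (1, "V"): 0.2, (3, "W"): 1.1}  # hypothetical scores
    matrix = score_matrix(scores, protein_length=3)
    for amino_acid, row in zip(AMINO_ACIDS, matrix):
        print(amino_acid, row)
```

Keeping missing variants explicit (here as None) makes gaps in library coverage visible instead of silently displaying them as neutral scores.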
DMS in the clinic
Engineering of therapeutic proteins
An important biomedical application of DMS is the engineering of therapeutic proteins.138 DMS in phage and yeast has been widely used for the affinity maturation of antibodies, chimeric antigen receptors (CARs), peptides/small proteins, and decoy receptors for the treatment of viral infections,139,140 parasitic infections,141 cancer,18,39 and other diseases.142 DMS is also commonly used for antibody epitope mapping19,143 and for the identification of potential escape variants in the context of viral infections.144 A further application encompasses the engineering of improved vaccine immunogens.23 Although simpler expression systems such as phage and yeast provide a considerably higher throughput, mammalian cells allow the evaluation of the final therapeutic format (e.g., including glycosylation) in the proper physiological environment, obviating the need for an additional optimization step.34 DMS in mammalian cells has therefore been applied for the affinity maturation of therapeutic proteins such as CARs41 and decoy receptors.17,40
Resolving VUSs
NGS has emerged as a widely adopted technology for genetic testing in clinical practice to identify germline (inherited diseases and cancer predisposition) or somatic (sporadic tumors) mutations that influence disease risk, prognosis, and drug response to guide preventive and therapeutic options.29 However, this has often created more questions than answers by revealing an overwhelming number of VUSs. The VUS problem is large and exponentially growing, even for well-studied disease genes.30 This is reflected by the observation that of the over 500,000 clinical missense variants currently in the ClinVar database,145 almost 74% are VUSs (data as of October 18th, 2023). They hamper proper interpretation and reporting by physicians.146,147 Classification of VUSs through genetic association studies using population data is complicated by their rare nature.148,149 These observations underscore the need for alternative approaches that are able to classify VUSs and consequently identify important pathogenic variants while eliminating unnecessary prudence around benign variants.
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) established internationally accepted guidelines for the classification of VUSs using multiple evidence types in addition to population data, including in silico prediction and functional studies.150 They introduced the strong evidence codes PS3 and BS3 for “well-established” in vivo or in vitro functional studies that demonstrate a damaging or non-damaging variant effect, respectively.150 Functional studies are considered well established when they are biologically relevant, reproducible, and robust in a clinical diagnostic laboratory, where different evidence strengths based on the evaluated clinical validity of the functional data should be implemented.151 Despite large differences between datasets, clinical evaluation of DMS data has shown high potential for the classification of VUSs in pilot studies.30,31,32 This further underscores the importance of proper study design and high data quality in DMS studies for clinical variant interpretation and the need to establish corresponding standards and guidelines. Taken together, this shows the exciting potential of DMS to advance the diagnosis and treatment of inherited diseases and cancer, in addition to providing fundamental insights in protein (dys)function in disease.
Future perspectives
Toward the study of larger libraries
One of the shortcomings of DMS in mammalian cells is the limited library size that can be evaluated when compared to phage display or expression systems in bacteria or yeast. Although an obvious solution would be to increase the cell numbers to cover larger libraries, advancements in cell culture are required to make such experiments practically feasible. Therefore, the current focus is on further maximizing the screening efficiency in mammalian cells by improvements in plasmid variant library delivery systems93 and genome-editing approaches.34 Such improvements could involve an optimized recombination site sequence that increases site-specific recombination efficiency.152 The ability to screen larger libraries will allow the expansion of the search space toward larger, full-length proteins. In addition, larger libraries imply a larger representation of each variant during screening, which increases the statistical significance of the obtained functional scores. Larger library sizes would also allow the examination of variants with multiple mutations to assess synergy or to advance antibody engineering. Currently, this can only be accomplished by performing sequential DMS screens.
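The relation between library size, variant representation, and cell numbers can be illustrated with a back-of-the-envelope calculation. All parameters below (fold coverage, delivery efficiency, protein length) are hypothetical, and the model simply divides the required number of variant-carrying cells by the fraction of cells that successfully receive a variant.

```python
import math

def single_substitution_library_size(protein_length: int,
                                     include_stop: bool = False) -> int:
    """Number of single-amino acid variants: 19 substitutions per position,
    or 20 if a stop codon is included at each position."""
    per_position = 20 if include_stop else 19
    return per_position * protein_length

def cells_required(protein_length: int, fold_coverage: int,
                   delivery_efficiency: float) -> int:
    """Cells needed so that each variant is represented fold_coverage times.

    delivery_efficiency is the assumed fraction of cells that end up carrying
    a variant (between 0 and 1).
    """
    variants = single_substitution_library_size(protein_length)
    return math.ceil(variants * fold_coverage / delivery_efficiency)

if __name__ == "__main__":
    # Hypothetical example: 100-residue region, 100-fold coverage, 25% delivery.
    print(single_substitution_library_size(100), "variants")
    print(cells_required(100, 100, 0.25), "cells")
```

The required cell number scales linearly with protein length for single substitutions but far faster for combinations of mutations, which is why multi-mutation libraries quickly exceed practical culture scales.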
The search for compatible and biologically relevant readouts
As discussed above, current approaches for the functional evaluation of protein variants mainly rely on protein expression/stability, PPIs, and cell survival. These approaches can be used to gain many, but certainly not all, mechanistic insights. A bottleneck in DMS studies is related to establishing a functional assay that generates a selectable phenotype compatible with fitness- or sorting-based bulk selection. Furthermore, many functional assays are incompatible with endogenous expression. This is further underscored by the observation that some functional assays, even in mammalian cells, are unable to faithfully recapitulate variant effects in vivo. For example, loss-of-function (LOF) variants of BRCA1/2 in CRISPR-based DMS studies induce haploid cell death, whereas they have the opposite effect in vivo.67,71 In contrast, an exogenous DMS study of BRCA1 identified LOF variants through their inability to mediate HDR in a newly developed reporter assay that accurately reflected the effect of known variants.56 This and other examples15,16,25,32,54 reveal the potential of existing or newly developed protein-specific reporter assays (e.g., enzymatic function, transcriptional activity, signal transduction) that are preferentially compatible with endogenous expression in DMS studies.
As proteins exert different functional properties, distinct molecular mechanisms can underlie the functional effects of variants. It could therefore be more informative to perform multiple screens with distinct functional assays on a particular protein variant library, as combining the obtained results increases the biological relevance. One could analyze protein expression or multiple PPIs in parallel and/or evaluate the roles in different signaling pathways using distinct reporter assays.32,41,59,66 As an example, current mammalian DMS studies where the protein variants are expressed on the cell surface allow simultaneous evaluation of surface expression and extracellular PPIs through distinct fluorescent labeling for FACS.17 These combined experiments allow us to make the distinction between variants that lose a particular interaction due to changes in either stability or affinity. This is significant for the classification of VUSs in human disease proteins. Such studies would benefit from mammalian DMS platforms that are easily adaptable for compatibility with different functional readouts.
Guidelines for reporting
Owing to the complexity and the many critical aspects regarding data quality, the DMS field will benefit from general guidelines regarding the minimal information that should be provided, much in the same way as for experiments involving expression analysis (MIQE)153 or flow cytometry (MIFlowCyt).154 In addition, the use of data repositories such as MaveDB135 should be encouraged. Related to this, making variant libraries available to the research community (by, for instance, depositing with a non-profit distributor such as Addgene) may increase the chances of comparing and complementing different DMS studies on the same protein, helping to increase reproducibility and boosting time efficiency.
Advancing DMS by harnessing single-cell technologies
An important evolution in the DMS field can be expected with the implementation of droplet microfluidics in the sequencing127 and selection155,156 steps. The use of droplets could facilitate DMS studies of non-cell-autonomous factors, where genotype and phenotype are not directly linked within one cell.157 Furthermore, microfluidics could be combined with alternative readouts such as (targeted) single-cell RNA sequencing (RNA-seq) similar to perturb-seq158,159 in both exogenous and CRISPR-based DMS studies to directly evaluate variant effects on gene expression. Sorting-based selection of variant libraries using visual phenotypes such as protein localization and (sub)cellular morphology is possible through selective illumination of a photoconvertible fluorescent protein (e.g., Dendra2).160,161 Potential alternative strategies include selective laser ablation162 or cell-specific transfection163 with a fluorescent protein, cell surface marker, or antibiotic resistance gene. As current mammalian DMS studies are performed in tissue culture, development toward in vivo DMS in model organisms could provide a more representative physiological environment, increasing the biological relevance of the obtained variant effects. Although in vivo CRISPR screen strategies164 in mice are promising, fulfilling the one-variant-per-cell requirement will be challenging. Combination with Bxb1-mediated transgenesis165 could be an attractive approach to resolve this issue.
Concluding remarks
Recent years have seen significant improvements throughout the complete DMS workflow for mammalian cells. The field is clearly expanding rapidly with many interesting and diverse applications (Table 1) while also setting the stage for solving the VUS challenge that widespread sequencing is posing for healthcare. Indeed, comprehensive genotype-phenotype functional maps as generated by DMS can further unlock the benefits of genetic screening in both patients and healthy individuals. The ability to perform these studies in a mammalian cell context is an important advantage to ensure broader and more relevant functional assessment of variants. The current advances in single-cell (sequencing) technology, automation, and machine learning are already being integrated in DMS approaches, promising a rich future for this exciting field.
Acknowledgments
We apologize to research groups whose work we could not cite here due to space limitations and the scope of this review article. This work was supported by strategic basic research fellowships of the Research Foundation – Flanders (FWO) (grant numbers 1S11017N to S.M. and 1S81320N to N.D.), by an Excellence of Science (EOS) joint project of the Fund for Scientific Research (F.R.S.-FNRS) and FWO (G0H7518N EOS ID: 30981113 to F.P.), and by FWO project funding (G042918 to S.E.).
Declaration of interests
The authors declare no competing interests.
References
- 1.Vyncke L., Bovijn C., Pauwels E., Van Acker T., Ruyssinck E., Burg E., Tavernier J., Peelman F. Reconstructing the TIR Side of the Myddosome: A Paradigm for TIR-TIR Interactions. Structure. 2016;24:437–447. doi: 10.1016/j.str.2015.12.018. [DOI] [PubMed] [Google Scholar]
- 2. Uyttendaele I., Lavens D., Catteeuw D., Lemmens I., Bovijn C., Tavernier J., Peelman F. Random mutagenesis MAPPIT analysis identifies binding sites for Vif and Gag in both cytidine deaminase domains of Apobec3G. PLoS One. 2012;7 doi: 10.1371/journal.pone.0044143.
- 3. Vincent O., Gutierrez-Nogués A., Trejo-Herrero A., Navas M.-A. A novel reverse two-hybrid method for the identification of missense mutations that disrupt protein–protein binding. Sci. Rep. 2020;10 doi: 10.1038/s41598-020-77992-1.
- 4. Junqueira D., Cilenti L., Musumeci L., Sedivy J.M., Zervos A.S. Random mutagenesis of PDZOmi domain and selection of mutants that specifically bind the Myc proto-oncogene and induce apoptosis. Oncogene. 2003;22:2772–2781. doi: 10.1038/sj.onc.1206359.
- 5. Groot-Kormelink P.J., Ferrand S., Kelley N., Bill A., Freuler F., Imbert P.-E., Marelli A., Gerwin N., Sivilotti L.G., Miraglia L., et al. High Throughput Random Mutagenesis and Single Molecule Real Time Sequencing of the Muscle Nicotinic Acetylcholine Receptor. PLoS One. 2016;11 doi: 10.1371/journal.pone.0163129.
- 6. Fowler D.M., Fields S. Deep mutational scanning: A new style of protein science. Nat. Methods. 2014;11:801–807. doi: 10.1038/nmeth.3027.
- 7. Kircher M., Xiong C., Martin B., Schubach M., Inoue F., Bell R.J.A., Costello J.F., Shendure J., Ahituv N. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 2019;10:3583. doi: 10.1038/s41467-019-11526-w.
- 8. Zhang L., Sarangi V., Moon I., Yu J., Liu D., Devarajan S., Reid J.M., Kalari K.R., Wang L., Weinshilboum R. CYP2C9 and CYP2C19: Deep Mutational Scanning and Functional Characterization of Genomic Missense Variants. Clin. Transl. Sci. 2020;13:727–742. doi: 10.1111/cts.12758.
- 9. Olson C.A., Wu N.C., Sun R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 2014;24:2643–2651. doi: 10.1016/j.cub.2014.09.072.
- 10. Sarkisyan K.S., Bolotin D.A., Meer M.V., Usmanova D.R., Mishin A.S., Sharonov G.V., Ivankov D.N., Bozhanova N.G., Baranov M.S., Soylemez O., et al. Local fitness landscape of the green fluorescent protein. Nature. 2016;533:397–401. doi: 10.1038/nature17995.
- 11. Starr T.N., Picton L.K., Thornton J.W. Alternative evolutionary histories in the sequence space of an ancient protein. Nature. 2017;549:409–413. doi: 10.1038/nature23902.
- 12. Diss G., Lehner B. The genetic landscape of a physical interaction. Elife. 2018;7 doi: 10.7554/eLife.32472.
- 13. Veerapandian V., Ackermann J.O., Srivastava Y., Malik V., Weng M., Yang X., Jauch R. Directed Evolution of Reprogramming Factors by Cell Selection and Sequencing. Stem Cell Rep. 2018;11:593–606. doi: 10.1016/j.stemcr.2018.07.002.
- 14. Tycko J., DelRosso N., Hess G.T., Aradhana, Banerjee A., Banerjee A., Mukund A., Van M.V., Ego B.K., Yao D., Spees K., et al. High-Throughput Discovery and Characterization of Human Transcriptional Effectors. Cell. 2020;183:2020–2035.e16. doi: 10.1016/j.cell.2020.11.024.
- 15. Staller M.V., Ramirez E., Kotha S.R., Holehouse A.S., Pappu R.V., Cohen B.A. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 2022;13:334–345.e5. doi: 10.1016/j.cels.2022.01.002.
- 16. Katrekar D., Xiang Y., Palmer N., Saha A., Meluzzi D., Mali P. Comprehensive interrogation of the ADAR2 deaminase domain for engineering enhanced RNA editing activity and specificity. Elife. 2022;11 doi: 10.7554/eLife.75555.
- 17. Chan K.K., Dorosky D., Sharma P., Abbasi S.A., Dye J.M., Kranz D.M., Herbert A.S., Procko E. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science (New York, N.Y.) 2020;369:1261–1265. doi: 10.1126/science.abc0870.
- 18. Sharma P., Marada V.V.V.R., Cai Q., Kizerwetter M., He Y., Wolf S.P., Schreiber K., Clausen H., Schreiber H., Kranz D.M. Structure-guided engineering of the affinity and specificity of CARs against Tn-glycopeptides. Proc. Natl. Acad. Sci. USA. 2020;117:15148–15159. doi: 10.1073/pnas.1920662117.
- 19. Garrett M.E., Itell H.L., Crawford K.H.D., Basom R., Bloom J.D., Overbaugh J. Phage-DMS: A Comprehensive Method for Fine Mapping of Antibody Epitopes. iScience. 2020;23 doi: 10.1016/j.isci.2020.101622.
- 20. Heredia J.D., Park J., Brubaker R.J., Szymanski S.K., Gill K.S., Procko E. Mapping Interaction Sites on Human Chemokine Receptors by Deep Mutational Scanning. J. Immunol. 2018;200:3825–3839. doi: 10.4049/jimmunol.1800343.
- 21. Sun Z., Palzkill T. Deep Mutational Scanning Reveals the Active-Site Sequence Requirements for the Colistin Antibiotic Resistance Enzyme MCR-1. mBio. 2021;12 doi: 10.1128/mBio.02776-21.
- 22. Starr T.N., Greaney A.J., Hilton S.K., Ellis D., Crawford K.H.D., Dingens A.S., Navarro M.J., Bowen J.E., Tortorici M.A., Walls A.C., et al. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell. 2020;182:1295–1310.e20. doi: 10.1016/j.cell.2020.08.012.
- 23. Leonard A.C., Weinstein J.J., Steiner P.J., Erbse A.H., Fleishman S.J., Whitehead T.A. Stabilization of the SARS-CoV-2 receptor binding domain by protein core redesign and deep mutational scanning. Protein Eng. Des. Sel. 2022;35:gzac002. doi: 10.1093/protein/gzac002.
- 24. Starr T.N., Greaney A.J., Hannon W.W., Loes A.N., Hauser K., Dillen J.R., Ferri E., Farrell A.G., Dadonaite B., McCallum M., et al. Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution. Science (New York, N.Y.) 2022;377:420–424. doi: 10.1126/science.abo7896.
- 25. Tenthorey J.L., Young C., Sodeinde A., Emerman M., Malik H.S. Mutational resilience of antiviral restriction favors primate TRIM5α in host-virus evolutionary arms races. Elife. 2020;9 doi: 10.7554/eLife.59988.
- 26. Zhang L., Sarangi V., Liu D., Ho M.-F., Grassi A.R., Wei L., Moon I., Vierkant R.A., Larson N.B., Lazaridis K.N., et al. ACE2 and TMPRSS2 SARS-CoV-2 infectivity genes: deep mutational scanning and characterization of missense variants. Hum. Mol. Genet. 2022;31:4183–4192. doi: 10.1093/hmg/ddac157.
- 27. Lei R., Hernandez Garcia A., Tan T.J.C., Teo Q.W., Wang Y., Zhang X., Luo S., Nair S.K., Peng J., Wu N.C. Mutational fitness landscape of human influenza H3N2 neuraminidase. Cell Rep. 2023;42 doi: 10.1016/j.celrep.2022.111951.
- 28. Lee J.M., Huddleston J., Doud M.B., Hooper K.A., Wu N.C., Bedford T., Bloom J.D. Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants. Proc. Natl. Acad. Sci. USA. 2018;115:E8276–E8285. doi: 10.1073/pnas.1806133115.
- 29. Shendure J., Findlay G.M., Snyder M.W. Genomic Medicine-Progress, Pitfalls, and Promise. Cell. 2019;177:45–57. doi: 10.1016/j.cell.2019.02.003.
- 30. Fayer S., Horton C., Dines J.N., Rubin A.F., Richardson M.E., McGoldrick K., Hernandez F., Pesaran T., Karam R., Shirts B.H., et al. Closing the gap: Systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 2021;108:2248–2258. doi: 10.1016/j.ajhg.2021.11.001.
- 31. Scott A., Hernandez F., Chamberlin A., Smith C., Karam R., Kitzman J.O. Saturation-scale functional evidence supports clinical variant interpretation in Lynch syndrome. Genome Biol. 2022;23:266. doi: 10.1186/s13059-022-02839-z.
- 32. Coyote-Maestas W., Nedrud D., He Y., Schmidt D. Determinants of trafficking, conduction, and disease within a K+ channel revealed through multiparametric deep mutational scanning. Elife. 2022;11 doi: 10.7554/eLife.76903.
- 33. McShan A.C., Devlin C.A., Morozov G.I., Overall S.A., Moschidi D., Akella N., Procko E., Sgourakis N.G. TAPBPR promotes antigen loading on MHC-I molecules using a peptide trap. Nat. Commun. 2021;12:3174. doi: 10.1038/s41467-021-23225-6.
- 34. Mason D.M., Weber C.R., Parola C., Meng S.M., Greiff V., Kelton W.J., Reddy S.T. High-throughput antibody engineering in mammalian cells by CRISPR/Cas9-mediated homology-directed mutagenesis. Nucleic Acids Res. 2018;46:7436–7449. doi: 10.1093/nar/gky550.
- 35. Gasperini M., Starita L., Shendure J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 2016;11:1782–1787. doi: 10.1038/nprot.2016.135.
- 36. Huang C., Li G., Wu J., Liang J., Wang X. Identification of pathogenic variants in cancer genes using base editing screens with editing efficiency correction. Genome Biol. 2021;22:80. doi: 10.1186/s13059-021-02305-2.
- 37. Ng C.-A., Ullah R., Farr J., Hill A.P., Kozek K.A., Vanags L.R., Mitchell D.W., Kroncke B.M., Vandenberg J.I. A massively parallel assay accurately discriminates between functionally normal and abnormal variants in a hotspot domain of KCNH2. Am. J. Hum. Genet. 2022;109:1208–1216. doi: 10.1016/j.ajhg.2022.05.003.
- 38. An L., Wang Y., Wu G., Wang Z., Shi Z., Liu C., Wang C., Yi M., Niu C., Duan S., et al. Defining the sensitivity landscape of EGFR variants to tyrosine kinase inhibitors. Transl. Res. 2023;255:14–25. doi: 10.1016/j.trsl.2022.11.002.
- 39. Amorosi C.J., Chiasson M.A., McDonald M.G., Wong L.H., Sitko K.A., Boyle G., Kowalski J.P., Rettie A.E., Fowler D.M., Dunham M.J. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 2021;108:1735–1751. doi: 10.1016/j.ajhg.2021.07.001.
- 40. Park J., Gill K.S., Aghajani A.A., Heredia J.D., Choi H., Oberstein A., Procko E. Engineered receptors for human cytomegalovirus that are orthogonal to normal human biology. PLoS Pathog. 2020;16 doi: 10.1371/journal.ppat.1008647.
- 41. Di Roberto R.B., Castellanos-Rueda R., Frey S., Egli D., Vazquez-Lombardi R., Kapetanovic E., Kucharczyk J., Reddy S.T. A Functional Screening Strategy for Engineering Chimeric Antigen Receptors with Reduced On-Target, Off-Tumor Activation. Mol. Ther. 2020;28:2564–2576. doi: 10.1016/j.ymthe.2020.08.003.
- 42. Wu N.C., Qi H. Application of Deep Mutational Scanning in Hepatitis C Virus. Methods Mol. Biol. 2019;1911:183–190. doi: 10.1007/978-1-4939-8976-8_12.
- 43. Heredia J.D., Park J., Choi H., Gill K.S., Procko E. Conformational Engineering of HIV-1 Env Based on Mutational Tolerance in the CD4 and PG16 Bound States. J. Virol. 2019;93:e00219-19. doi: 10.1128/JVI.00219-19.
- 44. McShan A.C., Devlin C.A., Overall S.A., Park J., Toor J.S., Moschidi D., Flores-Solis D., Choi H., Tripathi S., Procko E., Sgourakis N.G. Molecular determinants of chaperone interactions on MHC-I for folding and antigen repertoire selection. Proc. Natl. Acad. Sci. USA. 2019;116:25602–25613. doi: 10.1073/pnas.1915562116.
- 45. Park J., Selvam B., Sanematsu K., Shigemura N., Shukla D., Procko E. Structural architecture of a dimeric class C GPCR based on co-trafficking of sweet taste receptor subunits. J. Biol. Chem. 2019;294:4759–4774. doi: 10.1074/jbc.RA118.006173.
- 46. Giacomelli A.O., Yang X., Lintner R.E., McFarland J.M., Duby M., Kim J., Howard T.P., Takeda D.Y., Ly S.H., Kim E., et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 2018;50:1381–1387. doi: 10.1038/s41588-018-0204-y.
- 47. Trenker R., Wu X., Nguyen J.V., Wilcox S., Rubin A.F., Call M.E., Call M.J. Human and viral membrane-associated E3 ubiquitin ligases MARCH1 and MIR2 recognize different features of CD86 to downregulate surface expression. J. Biol. Chem. 2021;297 doi: 10.1016/j.jbc.2021.100900.
- 48. Bridgford J.L., Lee S.M., Lee C.M.M., Guglielmelli P., Rumi E., Pietra D., Wilcox S., Chhabra Y., Rubin A.F., Cazzola M., et al. Novel drivers and modifiers of MPL-dependent oncogenic transformation identified by deep mutational scanning. Blood. 2020;135:287–292. doi: 10.1182/blood.2019002561.
- 49. Kotler E., Shani O., Goldfeld G., Lotan-Pompan M., Tarcic O., Gershoni A., Hopf T.A., Marks D.S., Oren M., Segal E. A Systematic p53 Mutation Library Links Differential Functional Impact to Cancer Mutation Pattern and Evolutionary Conservation. Mol. Cell. 2018;71:178–190.e8. doi: 10.1016/j.molcel.2018.06.012.
- 50. Jia X., Burugula B.B., Chen V., Lemons R.M., Jayakody S., Maksutova M., Kitzman J.O. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am. J. Hum. Genet. 2021;108:163–175. doi: 10.1016/j.ajhg.2020.12.003.
- 51. Kwon J.J., Hajian B., Bian Y., Young L.C., Amor A.J., Fuller J.R., Fraley C.V., Sykes A.M., So J., Pan J., et al. Structure-function analysis of the SHOC2-MRAS-PP1C holophosphatase complex. Nature. 2022;609:408–415. doi: 10.1038/s41586-022-04928-2.
- 52. Hanzl A., Casement R., Imrichova H., Hughes S.J., Barone E., Testa A., Bauer S., Wright J., Brand M., Ciulli A., Winter G.E. Functional E3 ligase hotspots and resistance mechanisms to small-molecule degraders. Nat. Chem. Biol. 2023;19:323–333. doi: 10.1038/s41589-022-01177-2.
- 53. Frank F., Keen M.M., Rao A., Bassit L., Liu X., Bowers H.B., Patel A.B., Cato M.L., Sullivan J.A., Greenleaf M., et al. Deep mutational scanning identifies SARS-CoV-2 Nucleocapsid escape mutations of currently available rapid antigen tests. Cell. 2022;185:3603–3616.e13. doi: 10.1016/j.cell.2022.08.010.
- 54. Jones E.M., Lubock N.B., Venkatakrishnan A.J., Wang J., Tseng A.M., Paggi J.M., Latorraca N.R., Cancilla D., Satyadi M., Davis J.E., et al. Structural and functional characterization of G protein-coupled receptors with deep mutational scanning. Elife. 2020;9:e54895. doi: 10.7554/eLife.54895.
- 55. Ouyang W.O., Tan T.J.C., Lei R., Song G., Kieffer C., Andrabi R., Matreyek K.A., Wu N.C. Probing the biophysical constraints of SARS-CoV-2 spike N-terminal domain using deep mutational scanning. Sci. Adv. 2022;8:eadd7221. doi: 10.1126/sciadv.add7221.
- 56. Starita L.M., Islam M.M., Banerjee T., Adamovich A.I., Gullingsrud J., Fields S., Shendure J., Parvin J.D. A Multiplex Homology-Directed DNA Repair Assay Reveals the Impact of More Than 1,000 BRCA1 Missense Substitution Variants on Protein Function. Am. J. Hum. Genet. 2018;103:498–508. doi: 10.1016/j.ajhg.2018.07.016.
- 57. Matreyek K.A., Starita L.M., Stephany J.J., Martin B., Chiasson M.A., Gray V.E., Kircher M., Khechaduri A., Dines J.N., Hause R.J., et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 2018;50:874–882. doi: 10.1038/s41588-018-0122-z.
- 58. Matreyek K.A., Stephany J.J., Ahler E., Fowler D.M. Integrating thousands of PTEN variant activity and abundance measurements reveals variant subgroups and new dominant negatives in cancers. Genome Med. 2021;13:165. doi: 10.1186/s13073-021-00984-x.
- 59. Chiasson M.A., Rollins N.J., Stephany J.J., Sitko K.A., Matreyek K.A., Verby M., Sun S., Roth F.P., DeSloover D., Marks D.S., et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. Elife. 2020;9:e58026. doi: 10.7554/eLife.58026.
- 60. Glazer A.M., Kroncke B.M., Matreyek K.A., Yang T., Wada Y., Shields T., Salem J.-E., Fowler D.M., Roden D.M. Deep Mutational Scan of an SCN5A Voltage Sensor. Circ. Genom. Precis. Med. 2020;13 doi: 10.1161/CIRCGEN.119.002786.
- 61. Kozek K.A., Glazer A.M., Ng C.-A., Blackwell D., Egly C.L., Vanags L.R., Blair M., Mitchell D., Matreyek K.A., Fowler D.M., et al. High-throughput discovery of trafficking-deficient variants in the cardiac potassium channel KV11.1. Heart Rhythm. 2020;17:2180–2189. doi: 10.1016/j.hrthm.2020.05.041.
- 62. Penn W.D., McKee A.G., Kuntz C.P., Woods H., Nash V., Gruenhagen T.C., Roushar F.J., Chandak M., Hemmerich C., Rusch D.B., et al. Probing biophysical sequence constraints within the transmembrane domains of rhodopsin by deep mutational scanning. Sci. Adv. 2020;6 doi: 10.1126/sciadv.aay7505.
- 63. McKee A.G., Kuntz C.P., Ortega J.T., Woods H., Most V., Roushar F.J., Meiler J., Jastrzebska B., Schlebach J.P. Systematic profiling of temperature- and retinal-sensitive rhodopsin variants by deep mutational scanning. J. Biol. Chem. 2021;297 doi: 10.1016/j.jbc.2021.101359.
- 64. Carmody P.J., Zimmer M.H., Kuntz C.P., Harrington H.R., Duckworth K.E., Penn W.D., Mukhopadhyay S., Miller T.F., Schlebach J.P. Coordination of -1 programmed ribosomal frameshifting by transcript and nascent chain features revealed by deep mutational scanning. Nucleic Acids Res. 2021;49:12943–12954. doi: 10.1093/nar/gkab1172.
- 65. Reisman B.J., Guo H., Ramsey H.E., Wright M.T., Reinfeld B.I., Ferrell P.B., Sulikowski G.A., Rathmell W.K., Savona M.R., Plate L., et al. Apoptolidin family glycomacrolides target leukemia through inhibition of ATP synthase. Nat. Chem. Biol. 2022;18:360–367. doi: 10.1038/s41589-021-00900-9.
- 66. Suiter C.C., Moriyama T., Matreyek K.A., Yang W., Scaletti E.R., Nishii R., Yang W., Hoshitsuki K., Singh M., Trehan A., et al. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc. Natl. Acad. Sci. USA. 2020;117:5394–5401. doi: 10.1073/pnas.1915680117.
- 67. Findlay G.M., Daza R.M., Martin B., Zhang M.D., Leith A.P., Gasperini M., Janizek J.D., Huang X., Starita L.M., Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222. doi: 10.1038/s41586-018-0461-z.
- 68. Meitlis I., Allenspach E.J., Bauman B.M., Phan I.Q., Dabbah G., Schmitt E.G., Camp N.D., Torgerson T.R., Nickerson D.A., Bamshad M.J., et al. Multiplexed Functional Assessment of Genetic Variants in CARD11. Am. J. Hum. Genet. 2020;107:1029–1043. doi: 10.1016/j.ajhg.2020.10.015.
- 69. Erwood S., Bily T.M.I., Lequyer J., Yan J., Gulati N., Brewer R.A., Zhou L., Pelletier L., Ivakine E.A., Cohn R.D. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 2022;40:885–895. doi: 10.1038/s41587-021-01201-1.
- 70. Kweon J., Jang A.-H., Shin H.R., See J.-E., Lee W., Lee J.W., Chang S., Kim K., Kim Y. A CRISPR-based base-editing screen for the functional assessment of BRCA1 variants. Oncogene. 2020;39:30–35. doi: 10.1038/s41388-019-0968-2.
- 71. Hanna R.E., Hegde M., Fagre C.R., DeWeirdt P.C., Sangree A.K., Szegletes Z., Griffith A., Feeley M.N., Sanson K.R., Baidi Y., et al. Massively parallel assessment of human variants with base editor screens. Cell. 2021;184:1064–1080.e20. doi: 10.1016/j.cell.2021.01.012.
- 72. Cuella-Martin R., Hayward S.B., Fan X., Chen X., Huang J.-W., Taglialatela A., Leuzzi G., Zhao J., Rabadan R., Lu C., et al. Functional interrogation of DNA damage response variants with base editing screens. Cell. 2021;184:1081–1097.e19. doi: 10.1016/j.cell.2021.01.041.
- 73. Jun S., Lim H., Chun H., Lee J.H., Bang D. Single-cell analysis of a mutant library generated using CRISPR-guided deaminase in human melanoma cells. Commun. Biol. 2020;3:154. doi: 10.1038/s42003-020-0888-2.
- 74. Nov Y. When second best is good enough: another probabilistic look at saturation mutagenesis. Appl. Environ. Microbiol. 2012;78:258–262. doi: 10.1128/AEM.06265-11.
- 75. Tang L., Gao H., Zhu X., Wang X., Zhou M., Jiang R. Construction of “small-intelligent” focused mutagenesis libraries using well-designed combinatorial degenerate primers. Biotechniques. 2012;52:149–158. doi: 10.2144/000113820.
- 76. Chandra S., Gupta K., Khare S., Kohli P., Asok A., Mohan S.V., Gowda H., Varadarajan R. The High Mutational Sensitivity of ccdA Antitoxin Is Linked to Codon Optimality. Mol. Biol. Evol. 2022;39 doi: 10.1093/molbev/msac187.
- 77. Weile J., Sun S., Cote A.G., Knapp J., Verby M., Mellor J.C., Wu Y., Pons C., Wong C., van Lieshout N., et al. A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 2017;13:957. doi: 10.15252/msb.20177908.
- 78. Melnikov A., Rogov P., Wang L., Gnirke A., Mikkelsen T.S. Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res. 2014;42:e112. doi: 10.1093/nar/gku511.
- 79. O’Neill M.J., Muhammad A., Li B., Wada Y., Hall L., Solus J.F., Short L., Roden D.M., Glazer A.M. Dominant negative effects of SCN5A missense variants. Genet. Med. 2022;24:1238–1248. doi: 10.1016/j.gim.2022.02.010.
- 80. Xu Z., Juan V., Ivanov A., Ma Z., Polakoff D., Powers D.B., DuBridge R.B., Wilson K., Akamatsu Y. Affinity and Cross-Reactivity Engineering of CTLA4-Ig To Modulate T Cell Costimulation. J. Immunol. 2012;189:4470–4477. doi: 10.4049/jimmunol.1201813.
- 81. Forsyth C.M., Juan V., Akamatsu Y., DuBridge R.B., Doan M., Ivanov A.V., Ma Z., Polakoff D., Razo J., Wilson K., Powers D.B. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing. mAbs. 2013;5:523–532. doi: 10.4161/mabs.24979.
- 82. Schröder A.R.W., Shinn P., Chen H., Berry C., Ecker J.R., Bushman F. HIV-1 integration in the human genome favors active genes and local hotspots. Cell. 2002;110:521–529. doi: 10.1016/s0092-8674(02)00864-4.
- 83. Vansant G., Chen H.-C., Zorita E., Trejbalová K., Miklík D., Filion G., Debyser Z. The chromatin landscape at the HIV-1 provirus integration site determines viral expression. Nucleic Acids Res. 2020;48:7801–7817. doi: 10.1093/nar/gkaa536.
- 84. Bushman F.D. Retroviral Insertional Mutagenesis in Humans: Evidence for Four Genetic Mechanisms Promoting Expansion of Cell Clones. Mol. Ther. 2020;28:352–356. doi: 10.1016/j.ymthe.2019.12.009.
- 85. Zhuang J., Mukherjee S., Ron Y., Dougherty J.P. High rate of genetic recombination in murine leukemia virus: implications for influencing proviral ploidy. J. Virol. 2006;80:6706–6711. doi: 10.1128/JVI.00273-06.
- 86. Shukla N., Roelle S.M., Suzart V.G., Bruchez A.M., Matreyek K.A. Mutants of human ACE2 differentially promote SARS-CoV and SARS-CoV-2 spike mediated infection. PLoS Pathog. 2021;17 doi: 10.1371/journal.ppat.1009715.
- 87. Hirano N., Muroi T., Takahashi H., Haruki M. Site-specific recombinases as tools for heterologous gene integration. Appl. Microbiol. Biotechnol. 2011;92:227–239. doi: 10.1007/s00253-011-3519-5.
- 88. Turan S., Zehe C., Kuehle J., Qiao J., Bode J. Recombinase-mediated cassette exchange (RMCE) - a rapidly-expanding toolbox for targeted genomic modifications. Gene. 2013;515:1–27. doi: 10.1016/j.gene.2012.11.016.
- 89. Buchholz F., Ringrose L., Angrand P.O., Rossi F., Stewart A.F. Different thermostabilities of FLP and Cre recombinases: Implications for applied site-specific recombination. Nucleic Acids Res. 1996;24:4256–4262. doi: 10.1093/nar/24.21.4256.
- 90. Xu Z., Thomas L., Davies B., Chalmers R., Smith M., Brown W. Accuracy and efficiency define Bxb1 integrase as the best of fifteen candidate serine recombinases for the integration of DNA into the human genome. BMC Biotechnol. 2013;13:87–103. doi: 10.1186/1472-6750-13-87.
- 91. Matreyek K.A., Stephany J.J., Fowler D.M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 2017;45:e102. doi: 10.1093/nar/gkx183.
- 92. Cheung R., Insigne K.D., Yao D., Burghard C.P., Wang J., Hsiao Y.H.E., Jones E.M., Goodman D.B., Xiao X., Kosuri S. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol. Cell. 2019;73:183–194.e8. doi: 10.1016/j.molcel.2018.10.037.
- 93. Matreyek K.A., Stephany J.J., Chiasson M.A., Hasle N., Fowler D.M. An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 2020;48:e1. doi: 10.1093/nar/gkz910.
- 94. Papapetrou E.P., Schambach A. Gene Insertion Into Genomic Safe Harbors for Human Gene Therapy. Mol. Ther. 2016;24:678–684. doi: 10.1038/mt.2016.38.
- 95. Maricque B.B., Chaudhari H.G., Cohen B.A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 2018;37:90–95. doi: 10.1038/nbt.4285.
- 96. Shin S., Kim S.H., Shin S.W., Grav L.M., Pedersen L.E., Lee J.S., Lee G.M. Comprehensive Analysis of Genomic Safe Harbors as Target Sites for Stable Expression of the Heterologous Gene in HEK293 Cells. ACS Synth. Biol. 2020;9:1263–1269. doi: 10.1021/acssynbio.0c00097.
- 97. Mao Z., Bozzella M., Seluanov A., Gorbunova V. Comparison of nonhomologous end joining and homologous recombination in human cells. DNA Repair. 2008;7:1765–1771. doi: 10.1016/j.dnarep.2008.06.018.
- 98. Kosicki M., Tomberg K., Bradley A. Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 2018;36:765–771. doi: 10.1038/nbt.4192.
- 99. Medert R., Thumberger T., Tavhelidse-Suck T., Hub T., Kellner T., Oguchi Y., Dlugosz S., Zimmermann F., Wittbrodt J., Freichel M. Efficient single copy integration via homology-directed repair (scHDR) by 5′modification of large DNA donor fragments in mice. Nucleic Acids Res. 2023;51:e14. doi: 10.1093/nar/gkac1150.
- 100. Kuscu C., Arslan S., Singh R., Thorpe J., Adli M. Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nat. Biotechnol. 2014;32:677–683. doi: 10.1038/nbt.2916.
- 101. Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science (New York, N.Y.) 2012;337:816–821. doi: 10.1126/science.1225829.
- 102. Cong L., Ran F.A., Cox D., Lin S., Barretto R., Habib N., Hsu P.D., Wu X., Jiang W., Marraffini L.A., Zhang F. Multiplex genome engineering using CRISPR/Cas systems. Science (New York, N.Y.) 2013;339:819–823. doi: 10.1126/science.1231143.
- 103. Komor A.C., Kim Y.B., Packer M.S., Zuris J.A., Liu D.R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 2016;533:420–424. doi: 10.1038/nature17946.
- 104. Gaudelli N.M., Komor A.C., Rees H.A., Packer M.S., Badran A.H., Bryson D.I., Liu D.R. Programmable base editing of A⋅T to G⋅C in genomic DNA without DNA cleavage. Nature. 2017;551:464–471. doi: 10.1038/nature24644.
- 105. Hu J.H., Miller S.M., Geurts M.H., Tang W., Chen L., Sun N., Zeina C.M., Gao X., Rees H.A., Lin Z., Liu D.R. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature. 2018;556:57–63. doi: 10.1038/nature26155.
- 106. Tan J., Zhang F., Karcher D., Bock R. Expanding the genome-targeting scope and the site selectivity of high-precision base editors. Nat. Commun. 2020;11:629. doi: 10.1038/s41467-020-14465-z.
- 107. Kim Y.B., Komor A.C., Levy J.M., Packer M.S., Zhao K.T., Liu D.R. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 2017;35:371–376. doi: 10.1038/nbt.3803.
- 108. Anzalone A.V., Randolph P.B., Davis J.R., Sousa A.A., Koblan L.W., Levy J.M., Chen P.J., Wilson C., Newby G.A., Raguram A., Liu D.R. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature. 2019;576:149–157. doi: 10.1038/s41586-019-1711-4.
- 109. Kim H.K., Yu G., Park J., Min S., Lee S., Yoon S., Kim H.H. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 2021;39:198–206. doi: 10.1038/s41587-020-0677-y.
- 110. Liu P., Liang S.-Q., Zheng C., Mintzer E., Zhao Y.G., Ponnienselvan K., Mir A., Sontheimer E.J., Gao G., Flotte T.R., et al. Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice. Nat. Commun. 2021;12:2121. doi: 10.1038/s41467-021-22295-w.
- 111. Chen P.J., Hussmann J.A., Yan J., Knipping F., Ravisankar P., Chen P.-F., Chen C., Nelson J.W., Newby G.A., Sahin M., et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell. 2021;184:5635–5652.e29. doi: 10.1016/j.cell.2021.09.018.
- 112. Hietpas R.T., Jensen J.D., Bolon D.N.A. Experimental illumination of a fitness landscape. Proc. Natl. Acad. Sci. USA. 2011;108:7896–7901. doi: 10.1073/pnas.1016024108.
- 113. Bloom J.D. Software for the analysis and visualization of deep mutational scanning data. BMC Bioinf. 2015;16:168. doi: 10.1186/s12859-015-0590-4.
- 114. Rubin A.F., Gelman H., Lucas N., Bajjalieh S.M., Papenfuss A.T., Speed T.P., Fowler D.M. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 2017;18:150. doi: 10.1186/s13059-017-1272-5.
- 115.Faure A.J., Schmiedel J.M., Baeza-Centurion P., Lehner B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol. 2020;21:207. doi: 10.1186/s13059-020-02091-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Stoler N., Nekrutenko A. Sequencing error profiles of Illumina sequencing instruments. NAR Genom. Bioinform. 2021;3:lqab019. doi: 10.1093/nargab/lqab019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Ma X., Shao Y., Tian L., Flasch D.A., Mulder H.L., Edmonson M.N., Liu Y., Chen X., Newman S., Nakitandwe J., et al. Analysis of error profiles in deep next-generation sequencing data. Genome Biol. 2019;20:50. doi: 10.1186/s13059-019-1659-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Fowler D.M., Stephany J.J., Fields S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 2014;9:2267–2284. doi: 10.1038/nprot.2014.153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Kivioja T., Vähärautio A., Karlsson K., Bonke M., Enge M., Linnarsson S., Taipale J. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods. 2011;9:72–74. doi: 10.1038/nmeth.1778. [DOI] [PubMed] [Google Scholar]
- 120.Peng X., Dorman K.S. Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers. Bioinformatics. 2023;39 doi: 10.1093/bioinformatics/btad002.
- 121.Tamer Y.T., Gaszek I., Rodrigues M., Coskun F.S., Farid M., Koh A.Y., Russ W., Toprak E. The Antibiotic Efflux Protein TolC Is a Highly Evolvable Target under Colicin E1 or TLS Phage Selection. Mol. Biol. Evol. 2021;38:4493–4504. doi: 10.1093/molbev/msab190.
- 122.Bock C., Datlinger P., Chardon F., Coelho M.A., Dong M.B., Lawson K.A., Lu T., Maroc L., Norman T.M., Song B., et al. High-content CRISPR screening. Nat. Rev. Methods Primers. 2022;2:8. doi: 10.1038/s43586-021-00093-4.
- 123.Hiatt J.B., Patwardhan R.P., Turner E.H., Lee C., Shendure J. Parallel, tag-directed assembly of locally derived short sequence reads. Nat. Methods. 2010;7:119–122. doi: 10.1038/nmeth.1416.
- 124.Kitzman J.O., Starita L.M., Lo R.S., Fields S., Shendure J. Massively parallel single-amino-acid mutagenesis. Nat. Methods. 2015;12:203–206. doi: 10.1038/nmeth.3223.
- 125.Starita L.M., Young D.L., Islam M., Kitzman J.O., Gullingsrud J., Hause R.J., Fowler D.M., Parvin J.D., Shendure J., Fields S. Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics. 2015;200:413–422. doi: 10.1534/genetics.115.175802.
- 126.Ahler E., Register A.C., Chakraborty S., Fang L., Dieter E.M., Sitko K.A., Vidadala R.S.R., Trevillian B.M., Golkowski M., Gelman H., et al. A Combined Approach Reveals a Regulatory Mechanism Coupling Src’s Kinase Activity, Localization, and Phosphotransferase-Independent Functions. Mol. Cell. 2019;74:393–408.e20. doi: 10.1016/j.molcel.2019.02.003.
- 127.Lan F., Haliburton J.R., Yuan A., Abate A.R. Droplet barcoding for massively parallel single-molecule deep sequencing. Nat. Commun. 2016;7:11784. doi: 10.1038/ncomms11784.
- 128.Eid J., Fehr A., Gray J., Luong K., Lyle J., Otto G., Peluso P., Rank D., Baybayan P., Bettman B., et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. doi: 10.1126/science.1162986.
- 129.Deamer D., Akeson M., Branton D. Three decades of nanopore sequencing. Nat. Biotechnol. 2016;34:518–524. doi: 10.1038/nbt.3423.
- 130.Quail M.A., Smith M., Coupland P., Otto T.D., Harris S.R., Connor T.R., Bertoni A., Swerdlow H.P., Gu Y. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genom. 2012;13:341. doi: 10.1186/1471-2164-13-341.
- 131.Travers K.J., Chin C.S., Rank D.R., Eid J.S., Turner S.W. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 2010;38:e159. doi: 10.1093/nar/gkq543.
- 132.Wang Y., Zhao Y., Bollas A., Wang Y., Au K.F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 2021;39:1348–1365. doi: 10.1038/s41587-021-01108-x.
- 133.Peterman N., Levine E. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genom. 2016;17:206. doi: 10.1186/s12864-016-2533-5.
- 134.Matuszewski S., Hildebrandt M.E., Ghenu A.-H., Jensen J.D., Bank C. A Statistical Guide to the Design of Deep Mutational Scanning Experiments. Genetics. 2016;204:77–87. doi: 10.1534/genetics.116.190462.
- 135.Esposito D., Weile J., Shendure J., Starita L.M., Papenfuss A.T., Roth F.P., Fowler D.M., Rubin A.F. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 2019;20:223. doi: 10.1186/s13059-019-1845-6.
- 136.Hilton S.K., Huddleston J., Black A., North K., Dingens A.S., Bedford T., Bloom J.D. dms-view: Interactive visualization tool for deep mutational scanning data. J. Open Source Softw. 2020;5:2353. doi: 10.21105/joss.02353.
- 137.Tareen A., Kooshkbaghi M., Posfai A., Ireland W.T., McCandlish D.M., Kinney J.B. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol. 2022;23:98. doi: 10.1186/s13059-022-02661-7.
- 138.Hanning K.R., Minot M., Warrender A.K., Kelton W., Reddy S.T. Deep mutational scanning for therapeutic antibody engineering. Trends Pharmacol. Sci. 2022;43:123–135. doi: 10.1016/j.tips.2021.11.010.
- 139.Whitehead T.A., Chevalier A., Song Y., Dreyfus C., Fleishman S.J., De Mattos C., Myers C.A., Kamisetty H., Blair P., Wilson I.A., Baker D. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat. Biotechnol. 2012;30:543–548. doi: 10.1038/nbt.2214.
- 140.Laroche A., Orsini Delgado M.L., Chalopin B., Cuniasse P., Dubois S., Sierocki R., Gallais F., Debroas S., Bellanger L., Simon S., et al. Deep mutational engineering of broadly-neutralizing nanobodies accommodating SARS-CoV-1 and 2 antigenic drift. mAbs. 2022;14 doi: 10.1080/19420862.2022.2076775.
- 141.Warszawski S., Dekel E., Campeotto I., Marshall J.M., Wright K.E., Lyth O., Knop O., Regev-Rudzki N., Higgins M.K., Draper S.J., et al. Design of a basigin-mimicking inhibitor targeting the malaria invasion protein RH5. Proteins. 2020;88:187–195. doi: 10.1002/prot.25786.
- 142.Wollacott A.M., Robinson L.N., Ramakrishnan B., Tissire H., Viswanathan K., Shriver Z., Babcock G.J. Structural prediction of antibody-APRIL complexes by computational docking constrained by antigen saturation mutagenesis library data. J. Mol. Recogn. 2019;32 doi: 10.1002/jmr.2778.
- 143.Medina-Cucurella A.V., Zhu Y., Bowen S.J., Bergeron L.M., Whitehead T.A. Pro region engineering of nerve growth factor by deep mutational scanning enables a yeast platform for conformational epitope mapping of anti-NGF monoclonal antibodies. Biotechnol. Bioeng. 2018;115:1925–1937. doi: 10.1002/bit.26706.
- 144.Harvey W.T., Carabelli A.M., Jackson B., Gupta R.K., Thomson E.C., Harrison E.M., Ludden C., Reeve R., Rambaut A., COVID-19 Genomics UK (COG-UK) Consortium, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat. Rev. Microbiol. 2021;19:409–424. doi: 10.1038/s41579-021-00573-0.
- 145.Landrum M.J., Lee J.M., Riley G.R., Jang W., Rubinstein W.S., Church D.M., Maglott D.R. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–D985. doi: 10.1093/nar/gkt1113.
- 146.Hoffman-Andrews L. The known unknown: the challenges of genetic variants of uncertain significance in clinical practice. J. Law Biosci. 2017;4:648–657. doi: 10.1093/jlb/lsx038.
- 147.Mighton C., Shickh S., Uleryk E., Pechlivanoglou P., Bombard Y. Clinical and psychological outcomes of receiving a variant of uncertain significance from multigene panel testing or genomic sequencing: a systematic review and meta-analysis. Genet. Med. 2021;23:22–33. doi: 10.1038/s41436-020-00957-2.
- 148.Chen W., Coombes B.J., Larson N.B. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.1014947.
- 149.ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6.
- 150.Richards S., Aziz N., Bale S., Bick D., Das S., Gastier-Foster J., Grody W.W., Hegde M., Lyon E., Spector E., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015;17:405–424. doi: 10.1038/gim.2015.30.
- 151.Brnich S.E., Abou Tayoun A.N., Couch F.J., Cutting G.R., Greenblatt M.S., Heinen C.D., Kanavy D.M., Luo X., McNulty S.M., Starita L.M., et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 2019;12:3. doi: 10.1186/s13073-019-0690-2.
- 152.Jusiak B., Jagtap K., Gaidukov L., Duportet X., Bandara K., Chu J., Zhang L., Weiss R., Lu T.K. Comparison of Integrases Identifies Bxb1-GA Mutant as the Most Efficient Site-Specific Integrase System in Mammalian Cells. ACS Synth. Biol. 2019;8:16–24. doi: 10.1021/acssynbio.8b00089.
- 153.Bustin S.A., Benes V., Garson J.A., Hellemans J., Huggett J., Kubista M., Mueller R., Nolan T., Pfaffl M.W., Shipley G.L., et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin. Chem. 2009;55:611–622. doi: 10.1373/clinchem.2008.112797.
- 154.Lee J.A., Spidlen J., Boyce K., Cai J., Crosbie N., Dalphin M., Furlong J., Gasparetto M., Goldberg M., Goralczyk E.M., et al. MIFlowCyt: the minimum information about a Flow Cytometry Experiment. Cytometry A. 2008;73:926–930. doi: 10.1002/cyto.a.20623.
- 155.Nikoomanzar A., Vallejo D., Chaput J.C. Elucidating the Determinants of Polymerase Specificity by Microfluidic-Based Deep Mutational Scanning. ACS Synth. Biol. 2019;8:1421–1429. doi: 10.1021/acssynbio.9b00104.
- 156.Roychowdhury H., Romero P.A. Microfluidic deep mutational scanning of the human executioner caspases reveals differences in structure and regulation. Cell Death Dis. 2022;8:7. doi: 10.1038/s41420-021-00799-0.
- 157.Yaginuma K., Aoki W., Miura N., Ohtani Y., Aburaya S., Kogawa M., Nishikawa Y., Hosokawa M., Takeyama H., Ueda M. High-throughput identification of peptide agonists against GPCRs by co-culture of mammalian reporter cells and peptide-secreting yeast cells using droplet microfluidics. Sci. Rep. 2019;9 doi: 10.1038/s41598-019-47388-x.
- 158.Dixit A., Parnas O., Li B., Chen J., Fulco C.P., Jerby-Arnon L., Marjanovic N.D., Dionne D., Burks T., Raychowdhury R., et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell. 2016;167:1853–1866.e17. doi: 10.1016/j.cell.2016.11.038.
- 159.Schraivogel D., Gschwind A.R., Milbank J.H., Leonce D.R., Jakob P., Mathur L., Korbel J.O., Merten C.A., Velten L., Steinmetz L.M. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods. 2020;17:629–635. doi: 10.1038/s41592-020-0837-5.
- 160.Hasle N., Cooke A., Srivatsan S., Huang H., Stephany J.J., Krieger Z., Jackson D., Tang W., Pendyala S., Monnat R.J., et al. High-throughput, microscope-based sorting to dissect cellular heterogeneity. Mol. Syst. Biol. 2020;16:e9442. doi: 10.15252/msb.20209442.
- 161.Lee J., Liu Z., Suzuki P.H., Ahrens J.F., Lai S., Lu X., Guan S., St-Pierre F. Versatile phenotype-activated cell sorting. Sci. Adv. 2020;6 doi: 10.1126/sciadv.abb7438.
- 162.Okano K., Wang C.-H., Hong Z.-Y., Hosokawa Y., Liau I. Selective induction of targeted cell death and elimination by near-infrared femtosecond laser ablation. Biochem. Biophys. Rep. 2020;24 doi: 10.1016/j.bbrep.2020.100818.
- 163.Duckert B., Vinkx S., Braeken D., Fauvart M. Single-cell transfection technologies for cell therapies and gene editing. J. Contr. Release. 2021;330:963–975. doi: 10.1016/j.jconrel.2020.10.068.
- 164.Kuhn M., Santinha A.J., Platt R.J. Moving from in vitro to in vivo CRISPR screens. Gene and Genome Editing. 2021;2 doi: 10.1016/j.ggedit.2021.100008.
- 165.Low B.E., Hosur V., Lesbirel S., Wiles M.V. Efficient targeted transgenesis of large donor DNA into multiple mouse genetic backgrounds using bacteriophage Bxb1 integrase. Sci. Rep. 2022;12:5424. doi: 10.1038/s41598-022-09445-w.