PLOS Biology. 2026 Feb 11;24(2):e3003645. doi: 10.1371/journal.pbio.3003645

A cost-effective and scalable barcoded library construction method for deep mutational scanning studies

Jessica Jann 1,2,3,4,5,6, Isabelle Gagnon-Arsenault 1,2,3,4,5,6, Alicia Pageau 1,2,3,4,5,6, Alexandre K Dubé 1,2,3,4,5,6, Anna Fijarczyk 1,2,3,4,5,6, Romain Durand 1,2,3,4,5,6, Christian R Landry 1,2,3,4,5,6,*
Editor: Claudia Bank 7
PMCID: PMC12923136  PMID: 41671286

Abstract

Recent developments in DNA synthesis and sequencing allow the construction of comprehensive gene variant libraries and their functional analysis. Achieving high replication and thorough mutation characterization remains technically and financially challenging for long genes. Here, we developed an efficient, affordable, and scalable library construction approach that relies on low-cost DNA synthesis and standard cloning technologies, which will increase accessibility to mutational studies and help advance the field of protein science. Each degenerate codon variant is physically associated with multiple DNA barcodes during synthesis, which overcomes the need for long-read sequencing for linking variants to barcodes. We demonstrate the scalability of our approach by constructing a complete library for the multidrug resistance gene PDR1, a 3.2 kb gene encoding a pleiotropic transcription factor in the yeast Saccharomyces cerevisiae. We demonstrate a near-perfect correspondence between the measured impacts of amino acid variants when assessed by barcode sequencing and by direct sequencing of the mutated coding sequence.


The scalability of deep mutational scanning (DMS) experiments is limited by gene size due to the complexity of the variant library and the costs of DNA synthesis and sequencing. This study develops an efficient and cost-effective barcoded cloning strategy for plasmid-based DMS libraries that facilitates the study of large genes.

Introduction

Understanding how genetic variation alters protein function is a central goal in genetics, molecular biology, and evolutionary and biomedical research. Over the past decade, deep mutational scanning (DMS) has emerged as a powerful technology to systematically evaluate the functional consequences of all possible single amino acid substitutions in a protein within a single experiment [1,2]. By coupling a comprehensive variant library, selection assays, and high-throughput sequencing, DMS allows the systematic quantification and mapping of genotype-phenotype relationships at high resolution and coverage [1,2].

DMS involves four major steps: (i) the construction of a comprehensive variant library, typically introducing all possible single amino acid substitutions into a target gene (or specific locus); (ii) the delivery of this library into a biological system via vectors (such as plasmids) or genomic integration; (iii) the application of a selective pressure (e.g., drug exposure, receptor activation, sorting-based protein stability); and (iv) the quantification of variant frequencies before and after selection using deep sequencing, allowing the estimation of functional effects for each substitution. Connecting mutations to phenotypes leads to the construction of mutation-phenotype landscapes that provide insights into diverse biological processes, such as structure-function relationships [3], disease-associated mutations [4], and mechanisms of adaptation [5].

The robustness and completeness of a DMS experiment rely largely on the quality and design of the variant library. Constructing a systematic variant library offers several advantages over random or non-systematic approaches (e.g., error-prone PCR) [6,7]. It ensures exhaustive mutational coverage and removes biases introduced by targeted or region-focused mutagenesis. Furthermore, vector-based variant libraries offer flexibility in downstream applications: they can be easily stored and reused in different genetic backgrounds or biological systems [8], coupled with fluorescent tags or reporters [9], and they allow flexible expression under constitutive or inducible promoters [10]. Vector-based libraries can be used not only for transformation but also as templates for transferring mutations into genomic loci using CRISPR-Cas9 editing or homologous recombination [11,12]. Examples of vector-based DMS applications include signaling-specific variants of MC4R in obesity [13], immune escape mutations in the SARS-CoV-2 Spike protein [14], and pathogenicity classifications for CRX and PAX6 in inherited eye disorders [15]. In microbes, recent efforts have used DMS to map resistance-conferring mutations [5,16,17].

While DMS is a powerful tool for high-throughput functional analysis, generating reproducible and high-coverage data remains challenging. Incorporating internal replicates, such as multiple unique codons or DNA barcodes per amino acid variant, reduces technical noise, improves the precision of effect estimates, and enhances sensitivity and reproducibility [18–20]. The use of plasmid-based libraries further facilitates linking each variant to unique short DNA barcodes, enabling cost-effective tracking of mutation frequencies across diverse conditions using short-read sequencing. Similar barcoding strategies have previously been used in CRISPR-based pooled screens to enable high-throughput variant tracking [21].

Although the cost of high-throughput sequencing has steadily declined over the past 20 years, the large size of many target genes makes the cost of such experiments prohibitive. Gene size limits DMS scalability due to the complexity of the variant library and the associated synthesis and sequencing costs, including for linking variants to DNA barcodes. To overcome these challenges, we developed an efficient and cost-effective two-step cloning strategy for plasmid-based DMS libraries that facilitates the study of large genes using DNA-barcoded synthetic oligonucleotide variants.

Our approach involves fragmenting a gene of interest (Your Favorite Gene: YFG) into subregions of up to 150 base pairs (bp), in accordance with the synthesis length limits of single-stranded oligonucleotide pools (oPools). The oPools are composed of five parts (Fig 1A): (i) ~40 bp of homology upstream of the YFG fragment of interest, (ii) a YFG fragment sequence containing a single NNK codon substitution, (iii) BsaI restriction sites, (iv) a 30 bp DNA barcode alternating random dinucleotides (NN) with codon position-specific sequences, and (v) a conserved i7 primer binding site (PBS_i7).

Fig 1. A) Strategy to generate a DMS plasmid library for Your Favorite Gene (YFG) using short, degenerate libraries.


1. Segmentation of YFG into sub-fragments, each fragment corresponding to a DNA region to be synthesized. The same approach can be applied to promoter and terminator regions, if desired. 2. Example of a pool of degenerate oligonucleotides (oPool) derived from one YFG fragment associated with DNA barcodes. Each oPool contains: (i) ~40 bp of homology upstream of the YFG fragment of interest, (ii) the YFG fragment sequence with a single NNK codon, (iii) BsaI cloning sites, (iv) a DNA barcode composed of codon-position specific regions and six degenerate nucleotides (N), and (v) a conserved i7 primer binding site (PBS_i7) present in all oPools and used for rapid and efficient sequencing library preparation. Current oligonucleotide synthesis technologies allow for a total of nine degenerate positions per fragment: three are used for the degenerate codon (NNK), and six for the barcode. A complete list of all oPool sequences and their detailed composition is provided in S1 Table. 3. Protocol for constructing the YFG DMS plasmid library from oPools using two cloning steps that maintain the physical barcode-mutation association. The libraries of oPools are cloned into the plasmid template by Gibson cloning. Following this step, for each fragment, a required short-read sequencing step using PBS_i5 (included in the 5′ sequencing primer) and PBS_i7 is performed to associate each barcode with its corresponding mutation and to assess both barcode diversity per mutation and mutation coverage for the whole fragment. The final step consists of Golden Gate cloning of the missing 3′ gene fragment between the degenerate fragment and the barcode. An additional short-read sequencing step of the barcodes can be performed to make sure that coverage and diversity have been maintained. Figure created in BioRender. Barff, T. (2025) https://BioRender.com/1vfl2on. B) Optimization of cloning steps. 
Cumulative percentage of informative barcodes (those associated with a single coding mutation and not with the wild-type (WT)), mutation coverage (percentage of mutations represented by at least one informative barcode) and barcode diversity (percentage of mutations represented by informative #barcodes >4 or >9) as a function of the number of transformants recovered after Gibson cloning for two fragments (F13 and F43) of a long gene coding for a transcription factor. Data shown here are derived from the combined results of multiple independent small-scale transformations, and results are normalized to the number of transformants per base pair (bp) to facilitate comparison across gene fragments of different lengths. Means and confidence intervals were obtained from 100 random subsamplings of independent transformation experiments, each consisting of 5,000 (F13) or 7,500 (F43) transformants, cumulatively combined. The numerical data underlying these graphs is provided in the file S1 Data. Library quality control after (C) Gibson and (D) Golden Gate assemblies for PDR1 F13 and F43. Heatmaps show barcode diversity for each possible amino acid substitution at each codon position. For each fragment, a total of 25,000 and 100,000 transformants were recovered and analyzed after Gibson and Golden Gate assemblies, respectively. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by blue and purple scales, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying these graphs is provided in the files S2 and S3 Data. Clipped versions of these graphs (maximum of 10 barcodes per mutation), optimized to highlight lower barcode diversity, are provided in S1 and S2 Figs.

The integration of a barcode facilitates variant identification while allowing each codon, and thus each amino acid substitution, to be represented by multiple distinct barcodes, thereby providing internal replicates for downstream analysis. The barcodes and their corresponding variants remain physically linked on the same DNA molecule (plasmid), ensuring accurate tracking throughout the experiment.

This strategy utilizes the economical oPools, which are ~100-fold cheaper per base than double-stranded DNA synthesis (<$0.001 versus ~$0.05 per base) [22,23]. In addition, low-cost short-read sequencing is used (i) to associate random barcodes to their corresponding coding sequence variants and (ii) to track mutations in downstream experiments via the barcodes. This approach offers a cost-effective alternative to methods that require long-read sequencing for variant identification and tracking [24,25].

The cloning strategy is divided into two steps: a Gibson assembly, followed by a Golden Gate assembly (Fig 1A). In the first step, the plasmid backbone is generated via PCR from a yeast expression vector containing the wild-type (WT) YFG. Gibson assembly allows inserting the oPools (variant fragment + barcode + PBS_i7) into the backbone. These constructs then serve as backbone plasmids for the second cloning step.

In the second step, the missing YFG 3′ end sequence is amplified, including the C-terminal coding region, terminator, and PBS_i5. Golden Gate assembly enables the one-pot assembly of this missing YFG 3′ end sequence into the Gibson-assembled plasmids, using BsaI digestion and vector-insert ligation cycles (see Methods section). This strategy results in one distinct plasmid library per YFG fragment, each containing a full-length YFG sequence carrying a single mutation (in the targeted fragment) specifically linked to a DNA barcode flanked by the PBS_i5 and PBS_i7.

Each library undergoes quality control by high-throughput sequencing at both cloning steps (Fig 1):

  • After Gibson assembly, a required short-read sequencing step (mutated fragment + barcode) is used to (i) establish barcode-mutation associations and (ii) assess both mutation coverage and barcode diversity per mutation.

  • After Golden Gate assembly, only the DNA barcodes are sequenced (optional) to confirm that coverage and diversity have been maintained.

Our method is unique but shares some key elements with other methods previously developed, while being significantly more affordable. For instance, Jones and colleagues [26] (and slightly modified by Howard and colleagues [27]) also employed a two-step approach to link barcodes to variants, thereby overcoming the need for long-read sequencing. Their approach relies on pools of individual oligonucleotides with the specific sequences to be screened, i.e., without degenerate positions. The variants are linked to a random DNA barcode (15N) following their synthesis using a degenerate oligonucleotide, and this occurs before the missing gene fragment is inserted. There are two major differences with our approach. First, their gene variant synthesis relies on more expensive gene synthesis technology. Second, their approach utilizes a random barcode, which increases the complexity of the library and potentially increases the fraction of low-complexity barcodes that can cause issues during downstream sequencing. Our design enables a library with a codon-specific barcode, which helps reduce sequence complexity and allows tracking positions and fragments easily for quality control. This is made feasible because the barcodes and mutated gene fragments are initially located on the same DNA molecules. An approach that allows for the use of affordable oPools like ours is SUNi mutagenesis [28]. This protocol employs a completely different method for mutagenesis, using degenerate pools of oligonucleotides that anneal to single-stranded plasmids to introduce variants. One noticeable difference between this approach and others is the proportion of mutants successfully recovered. While our method achieves near-complete mutation coverage (99.8%), the SUNi authors report substantially lower coverage, with 77% and 69% of mutation coverage for two independent libraries. 
In addition, the SUNi protocol does not use DNA barcodes, which would bypass the need for tile-sequencing, particularly for long genes, although it could arguably be adapted to do so. Finally, DIMPLE [29] relies on oligonucleotides generated using DNA microarrays, which likely improves coverage compared to SUNi, but at a significantly higher synthesis cost. One similarity between the DIMPLE approach and ours is the use of Golden Gate cloning, specifically for assembling the synthesized mutated gene fragments with full-length gene backbones. The advantage of DIMPLE is the introduction of other types of mutations, such as insertions and deletions (indels). We did not consider them in our current library, but we have previously successfully used oPools for making indel libraries [30], so our protocol could also include such mutations. In addition to synthesis cost, one significant difference between our protocol and DIMPLE is that DIMPLE does not allow for rapid and affordable library barcoding during gene fragment synthesis.

To validate the feasibility and the strength of our approach, DMS libraries for all 43 fragments of a long gene coding for a transcription factor were constructed and characterized (Figs 1B–1D and S1–S8). Although information on all fragments is provided, we focus here on fragments F13 and F43 because they cover different distances from the DNA barcodes and because F13 contains mutations of interest. The performance of a DMS plasmid library depends on four key parameters: Informative barcodes (i.e., barcodes associated with only one coding mutation and not with the WT), which ensure data usability and limit sequencing costs of non-informative barcodes; Mutation coverage (i.e., percentage of mutations represented by at least one informative barcode); Barcode diversity (i.e., percentage of mutations represented by informative #barcodes >4 or >9), which provides internal replicates; Variant uniformity (i.e., how evenly individual variants are represented within the library), ensuring balanced abundance and accurate quantitative comparisons.
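The first three parameters can be computed directly from a barcode-to-variant association table. The following is a minimal sketch and not the authors' pipeline: the function name `library_metrics`, the input layout (a dictionary mapping each barcode to the set of variants it was sequenced with, using the label "WT" for wild type), and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def library_metrics(barcode_to_variants, designed_mutations, thresholds=(4, 9)):
    """Summarize a barcoded library (hypothetical helper, for illustration).

    barcode_to_variants: dict mapping barcode -> set of variant labels seen
        with that barcode; "WT" denotes the wild-type sequence (assumed layout).
    designed_mutations: iterable of all mutations the library should contain.
    thresholds: barcode-count cutoffs for the diversity metric (">4", ">9").
    """
    designed = set(designed_mutations)
    barcodes_per_mutation = defaultdict(set)
    informative = 0
    for barcode, variants in barcode_to_variants.items():
        # Informative barcode: linked to exactly one coding mutation, never WT.
        if len(variants) == 1 and "WT" not in variants:
            (mutation,) = variants
            if mutation in designed:
                informative += 1
                barcodes_per_mutation[mutation].add(barcode)

    n_barcodes = len(barcode_to_variants)
    n_designed = len(designed)
    counts = {m: len(barcodes_per_mutation.get(m, ())) for m in designed}
    metrics = {
        "informative_fraction": informative / n_barcodes if n_barcodes else 0.0,
        "mutation_coverage": sum(c >= 1 for c in counts.values()) / n_designed,
    }
    for t in thresholds:
        metrics[f"diversity_gt_{t}"] = sum(c > t for c in counts.values()) / n_designed
    return metrics


# Toy example with made-up barcodes
associations = {
    "bc1": {"T304N"},
    "bc2": {"T304N"},
    "bc3": {"I307S"},
    "bc4": {"WT"},               # wild type: not informative
    "bc5": {"T304N", "I307S"},   # ambiguous: not informative
}
metrics = library_metrics(associations, ["T304N", "I307S", "M308I"], thresholds=(1,))
```

In this toy input, three of five barcodes are informative, two of three designed mutations are covered, and one mutation has more than one informative barcode.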

These parameters are largely dependent on the number of recovered transformants after each cloning step. To better estimate this relationship, we combined sequencing data obtained from multiple independent small-scale transformations and normalized the results to the number of transformants per bp, allowing direct comparison across fragments of different lengths. A random sampling of the resulting distributions indicates that after Gibson assembly, ~350 transformants per bp are needed to obtain sufficient informative barcodes, mutation coverage, and barcode diversity (Fig 1B). This level of recovery yields 300–350 informative barcodes covering all possible mutations, with over 99.8% of mutations represented by #barcodes >4 and over 97.0% by #barcodes >9 (Figs 1B, 1C, and S1). Altogether, these findings demonstrate excellent mutation coverage and barcode diversity after the first cloning step, two essential features for the robustness and reproducibility of DMS data.

As each cloning step inevitably results in some diversity loss, the number of transformants recovered after Golden Gate assembly was increased 4-fold (~1,400 transformants per bp). Although a small subset of mutations showed lower representation, our cloning strategy ensured over 99.8% mutation coverage and preserved high barcode diversity: over 93.6% of mutations were still covered by #barcodes >4, and over 68.6% by #barcodes >9 (Figs 1D and S2).
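As a rough planning aid, the per-bp recommendations can be converted into transformant targets for a given fragment. This sketch assumes that the normalization refers to the length of the mutated gene fragment (75 bp for F13, 54 bp for F43); the text does not state which length is used, so this is an assumption, and the function name is hypothetical.

```python
GIBSON_PER_BP = 350     # transformants per bp after Gibson assembly (from the text)
GOLDEN_GATE_FOLD = 4    # fold increase applied for Golden Gate (from the text)


def target_transformants(fragment_bp, per_bp=GIBSON_PER_BP):
    """Number of transformants to aim for, given a fragment length in bp."""
    return fragment_bp * per_bp


# Assumed fragment lengths: mutated region only (75 bp, or 54 bp for the last one)
fragment_lengths = {"F13": 75, "F43": 54}
targets = {
    name: (
        target_transformants(bp),
        target_transformants(bp, GIBSON_PER_BP * GOLDEN_GATE_FOLD),
    )
    for name, bp in fragment_lengths.items()
}

for name, (gibson, golden_gate) in targets.items():
    print(f"{name}: Gibson {gibson:,} / Golden Gate {golden_gate:,}")
```

For a 75 bp fragment this gives about 26,000 and 105,000 transformants, which is of the same order as the 25,000 and 100,000 transformants reported in the Fig 1 legend.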

To complement the barcode diversity and coverage analyses, the full distributions of barcode counts per amino acid mutation after Gibson and Golden Gate assemblies are shown in S9 Fig, providing a direct visualization of barcode representation across variants for fragments F13 and F43. In addition, the uniformity of variant representation was assessed after both the Gibson and the Golden Gate steps using the Gini coefficient, an index of distribution bias ranging from 0 (perfect uniformity) to 1 (maximum bias) [31]. We observed that the Golden Gate assemblies showed a slight increase in the Gini coefficient compared to the Gibson assemblies (S10 and S11 Figs), reflecting a minimal bottleneck related to multi-fragment cloning. Overall, uniformity remained high across all 43 fragments (Gini coefficients consistently <0.5).
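For readers unfamiliar with the Gini coefficient, the sketch below shows one standard way to compute it from per-variant counts (for example, read counts or barcode counts per variant). It is an illustration only: the exact estimator and input used in the study are not described in this section, and the function name is an assumption.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly uniform).

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and ranks i = 1..n.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


uniform = gini([5, 5, 5, 5])    # every variant equally represented
skewed = gini([0, 0, 0, 10])    # one variant holds all counts
```

With this estimator, a finite sample of n variants reaches at most (n − 1)/n, so the value approaches 1 only for large libraries.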

To demonstrate the feasibility and utility of our library construction strategy, we focused on antifungal resistance, an increasingly pressing issue due to the rising incidence of invasive fungal infections and the limited number of effective treatment options [32]. Drug resistance mutations significantly compromise the efficacy of antifungal therapies, underscoring the urgent need for innovative approaches to investigate and counteract resistance mechanisms [33]. Among current antifungal drugs, azoles are widely used in clinical settings, including posaconazole (POSA), a broad-spectrum triazole [34–37]. The transcription factor Pleiotropic Drug Resistance 1 (Pdr1) plays a key role in mediating multidrug resistance [38]. Even if it regulates many downstream pathways, PDR1 itself is not essential for yeast viability. However, mutations in PDR1 lead to overexpression of downstream targets, including ABC efflux pump genes like CDR1, CDR2, and SNQ2, conferring broad-spectrum drug resistance in both the model yeast Saccharomyces cerevisiae and the human pathogen Nakaseomyces glabratus (Candida glabrata) [39,40]. Although numerous studies have demonstrated the central role of PDR1, the precise identity and functional impact of resistance-conferring mutations remain largely unknown. Thus, our cloning strategy targeted S. cerevisiae PDR1, a large protein of 1,068 amino acids, whose comprehensive mutagenesis would have been financially and technically challenging using conventional methods. To demonstrate the accuracy of our method, we focused on fragment 13 (F13; residues 301–325) of Pdr1. This fragment combines both unknown and well-characterized regions associated with drug resistance [38], including previously identified and experimentally validated mutations [40–44].

The PDR1 F13 variant library was used to perform high-throughput functional screening of all possible amino acid substitutions, linking genotype to antifungal resistance phenotypes (Fig 2). The DMS experiment consisted of four main steps (Fig 2A): (i) generation of the PDR1 variant library by transforming the plasmid library into a S. cerevisiae strain with doxycycline-repressible genomic PDR1 expression (ScPDR1-DOX), (ii) competition assay with POSA, (iii) high-throughput sequencing, and (iv) fitness analysis of each variant.

Fig 2. Demonstration of the high-accuracy mutation profile obtained through barcode sequencing of a DMS plasmid library built from oPools.


A) Pooled competition workflow using our barcoded DMS plasmid library targeting a drug resistance hotspot. 1. The Saccharomyces cerevisiae PDR1 variant library for F13 (residues 301-325) was transformed into a S. cerevisiae strain with a doxycycline-repressible endogenous PDR1 expression (ScPDR1-DOX). 2. Transformed strain expressing the variant library was grown in the presence of posaconazole (POSA, 0.40 μg/mL) or under drug-free control (CTL) conditions. 3. High-throughput short-read sequencing of DNA barcodes. 4. Selection coefficient analysis to identify resistant variants. Figure created in BioRender. Barff, T. (2025) https://BioRender.com/4wpf522. B) Azole resistance landscape of Pdr1 F13. Heatmaps of selection coefficients for the 525 single amino acid variants screened, based on either direct sequencing of the mutated F13 region (top) or DNA-barcode inference (bottom). Positive selection coefficients (>0, red) indicate increased fitness relative to the WT, while negative values (<0, blue) indicate reduced fitness. Black dots mark WT amino acids, and asterisks represent stop codons (as internal controls). Each value is the median of synonymous codons per amino acid. White cells indicate missing data. Squares in gray (S. cerevisiae or S. paradoxus) and purple (N. glabratus) highlight azole resistance mutations previously reported. The numerical data underlying these graphs is provided in the files S5 and S6 Data. C) Distribution of correlation coefficients between the selection coefficients measured with direct sequencing of the mutated F13 coding region and DNA-barcode inference, as a function of the number of barcodes per variant considered. Each boxplot summarizes 100 random subsamplings per variant, performed using a standardized resampling procedure to normalize barcode counts across variants. The numerical data underlying this graph is provided in the file S7 Data. D) Pdr1 tertiary structure with DMS azole resistance landscape on F13 region. 
Predicted tertiary structure of S. cerevisiae Pdr1 (AlphaFold 3 prediction, pTM = 0.72) shown in gray with its Zn2Cys6 DNA binding domain (residues 1–78) in green. The median selection coefficients from the DMS are mapped onto F13 (residues 301–325).

In this system, the endogenous PDR1 locus is placed under the control of a doxycycline-repressible promoter, allowing conditional regulation of its genomic expression. Although PDR1 is not essential for yeast viability, its activity strongly influences azole resistance phenotypes. The design allows conditional inhibition of genomic PDR1 expression and functional complementation with the variant libraries generated in this study, as shown in S12 Fig. With this system in place, we evaluated the performance of barcode-based sequencing for accurate quantification of variant fitness.

In principle, only short-read sequencing of DNA barcodes would be required. Here, we also performed full-length sequencing of the F13 fragment to compare the accuracy of the selection coefficients estimated from the barcode and from the mutated fragment directly, as is routinely done for small genes [45–47] or through tile-seq for longer genes [5,48]. The estimates from the barcodes or from the F13 coding region were strongly correlated (r = 0.97, all available barcodes, S13 Fig). This high level of concordance was also observed with a much smaller number of barcodes (four barcodes per mutation (r = 0.91)), confirming the robustness of our cloning and barcoding strategy (Figs 2C and S14). Only a few discrepancies between barcode- and sequence-based estimates were observed (8 out of 525 mutations), most of which involved mutations supported by few barcodes (≤5) and showing neutral-to-sensitive fitness effects. By analyzing selection coefficients across all amino acid substitutions, we identified two distinct regions within fragment F13 with different contributions to resistance (Fig 2B and 2D). Residues 301–312 showed frequent gain-of-resistance mutations, whereas residues 313–324 appeared to play a limited role in resistance, with fewer substitutions conferring resistance. These results closely match resistance-conferring mutations reported in the literature for Pdr1 in S. cerevisiae, S. paradoxus, and the closely related species N. glabratus (Fig 2B; e.g., T304N, I307S, M308I, R310S, etc.), supporting the power of our cloning strategy for mapping resistance landscapes.
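The comparison between barcode-based and sequence-based estimates can be illustrated with a small sketch. This section does not give the formula used for selection coefficients, so the code below assumes a common definition (log2 change in variant frequency between the control and drug conditions, relative to the wild type) and takes the median across barcodes of the same variant. The pseudocount, the function names, and the read counts are all hypothetical.

```python
import math
from statistics import median


def log2_fold_change(before, after, total_before, total_after, pseudocount=1):
    """log2 ratio of frequencies after vs. before selection (assumed definition)."""
    freq_before = (before + pseudocount) / total_before
    freq_after = (after + pseudocount) / total_after
    return math.log2(freq_after / freq_before)


def variant_score(barcode_counts, total_before, total_after, wt_lfc=0.0):
    """Median log2 fold change over a variant's barcodes, relative to WT.

    barcode_counts: list of (reads_before, reads_after), one pair per barcode.
    """
    lfcs = [
        log2_fold_change(before, after, total_before, total_after)
        for before, after in barcode_counts
    ]
    return median(lfcs) - wt_lfc


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up counts: three barcodes for one variant, equal sequencing depth
score = variant_score([(9, 39), (9, 19), (9, 79)], 1000, 1000)
r_perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Using the median across barcodes limits the influence of a single aberrant barcode, which is one reason internal replicates help when only a few barcodes support a mutation.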

In summary, we developed an efficient, scalable, and cost-effective two-step cloning strategy for the construction of high-quality, barcoded DMS plasmid libraries for genes of any length. By combining affordable oPools, DNA barcodes, and optimized Gibson and Golden Gate assemblies, while using short-read sequencing to minimize financial costs, our method ensures high mutation coverage and barcode diversity that provide robust internal replicates, key parameters for accurate and reproducible measurements of mutational effects on fitness or resistance. This approach reduces both technical and financial barriers, facilitating its broader application across diverse gene sizes, biological systems, and experimental contexts. By supporting comprehensive genotype-phenotype analyses, our strategy strengthens functional genomics and mechanistic insights, and enhances the accessibility of these technologies across laboratories.

Methods

General information for plasmid and strain constructions and media

All cloning procedures were performed using Escherichia coli strain MC1061 unless otherwise stated, in which case E. coli NEB 5-alpha Competent Cells were used. Saccharomyces cerevisiae strain R1158, described in [49], served as the background for generating the doxycycline-repressible S. cerevisiae PDR1 strain (ScPDR1-DOX, details below).

All antimicrobials were prepared as 1,000× stock solutions: ampicillin (AMP), doxycycline (DOX), geneticin (G418), and nourseothricin (NAT) were dissolved in sterile water, filtered, and stored at −20 °C; posaconazole (POSA) and itraconazole (ITR) were dissolved in dimethyl sulfoxide (DMSO) and also stored at −20 °C.

For bacterial liquid cultures, E. coli was grown in LB medium (0.5% yeast extract, 1% tryptone, 1% NaCl) at 37 °C with shaking at 250 rpm. For solid cultures, cells were grown on 2YT agar plates (1% yeast extract, 1.6% tryptone, 0.5% NaCl, 0.2% glucose, 2% agar) and incubated at 37 °C. When required, AMP was added at a final concentration of 100 μg/mL for plasmid selection. S. cerevisiae was cultured in YPD medium (1% yeast extract, 2% tryptone, 2% glucose) at 30 °C with shaking at 250 rpm. For solid cultures, 2% agar was added to YPD. Yeast transformants with pRS31N plasmids were selected on YPD plates supplemented with 100 μg/mL G418 and 100 μg/mL NAT.

All PCRs were performed using KAPA High-Fidelity HotStart polymerase unless otherwise specified. Strains, reagents, plasmids, computational tools, and additional resources are listed in S2 Table. Oligos are listed in S3 Table, and PCR conditions in S4 Table.

All transformations were performed using E. coli NEB 5-alpha Competent Cells (High Efficiency) following the manufacturer’s recommended heat-shock protocol. Briefly, 2–5 µL of assembly reaction (Gibson or Golden Gate; see below) was added to 50 µL of chemically competent cells, followed by incubation on ice for 30 min, a 30-s heat shock at 42 °C, and immediate cooling on ice for 5 min. Cells were then recovered in 450 µL of SOC medium (2% Vegetable Peptone, 0.5% Yeast Extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, and 20 mM glucose) and incubated at 37 °C for 60 min with shaking (250 rpm) before plating on selective agar plates. The complete transformation protocol is available from New England Biolabs [50].

The integrity of all assembled and mutagenized plasmids was verified either by Sanger sequencing (Plateforme de séquençage et de génotypage des génomes, Centre de recherche du centre hospitalier de Québec-Université Laval (CRCHUL), Canada) or by whole plasmid sequencing using Oxford Nanopore Technology with custom analysis and annotation (Plasmidsaurus, USA or Flow Genomics, Canada).

Plasmid library design

pRS31N-ScPDR1 WT:

S. cerevisiae PDR1 was expressed from a yeast centromeric plasmid pRS31N-ScPDR1 that was constructed as follows. First, a plasmid containing the PDR1 sequence was created, including its promoter and terminator. The plasmid pRS31N containing AMP and NAT resistance markers was doubly digested with 20 U of both HindIII-HF and XbaI restriction enzymes. The insert, a fragment containing S. cerevisiae PDR1 native promoter (834 bp), coding sequence (3,207 bp), and terminator (345 bp), was amplified from the pMoBY-ScPDR1 plasmid with primers adding homology to the pRS31N plasmid cloning site (PCR 1). The resulting PCR product corresponding to the insert was incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental DNA. Subsequently, the plasmid and insert were purified on magnetic beads and assembled by Gibson assembly to obtain pRS31N-ScPDR1.

As a BsaI restriction site is used in the construction of the PDR1 DMS plasmid library, we had to remove three BsaI restriction sites in pRS31N-ScPDR1. The three sites were mutated by site-directed mutagenesis (site 1) or by homologous recombination with a short double-stranded DNA fragment (gBlock) corresponding to PDR1 coding sequence between positions 2238 and 2720 without BsaI restriction sites (sites 2 and 3). The BsaI site 1 was removed by performing site-directed mutagenesis based on the QuikChange Site-Directed Mutagenesis System (Stratagene, La Jolla, CA). We amplified the pRS31N-ScPDR1 plasmid using a pair of primers containing the desired mutation at the center (PCR 2). The PCR product was then incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental DNA, and pRS31N-ScPDR1 mutated plasmids (pRS31N-ScPDR1 minus site 1) were retrieved directly by transformation in bacteria. For sites 2 and 3, the plasmid pRS31N-ScPDR1 minus site 1 was first PCR amplified (PCR 3). The resulting PCR product was incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental DNA and then purified on magnetic beads. Finally, the plasmid and gBlock insert were assembled by Gibson assembly [51] to obtain pRS31N-ScPDR1 WT. For clarity, throughout this study, pRS31N-ScPDR1 WT refers to the plasmid after the BsaI site replacements, which served as the template for constructing the whole PDR1 DMS plasmid library.
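Before designing a library for another gene, it can be useful to check the template for internal BsaI sites in the same way. The sketch below scans a linear sequence for the BsaI recognition motif (GGTCTC) on both strands. The function names and the test sequence are illustrative, and sites spanning the origin of a circular plasmid would need the sequence to be rotated or extended first.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence (top strand)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_bsai_sites(seq):
    """Return sorted (0-based position, strand) pairs for every BsaI motif."""
    seq = seq.upper()
    hits = []
    for motif, strand in ((BSAI_SITE, "+"), (reverse_complement(BSAI_SITE), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Made-up sequence with one site on each strand
sites = find_bsai_sites("AAGGTCTCAAGAGACCTT")
```

An empty result means the sequence can be used in a BsaI-based Golden Gate step without first removing internal sites.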

oPools:

The DMS library was generated using only the PDR1 coding sequence (CDS). Due to the limited length of synthetic DNA and to enable barcode-mutation association with short-read sequencing, the 3,204 bp PDR1 coding sequence (excluding the stop codon) was divided into 43 fragments of 75 bp each, except the last one, which is 54 bp long. Fragment 1 spans codons 1–25, fragment 2 spans codons 26–50, and so on, up to the last and smallest fragment (fragment 43), which spans codons 1,051–1,068. For each fragment, a library was built using affordable pools of degenerate oligonucleotides (oPools). oPools are composed of five parts designed as follows (Fig 1A):

  1.  ~40 bp of homology upstream of the PDR1 fragment of interest;

  2. PDR1 sequence of the fragment containing a single NNK codon (32 codons, including a stop codon);

  3. BsaI module (two BsaI sites separated by CCGAAGCT to avoid self-dimerization);

  4. 30 bp DNA barcode with six random N and two fixed sequences (12 bp each) for each codon position. This design facilitates the identification of the position of the mutated codon while allowing multiple distinct barcodes to represent the same mutation. As a result, each unique mutation is linked to several barcodes, providing internal replicates for downstream DMS analyses. oPools currently only allow for nine degenerate positions per DNA fragment. Three are used for the codon and six for the barcode; and

  5. the sequencing PBS_i7 site, which is common to all oPools and used for rapid and efficient short-read sequencing library preparation.

To generate the barcodes, we used the following Python script: github.com/lzamparo/DNAbarcodes, which generates all possible 12-nucleotide barcode sequences. From these, 30,000 barcodes with a GC content between 40 and 60% and a Hamming distance of 3 were selected. We then removed barcodes containing a BsaI restriction site to prevent any problem in the Golden Gate assembly step of our protocol. We also removed barcodes with two identical bases at the beginning or at the end to prevent stretches of the same base. We generated all individual 30 bp barcodes for the oPools by combining two 12 bp barcodes with six degenerate nucleotides, NN+barcode1+NN+barcode2+NN. We made sure the GC content of the sum of the two barcodes would be ~50%.
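As an illustration, the barcode filters described above can be sketched as follows. This is a minimal sketch and not the script used in this study: the function names are hypothetical, the Hamming distance of 3 is read as a minimum pairwise distance, and the greedy selection is an assumed strategy chosen for simplicity.

```python
BSAI_SITES = ("GGTCTC", "GAGACC")  # BsaI recognition site and its reverse complement


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def passes_filters(barcode: str) -> bool:
    """Apply the per-barcode filters: GC content, BsaI sites, repeated end bases."""
    if not 0.40 <= gc_fraction(barcode) <= 0.60:
        return False
    if any(site in barcode for site in BSAI_SITES):
        return False
    # Reject two identical bases at the beginning or at the end.
    if barcode[0] == barcode[1] or barcode[-1] == barcode[-2]:
        return False
    return True


def select_barcodes(candidates, min_distance=3, limit=None):
    """Greedily keep barcodes at least min_distance away from all kept barcodes.

    Greedy selection is an assumption of this sketch.
    """
    kept = []
    for barcode in candidates:
        if passes_filters(barcode) and all(
            hamming(barcode, other) >= min_distance for other in kept
        ):
            kept.append(barcode)
            if limit is not None and len(kept) >= limit:
                break
    return kept


def compose_barcode(barcode1: str, barcode2: str) -> str:
    """Build the 30 nt design NN+barcode1+NN+barcode2+NN from two 12 nt barcodes."""
    return "NN" + barcode1 + "NN" + barcode2 + "NN"
```

The pairwise comparison in `select_barcodes` scales quadratically with the number of retained barcodes, so it is intended for illustration on small candidate sets.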

All oPool sequences were then generated using a custom Jupyter notebook (opool_generator_20240212.ipynb). We cut the PDR1 coding sequence into 75 bp fragments (25 codons per fragment), and we identified the 40 bp before and the 4 bp after each fragment. For each fragment, we generated 25 variant oligonucleotides in which a single codon was replaced by an NNK degenerate codon. Each oPool contains 25 different oligonucleotide sequences, each targeting one specific codon. The final sequences were generated by combining the 40 bp before the fragment, the NNK sequence, the 4 bp after the fragment, the BsaI module (GGAGACCgaagCTGGTCTCGACAG), one 30 bp barcode and the PBS_i7 (CTGTCTCTTATACACATCTCCGAGCCCACGAGAC).
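The assembly of the oligonucleotides for one fragment can be sketched as follows. This is a simplified illustration under stated assumptions, not the content of the notebook: the function name and its arguments (`context`, `cds_start`, `cds_len`) are hypothetical, and `context` is assumed to contain the CDS together with enough flanking sequence to provide the 40 bp before the first fragment and the 4 bp after the last one.

```python
BSAI_MODULE = "GGAGACCgaagCTGGTCTCGACAG"          # BsaI module, as given in the text
PBS_I7 = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"     # PBS_i7, as given in the text


def opool_sequences(context, cds_start, cds_len, fragment_index, barcodes,
                    frag_len=75, up_len=40, down_len=4):
    """Return one oligonucleotide per codon of a 1-based fragment.

    context   : sequence containing the CDS and its flanking regions
    cds_start : 0-based index of the first CDS base within context
    cds_len   : length of the CDS in bp
    barcodes  : one 30 nt barcode (NN+barcode1+NN+barcode2+NN) per codon
    """
    start = cds_start + (fragment_index - 1) * frag_len
    end = min(start + frag_len, cds_start + cds_len)
    if start - up_len < 0:
        raise ValueError("not enough upstream sequence in context")
    fragment = context[start:end]
    upstream = context[start - up_len:start]
    downstream = context[end:end + down_len]
    n_codons = len(fragment) // 3
    if len(barcodes) < n_codons:
        raise ValueError("one barcode is needed per codon position")

    oligos = []
    for i in range(n_codons):
        # Replace codon i of the fragment with the degenerate NNK codon.
        mutated = fragment[:3 * i] + "NNK" + fragment[3 * i + 3:]
        oligos.append(
            upstream + mutated + downstream + BSAI_MODULE + barcodes[i] + PBS_I7
        )
    return oligos
```

Calling the function once per fragment index (1 to 43 for PDR1) would yield one pool per fragment, with the shorter last fragment handled by the `min` on the fragment end.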

All oPools are flanked by homology sequences at both the 5′ end (40 bp upstream of the target fragment within the PDR1 sequence) and the 3′ end (PBS_i7 sequence, facilitating rapid sequencing library preparation), for a total of 186–258 bp per oPool. The complete list of oPool sequences used in this study is provided in S1 Table.

Because cloning applications require double-stranded DNA inserts, oPools were amplified using two PCR reactions. The first PCR consisted of a single cycle using only the reverse primer (PCR4). Amplicons were purified using magnetic beads at a bead:DNA ratio of 1.8 to enrich for full-length oligonucleotides and to minimize the incorporation of truncated sequences (linked to oPool incomplete synthesis) into the final product. Then, 2 µL of the purified product was used as template for a second PCR, involving both forward and reverse primers and limited to five cycles to reduce PCR-related errors (PCR5). Each oPool amplification was performed in four replicates, which were pooled after the second PCR to maximize the diversity of variants carried into downstream cloning steps. This approach is intended to improve the overall quality of the oPools, ensuring sufficient insert quantity and diversity for subsequent steps of DMS library construction.

Plasmid library construction

Cloning strategy of the DMS plasmid library:

The cloning strategy was divided into two steps: Gibson assembly followed by Golden Gate assembly.

In the first step, the plasmid backbone was amplified from pRS31N-ScPDR1 WT (PCR 6). The amplified region includes homology arms corresponding to the sequence upstream of the targeted fragment and extends to the plasmid region downstream of the terminator (i7 region). The resulting linear PCR product was incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental plasmid DNA and then purified using magnetic beads. The oPools were amplified as described in the oPools section. Backbone and insert fragments were assembled using the Gibson assembly protocol. To reach a target of ~25,000 transformants per fragment, necessary to ensure good barcode diversity and mutation coverage (Fig 1), two parallel Gibson reactions (10 μL each) followed by ~5 transformations (3 μL of Gibson reaction per transformation) using NEB 5-alpha Competent E. coli cells were performed for each fragment. Following transformations, 5 mL of 2YT medium was added directly onto each agar plate, and colonies were scraped using a sterile glass spreader. All transformants corresponding to a given fragment were pooled together. Aliquots were prepared, each corresponding to an optical density (OD₆₀₀) of 20 per fragment, and stored at −80 °C without (cell pellet) or with (glycerol stock) 15% glycerol. Plasmid libraries were extracted and purified from one cell pellet aliquot per fragment. These purified plasmids served as backbones for the subsequent cloning step.

In the second step, the PDR1 missing 3′ end region, including the 3′ end coding sequence, terminator, and PBS_i5 (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG) (Fig 1A), was amplified (PCR 7). BsaI recognition sites were added to both ends of the amplicon to generate non-palindromic overhangs compatible with those in the Gibson-assembled plasmids. The resulting PCR product was incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental DNA and then purified on magnetic beads. The previously prepared plasmid libraries served as vectors, and the PDR1 missing 3′ region acted as inserts for Golden Gate assembly, a one-pot cloning method alternating between BsaI digestion and ligation (PCR 8). To ensure sufficient barcode diversity and adequate mutation coverage (Fig 1), approximately 100,000 transformants per fragment were needed. To achieve this, two parallel Golden Gate reactions (20 μL each) and ~10 transformations (2.5 μL of Golden Gate reaction per transformation) were performed for each fragment using NEB 5-alpha Competent E. coli cells, according to the manufacturer’s instructions. After transformation, 5 mL of 2YT medium was added directly to each agar plate, and colonies were scraped using a sterile glass spreader. All transformants corresponding to a given fragment were pooled together. Aliquots were prepared, each corresponding to an optical density (OD₆₀₀) of 80 per fragment, and stored at −80 °C without (cell pellet) or with (glycerol stock) 15% glycerol. Plasmid libraries were extracted and purified from one cell pellet aliquot per fragment. The resulting plasmid preparations corresponded to the final PDR1 DMS plasmid libraries, with a distinct library for each PDR1 fragment.

Plasmid library quality control by sequencing:

At each cloning step, library quality was assessed by high-throughput sequencing, focusing on parameters such as mutation coverage and barcode diversity per mutation (Figs 1C, 1D, and S1–S4).

Sequencing library preparation:

To perform sequencing-based quality control after Gibson assembly, we started with 20 ng of plasmid library obtained from bacteria minipreps. Sequencing libraries were prepared via three PCRs. The first two PCRs (PCR9 and PCR10) amplified the fragment of interest along with the downstream DNA barcode, while simultaneously adding the PBS_i5 adapter sequence at the 5′ end. The forward primer included the PBS_i5 and 20 bp upstream of the targeted region, while the reverse primer corresponded only to the PBS_i7. The amplicons were purified, diluted 1:1,000 and used as a template for a third PCR (PCR11), adding the appropriate Illumina i5 and i7 sample indexes for multiplexed sequencing. For each sample, the last PCR was performed in triplicate to maintain the diversity of the sequenced sample. All amplicons from the same sample were pooled and purified using magnetic beads. All samples were sent for high-throughput sequencing (sample description in S5 Table). Although this approach ensures robust amplification of the library and sufficient sequencing depth, it does not directly account for potential amplicon sequencing biases. Such biases can occur when random DNA molecules are over-amplified compared to others, which can cause significant issues for downstream quantitative analyses. Incorporating Unique Molecular Identifiers (UMIs) during PCR when preparing sequencing libraries would help mitigate the possible occurrence of such biases.

Following Golden Gate assembly, only the DNA barcode region was sequenced, in order to verify that barcode diversity and mutation coverage were maintained. Because the PBS_i5 and PBS_i7 were already present on each side of the DNA barcode, sequencing libraries after Golden Gate assembly were generated in a single PCR (PCR12), which added the appropriate Illumina i5 and i7 sample indexes. As after Gibson assembly, this PCR was performed in triplicate per sample to prevent sampling bottlenecks. All replicate amplicons were pooled and purified on magnetic beads prior to sequencing. All samples were then sent for high-throughput sequencing (sample description in S6 Table).

Sequencing platform:

High-throughput sequencing was performed in paired-end 100 bp or 150 bp at the Centre de recherche du CHU de Québec–Université Laval sequencing platform (CHUL, Canada) using an Illumina NovaSeq 6000 S4 system. Libraries were sequenced at an average depth of ~250 reads per expected variant to assess barcode diversity and mutation coverage.

Sequencing read processing:

To analyze quality control after Gibson assembly, first, low-quality reads were removed with fastp using default parameters -q 15 (the minimum phred score quality value that a base is qualified equal to 15) and -u 40 (40% of bases allowed to be unqualified). Reads were then merged with bbmerge from bbmap. All subsequent processing was done using custom Python scripts available at https://github.com/Landrylab/Jann_et_al_2025. To remove potential sequencing errors, the following reads were discarded: reads with incorrect length (reads including insertions or deletions), reads with more than one mutated codon within the PDR1 coding region (75 bp), reads with unexpected mutations within barcode sequences, and reads with missing nucleotides within the PDR1 or barcode sequence. After filtering, more than 60% and 80% of all reads were retained for F13 and F43, respectively.
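The filters applied to the PDR1 portion of each merged read can be sketched as follows. This is a minimal illustration, not the custom scripts from the repository: the function names are hypothetical, and the checks on the barcode sequence are omitted for brevity.

```python
def mutated_codons(read: str, wild_type: str):
    """Return (codon index, codon) pairs where the read differs from the wild type."""
    return [
        (i // 3, read[i:i + 3])
        for i in range(0, len(wild_type), 3)
        if read[i:i + 3] != wild_type[i:i + 3]
    ]


def keep_read(read: str, wild_type: str) -> bool:
    """Keep reads of the expected length, without N, and with at most one mutated codon."""
    if len(read) != len(wild_type):   # insertion or deletion
        return False
    if "N" in read:                   # missing nucleotides
        return False
    return len(mutated_codons(read, wild_type)) <= 1
```

Under this rule, reads identical to the WT sequence are retained alongside single-codon variants, and only reads with two or more mutated codons are discarded.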

Mutations with coverage of less than two reads were discarded. To measure the impact of combining results from multiple transformations (16 reactions, each with 5,000 (F13) or 7,500 (F43) transformants), we calculated the cumulative % of different metrics across all transformations sorted in a randomized manner. Means and confidence intervals were obtained for 100 random draws of transformation reactions. Metrics included: (i) informative barcodes, (ii) mutation coverage, (iii) barcode diversity: #barcodes >4 per mutation, and (iv) barcode diversity: #barcodes >9 per mutation. To standardize data, all metrics have been converted to the number of transformants/bp.
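The cumulative calculation for one random ordering of transformations can be sketched as follows. This is an illustrative sketch under assumptions: each transformation is represented as a set of (mutation, barcode) pairs, the function name is hypothetical, and only three of the metrics are shown.

```python
import random
from collections import defaultdict


def cumulative_metrics(transformations, n_expected, rng):
    """Cumulative % of expected mutations that are covered, or have >4 or >9 barcodes.

    transformations : list of sets of (mutation, barcode) pairs
    n_expected      : number of expected mutations
    rng             : random.Random instance used to shuffle the order
    """
    order = list(transformations)
    rng.shuffle(order)
    barcodes = defaultdict(set)
    rows = []
    for pairs in order:
        for mutation, barcode in pairs:
            barcodes[mutation].add(barcode)
        covered = sum(1 for b in barcodes.values() if len(b) > 0)
        over4 = sum(1 for b in barcodes.values() if len(b) > 4)
        over9 = sum(1 for b in barcodes.values() if len(b) > 9)
        rows.append((
            100 * covered / n_expected,
            100 * over4 / n_expected,
            100 * over9 / n_expected,
        ))
    return rows
```

Repeating the call with 100 differently seeded generators and averaging the rows position by position would give curves of the kind described above.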

As done following the Gibson assembly, raw reads obtained after Golden Gate assembly product sequencing were filtered using fastp (default parameters), and overlapping paired-end reads were merged using bbmerge. Reads with incorrect length, unexpected barcode sequences, or missing nucleotides were removed, along with single-count reads. Barcode-mutation associations were performed by merging barcode reads from the Golden Gate assembly with the reference association generated after the Gibson assembly. To evaluate the library quality after the Golden Gate step, we re-calculated key metrics: informative barcodes, mutation coverage, and barcode diversity.

Assessment of plasmid library quality, diversity, and uniformity

To evaluate the quality and representation of variants in the constructed libraries, we quantified their uniformity using two established metrics. First, the Gini coefficient, an index of distribution bias ranging from 0 (perfect uniformity) to 1 (maximum bias) [31], was calculated to assess the equal representation of variants after each cloning step (Gibson and Golden Gate). Second, we used the uniformity score proposed by [28], defined as the logarithmic difference between the 90th and 10th percentiles of the read count for each variant per fragment, where a small value indicates greater uniformity. These analyses were performed on the plasmid sequencing data to confirm a balanced distribution of variants before functional selection (S10 and S11 Figs; S7 Table).
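Both metrics can be computed from a list of read counts per variant, as in the following sketch. The function names are hypothetical, the base-10 logarithm and the linear-interpolation percentile are assumptions of this sketch, and the exact conventions used in [28] may differ.

```python
import math


def gini(counts) -> float:
    """Gini coefficient of read counts: 0 for a perfectly uniform distribution."""
    x = sorted(float(c) for c in counts)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("counts must be non-empty with a positive sum")
    weighted = sum(rank * value for rank, value in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def percentile(values, q: float) -> float:
    """Percentile with linear interpolation between closest ranks."""
    x = sorted(values)
    pos = (len(x) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return x[lo] + (x[hi] - x[lo]) * (pos - lo)


def uniformity_score(counts) -> float:
    """Log10 difference between the 90th and 10th percentiles of read counts."""
    p10, p90 = percentile(counts, 10), percentile(counts, 90)
    if p10 <= 0:
        raise ValueError("10th percentile must be positive to take its logarithm")
    return math.log10(p90) - math.log10(p10)
```

With a finite number of variants, the Gini coefficient computed this way reaches at most (n − 1)/n, which approaches 1 for large libraries.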

The presence of all expected mutations, with excellent DNA barcode diversity per mutation, was confirmed in libraries F13 and F43 (after Gibson assembly: all mutations are covered by #barcodes >4 and after Golden Gate assembly: ~93.6%–100% of mutations are covered by #barcodes >4) (Figs 1C, 1D, S1, and S2). A small number of mutations, specifically those at residues 322–324, were detected at lower read counts. Nevertheless, they still showed high mutation coverage (98.3%) and acceptable barcode diversity, with 71.7% of mutations having #barcodes >4 (Figs 1D and S2), the threshold required to obtain a correlation coefficient above 0.9 between the selection coefficients measured directly from the F13 sequence and those inferred from the barcodes (Fig 2C).

Although the main text presents in detail only the characterization of the mutant libraries for two representative PDR1 fragments (F13 and F43), we also carried out, in parallel, the construction of a comprehensive plasmid mutant library covering the entire PDR1 coding sequence (total of 43 libraries, one for each fragment of 25 aa). This extended library was generated using the same two-step cloning strategy described above.

Analysis of the complete 43-fragment library confirmed strong performance across all quality metrics (S5–S8 Figs; S8 Table). Nearly all expected mutations (~98.4%) were recovered, with only a few absent positions likely attributable to synthesis dropout in the oPools. Barcode diversity per mutation remained high throughout all libraries: after Gibson assembly, ~97.5% of mutations were represented by #barcodes >4, and after Golden Gate assembly this remained true for ~95.4% of positions (S5–S8 Figs; S8 Table). Uniformity analysis further indicated that variant abundance across the library (Gini coefficient ~0.32) was well balanced and acceptable for downstream DMS experiments (S8 Table).

Consistent with previous observations using NNK degenerate codon libraries, expected biases in variant diversity were observed across amino acid substitutions as some amino acids are encoded by more codons. Amino acids such as leucine, arginine, and serine are encoded by three codons, while most others are encoded by one or two, resulting in higher expected representation for these residues (Figs 1C, 1D, S5, and S6).
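The codon multiplicity expected from an NNK library can be checked by enumerating the 32 NNK codons against the standard genetic code, as in the following sketch (the variable names are illustrative).

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
NNK_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

# Number of NNK codons encoding each amino acid ("*" denotes a stop codon).
nnk_multiplicity = Counter(CODON_TABLE[codon] for codon in NNK_CODONS)

for aa, n in sorted(nnk_multiplicity.items(), key=lambda item: (-item[1], item[0])):
    print(aa, n)
```

The enumeration gives three codons each for leucine, arginine, and serine, two each for alanine, glycine, proline, threonine, and valine, and one for each remaining amino acid and for the single stop codon (TAG).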

However, codon-level analyses revealed that synonymous codon multiplicity alone does not fully explain the observed patterns (S3, S4, S7, and S8 Figs). Certain codons rich in thymine or guanine (e.g., TTT, GGG, TGG, TTG) consistently showed higher barcode diversity, while others were underrepresented, such as the codons for proline, threonine, and alanine (S3–S8 Figs). In addition, position-specific biases were detected along the gene, with a small number of codon positions showing reduced (PDR1 F13–codons 301 and 322–324: S3 and S4 Figs) or absent representation (19 codon positions in the entire PDR1 gene: S5–S8 Figs), likely reflecting dropouts during oPool synthesis rather than cloning inefficiencies.

We additionally quantified the representation of non-programmed variants and found that ~10% of reads corresponded to the WT sequence and ~10% to double-mutants (S8 Table). These species can either be excluded during downstream fitness estimation or leveraged for dedicated analyses of epistatic interactions, depending on the experimental aims.

To support reproducibility and provide a broader overview of the experimental workflow, the results and quality metrics for each cloning stage of the full-length PDR1 library are presented in S8 Table and S5–S8 Figs. In addition, we provide a detailed estimated experimental timeline for the construction of the full 43-fragment PDR1 mutant library in S9 Table. This resource is intended to serve as a practical reference for researchers aiming to apply or adapt such an approach in future DMS projects.

Quality control of final plasmid libraries:

To assess the accuracy of the cloning strategy and evaluate potential PCR-induced mutations, we performed whole-plasmid sequencing of six plasmids derived from independent Golden Gate fragment assemblies. No unintended mutations were present in the PDR1 promoter, coding sequence, or terminator regions. This quality-control step validates the reliability of our cloning approach. Although pre-sequencing of all PCR-amplified fragments prior to Golden Gate assembly could further ensure accuracy, it was not performed here due to cost and scalability considerations.

Variant library construction

Although PDR1 is not essential for S. cerevisiae viability, we replaced its endogenous promoter with a DOX-repressible promoter (tetO7). This allowed inhibition of genomic PDR1 expression and complementation by plasmid-encoded variant libraries. We chose this approach after observing low transformation efficiency when attempting to transform a yeast strain harboring a complete PDR1 deletion.

Yeast strain construction (S12A Fig):

The ScPDR1-DOX strain was generated based on the Yeast Tet-Promoters Hughes Collection (yTHC) [52]. The endogenous promoter of PDR1 (spanning 50 bp upstream of the ATG codon; chrVIII:121683-121733) was replaced with a tetracycline-regulatable promoter. The KANMX-tetO7 cassette was amplified from plasmid pKB33 (PCR13) and introduced into competent S. cerevisiae R1158 cells using an adapted lithium acetate transformation protocol [53]. Transformants were selected on YPD medium containing 200 μg/mL G418.

Validation of genomic PDR1 expression control in ScPDR1-DOX:

To validate the control of genomic PDR1 expression in the ScPDR1-DOX strain, PDR1 expression levels were quantified in the presence and absence of DOX in the culture medium (S12B Fig). For this purpose, the gene coding for a monomeric enhanced Green Fluorescent Protein (mEGFP) was fused to the PDR1 coding sequence in the genome of the ScPDR1-DOX strain. The mEGFP sequence along with a hygromycin resistance module was amplified from plasmid pFA-mEGFP-HPHNT1 (see construction below) using primers introducing homology arms targeting the 3′ end of the PDR1 CDS and its terminator (PCR 14). The amplicon was treated with 20 U of DpnI for 1 h at 37 °C, purified using magnetic beads, and used to transform ScPDR1-DOX competent cells using the standard lithium acetate transformation protocol [53].

Yeast strains (ScPDR1-GFP-DOX and ScPDR1-DOX) were cultured overnight at 30 °C with shaking in synthetic complete (SC) medium in 24 deep-well plates. The SC medium contained 0.17% yeast nitrogen base without amino acids and ammonium sulfate, 0.1% monosodium glutamate, 2% glucose, and amino acids drop-out without tryptophan. Saturated cultures were diluted to an OD₆₀₀ of 0.15 in fresh medium with or without DOX (10 µg/mL), and grown again under agitation at 30 °C until reaching an OD₆₀₀ of 0.5. Cells were then diluted to ~500 cells/µL in 0.2 µm filtered distilled water for flow cytometry analysis. GFP fluorescence was measured using a Guava easyCyte flow cytometer (MilliporeSigma) with excitation at 488 nm (GRN-B channel, 525/30 filter). A preliminary gating step (FSC-H (Forward Scatter Height) <15,000 and FSC-A (Forward Scatter Area) <15,000) was applied to select the main population of morphologically normal cells and to exclude non-representative events. A total of 5,000 events were recorded per replicate. Three independent biological replicates were analyzed per condition. Fluorescence signals were processed using a custom Python script and all the data (including the control lacking GFP) are shown in S12B Fig.

In order to generate ScPDR1-GFP-DOX yeast strain, we first had to construct pFA-mEGFP-HPHNT1 plasmid. This was done by inserting mEGFP gene (insert 1) and ENO1 terminator fragment (insert 2) into pFA-hphNT1 (vector) using Gibson assembly. All three fragments were PCR amplified (PCR15, PCR16, and PCR17, respectively) and treated with 20 units of DpnI for 1 h at 37 °C before being purified on magnetic beads.

Yeast strain validation by complementation:

To validate whether PDR1 expressed from the plasmid complements its genomic counterpart, we selected a condition where PDR1 expression is beneficial (in the presence of an antifungal) and assessed the growth of the ScPDR1-DOX strain transformed either with an empty plasmid or with one containing the PDR1 WT sequence (S12C Fig). After adjusting the cell density of precultures to an OD₆₀₀ of 1, three serial 1/5 dilutions were prepared in 200 μL of water (i.e., 40 μL of cells in 160 μL of water). Then, 5 μL of each dilution were spotted on YPD + NAT agar plates supplemented with either 0.1% DMSO (control), 1 μg/mL ITR, 2 μg/mL ITR, or 4 μg/mL ITR, in the presence or absence of DOX (10 μg/mL). Plates were incubated at 30 °C for 48 h before imaging.

Library in yeast:

The DMS plasmid library was introduced into ScPDR1-DOX yeast cells via lithium acetate transformation, performed fragment by fragment using a standard protocol [53]. For each fragment, approximately 100,000 colonies were recovered to ensure sufficient library coverage and barcode diversity. To collect transformants, 5 mL of YPD medium was added directly to each plate, and colonies were scraped using a sterile glass spreader. All transformants corresponding to a given fragment were pooled together. Aliquots were prepared, each corresponding to an optical density (OD₆₀₀) of 80 per fragment, and stored at −80 °C in 25% glycerol.

Pooled competition assay

A pooled competition assay was performed using the PDR1 F13 variant library, in the presence or absence of an antifungal. All cultures were grown in YPD medium supplemented with NAT (100 μg/mL), to maintain plasmid selection, and DOX (10 μg/mL), to repress the endogenous PDR1 promoter. As the antifungal compound was dissolved in DMSO, the control condition also included 0.1% DMSO. POSA was used at a final concentration of 0.40 μg/mL, corresponding to the concentration that inhibits approximately 50% of WT growth, as previously reported [5].

The PDR1 F13 variant library was first pre-cultured overnight in YPD + NAT, starting with an estimated inoculum of ~10,000 cells per variant (Time Point 0, TP0). Saturated cultures were diluted to an initial OD₆₀₀ of 0.05 into fresh YPD + NAT + DOX + POSA, and grown until an OD₆₀₀ of 0.80 was reached (TP1). A second dilution and growth cycle under the same conditions was performed (OD₆₀₀ = 0.05 to 0.80), corresponding to a total of approximately eight mitotic generations (TP2). Identical conditions were used for the control (DMSO) condition.
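The number of generations per growth cycle follows from the ratio of final to initial optical density, as in this short sketch. It assumes that OD₆₀₀ is proportional to cell number over this range; the function name is illustrative.

```python
import math


def generations(od_start: float, od_end: float) -> float:
    """Number of doublings between two optical densities (OD assumed proportional to cell number)."""
    return math.log2(od_end / od_start)


per_cycle = generations(0.05, 0.80)   # one dilution and growth cycle
total = 2 * per_cycle                 # two consecutive cycles (TP0 to TP2)
print(round(per_cycle, 2), round(total, 2))
```

A 16-fold increase in OD₆₀₀ corresponds to about four doublings per cycle, hence about eight generations over the two cycles.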

At TP0, TP1, and TP2, cells were harvested by collecting 5 OD₆₀₀ units per sample in 24 deep-well plate, followed by centrifugation at 916g for 5 min. Pellets were resuspended in 1 mL of YPD, transferred to 1.5 mL microcentrifuge tubes, and centrifuged again at 600g for 2 min. Supernatants were removed by aspiration, and resulting pellets were stored at −80 °C until plasmid extraction.

Plasmid recovery

Plasmid recovery from yeast was performed using the Zymoprep Yeast Plasmid Miniprep II kit, with the following modified protocol based on [5,24] to improve yield. Frozen cell pellets were resuspended in 200 μL of Solution 1, followed by the addition of 30 U of Zymolyase 20T. Cells were gently vortexed and incubated at 37 °C for 2 h. Following incubation, tubes were briefly vortexed and frozen at −80 °C for at least 20 min (up to overnight). After thawing at 37 °C for 3 min, 200 μL of Solution 2 was added and mixed, followed by the addition of 400 μL of Solution 3. Tubes were centrifuged at maximum speed for 3 min, and the supernatant (~800 μL) was transferred to a Zymo-Spin I column. Columns were washed with 550 μL of ethanol-containing wash buffer and centrifuged for 1.5 min at maximum speed. After removing the flow-through, columns were spun for an additional minute to dry. Plasmids were eluted by adding 10 μL of Tris buffer (10 mM, pH 8.0) and centrifuging for 1 min at maximum speed. Eluted DNA was stored at −20 °C until further processing.

High-throughput sequencing:

To validate whether the mutations inferred from DNA barcodes corresponded accurately to the actual mutations present in the F13 sequence of PDR1, two types of sequencing libraries were generated: one containing the F13 region of PDR1 and one consisting of the barcodes.

Sequencing from yeast minipreps was performed using a protocol similar to that described in the Plasmid library quality control by sequencing section. Starting from 3 μL of yeast miniprep, an initial PCR was carried out to amplify only PDR1 from the plasmid, using Phusion High-Fidelity DNA Polymerase and M13 primers flanking the PDR1 insert on pRS31N (PCR18). This step avoids sequencing contaminating genomic PDR1 and ensures that the same sequencing material could be used for both F13 region and barcode-based analyses, allowing a direct and unbiased comparison between the two datasets. In a standard barcode-based DMS experiment, the DNA barcodes could be directly amplified using the PBS_i5 and PBS_i7 flanking sites, which would simplify library preparation and improve sequencing data quality.

To generate the F13 sequencing libraries, two rounds of PCR were then performed. Starting with 2 μL of M13-purified amplicons, a first PCR made with Phusion High-Fidelity DNA Polymerase (PCR19) was used to amplify the F13 region while introducing PBS_i5 and PBS_i7 sequences at the 5′ and 3′ ends, respectively. In addition, a 0–3 nucleotides spacer (N) was added to the primers to increase sequencing diversity. A second PCR (PCR20) was then used to add the appropriate Illumina i5 and i7 sample indexes, as described in the Plasmid library quality control by sequencing section.

To generate the DNA barcode sequencing libraries, 2 μL of the M13-purified amplicons were used as a template for a single PCR reaction to add the appropriate Illumina i5 and i7 indexes (PCR21), as described in the Plasmid library quality control by sequencing section.

Both the F13 and DNA barcode sequencing libraries were sent for high-throughput sequencing in paired-end 150 bp at the Centre de recherche du CHU de Québec - Université Laval sequencing platform (CHUL, Canada) using an Illumina NovaSeq 6000 S4 flow cell (sample description in S10 Table). All samples were sequenced to achieve a minimum of 500 reads per expected variant (i.e., ~ 0.4 million reads per sample for the F13 libraries and ~4 million reads per sample for the DNA barcode libraries). Only samples from timepoints TP0 and TP2 were sequenced. TP1 was excluded, as the limited frequency changes observed after just four generations hindered robust genotype-phenotype associations.

Data analysis

Selection coefficient and variant characterization:

Selection coefficients were obtained with gyōza, a Snakemake-based workflow for the analysis of deep mutational scanning (DMS) data [54]. The pipeline includes adapter trimming with Cutadapt, read merging using PANDAseq, and clustering of identical reads via VSEARCH. Singleton reads (i.e., sequences observed only once) were excluded to reduce noise from sequencing errors. A custom script then aggregated read statistics across all processing steps.

The F13 samples were processed using the ‘codon’ mode of gyōza v1.1.8 [54], with constant 20 bp sequences on either side of the mutated locus specified for trimming. For the barcoded library, the “barcode” mode of gyōza v1.1.8 was used instead, by providing the dataframe of barcode-variant associations. Briefly, (i) all singletons (read count of 1) were discarded, (ii) log₂ fold changes of allele frequencies were calculated between TP0 and TP2, (iii) selection coefficients were calculated by normalizing with the number of mitotic generations (to account for different growth rates across cultures and make sure functional impact scores reflect the per-generation selective advantage), and (iv) selection coefficients were further normalized by subtracting the median log₂ fold-change of silent variants (excluding the WT nucleotide sequence).
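Steps (i) to (iv) can be sketched for a single replicate as follows. This is a simplified illustration and not the gyōza implementation: the function name is hypothetical, frequencies are computed over the retained reads only, and the silent-variant median is subtracted after the per-generation normalization, which is one reading of the description above.

```python
import math
from statistics import median


def selection_coefficients(counts_t0, counts_t2, n_generations, silent_variants):
    """Per-generation log2 fold changes, centered on the median of silent variants.

    counts_t0, counts_t2 : dicts mapping variant -> read count at TP0 and TP2
    n_generations        : number of mitotic generations between TP0 and TP2
    silent_variants      : variants encoding the WT amino acid sequence
    """
    # (i) discard singletons at either time point
    kept = [v for v in counts_t0 if counts_t0[v] > 1 and counts_t2.get(v, 0) > 1]
    total_t0 = sum(counts_t0[v] for v in kept)
    total_t2 = sum(counts_t2[v] for v in kept)

    per_generation = {}
    for v in kept:
        # (ii) log2 fold change of allele frequency between TP0 and TP2
        lfc = math.log2((counts_t2[v] / total_t2) / (counts_t0[v] / total_t0))
        # (iii) normalize by the number of generations
        per_generation[v] = lfc / n_generations

    # (iv) subtract the median value of silent variants
    reference = median(per_generation[v] for v in silent_variants if v in per_generation)
    return {v: s - reference for v, s in per_generation.items()}
```

After centering, silent variants have a median score of zero, so positive and negative values indicate variants that increase or decrease in frequency relative to WT-like sequences.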

Variants were assigned confidence scores (as defined by gyōza) based on their initial read count at TP0 across replicates. A minimum threshold of five reads per nucleotide sequence was used to define high-confidence variants, which were retained for downstream analysis. Selection coefficients were then aggregated by amino acid position, averaging across synonymous codons (S5 and S6 Data). The resulting tables were used to classify mutations from resistant to sensitive (0.6 to −0.6) based on their deviation from the WT. Finally, heatmaps were generated using the selection coefficient for each variant to enable clear visualization and comparison between F13 region and DNA barcode sequencing (Figs 2B and S14).

To assess the quality of the barcoded library, we compared the final selection coefficients per mutation obtained from sequencing of either mutated F13 region or DNA-barcode inference. To evaluate how barcode diversity influences the reliability of these coefficients, we progressively increased the number of barcodes required per mutation (from 1 to 10) and, at each level, performed 100 random samplings. To ensure consistent comparisons across variants with different barcode counts, we implemented a standardized subsampling procedure during correlation analysis. If a variant had fewer barcodes than the target number (n), some barcodes were randomly resampled multiple times to reach n. Conversely, if a variant had more than n barcodes, a random subset of n barcodes was selected. This approach allowed us to normalize barcode representation across variants and to accurately assess correlation metrics based on 100 independent subsamplings. For each sampling, we calculated the correlation between selection coefficients inferred from DNA barcodes and those obtained from the F13 region (Fig 2C). An analysis of the correlation stratified by the number of barcodes per variant confirms the robustness of our strategy, even for variants associated with a lower barcode count (S14 Fig).
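The subsampling procedure can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, variants with fewer than n barcodes keep all their barcodes and are completed by random resampling with replacement, the per-variant score is taken as the mean across sampled barcodes, and the Pearson correlation is used.

```python
import math
import random
from statistics import mean


def sample_barcodes(scores, n, rng):
    """Return exactly n barcode scores for one variant."""
    if len(scores) >= n:
        return rng.sample(scores, n)                      # random subset of n barcodes
    return list(scores) + rng.choices(scores, k=n - len(scores))  # resample to reach n


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlations_at_n(barcode_scores, direct_scores, n, n_draws=100, seed=0):
    """Correlation between barcode-inferred and directly measured scores over n_draws samplings.

    barcode_scores : dict mapping variant -> list of per-barcode selection coefficients
    direct_scores  : dict mapping variant -> selection coefficient from direct sequencing
    """
    rng = random.Random(seed)
    variants = sorted(set(barcode_scores) & set(direct_scores))
    direct = [direct_scores[v] for v in variants]
    results = []
    for _ in range(n_draws):
        inferred = [mean(sample_barcodes(barcode_scores[v], n, rng)) for v in variants]
        results.append(pearson(inferred, direct))
    return results
```

Running `correlations_at_n` for n from 1 to 10 and summarizing each list of 100 values would produce a curve of correlation against the number of barcodes per mutation.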

As no experimentally determined structure was available for Pdr1 in the Protein Data Bank (PDB, RCSB), we used the AlphaFold 3 server [55] to predict its tertiary structure (pTM score = 0.72; seed = 264,428,788; Fig 2D). The amino acid sequence of S. cerevisiae Pdr1 was obtained from UniProt (accession: P12383). The predicted structure was visualized using ChimeraX [56].

Supporting information

S1 Fig. Barcode diversity per amino acid substitution after Gibson assembly.

Heatmaps show the total number of unique barcodes for each possible amino acid substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 25,000 transformants were recovered and analyzed. Covered mutations are mutations represented by #barcodes >0, while barcode diversity represented by #barcodes >4 or >9 per mutation indicates an increasing number of replicates for the same mutation. Gray squares represent WT amino acids. In the heatmaps, the number of barcodes per mutation is clipped at 10. The numerical data underlying this graph are provided in S2 Data.

(TIF)

pbio.3003645.s001.tif (2.3MB, tif)
S2 Fig. Barcode diversity per amino acid substitution after Golden Gate assembly.

Heatmaps show the total number of unique barcodes for each possible amino acid substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 100,000 transformants were recovered and analyzed. Covered mutations are mutations represented by #barcodes >0, while barcode diversity represented by #barcodes >4 or >9 per mutation demonstrates an increasing number of replicates for the same mutation. Gray squares represent WT amino acids. In the heatmaps, the number of barcodes per mutation is clipped at 10. The numerical data underlying this graph is provided in S3 Data.

(TIF)

pbio.3003645.s002.tif (2.4MB, tif)
S3 Fig. Barcode diversity per NNK codon substitution after Gibson assembly.

Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 25,000 transformants were recovered and analyzed. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

(TIF)

pbio.3003645.s003.tif (2.9MB, tif)
S4 Fig. Barcode diversity per NNK codon substitution after Golden Gate assembly.

Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 100,000 transformants were recovered and analyzed. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

(TIF)

S5 Fig. Library quality control per amino acid substitution after Gibson assembly of the full-length PDR1 sequence (43 fragments).

Heatmap displays the barcode diversity associated with each possible amino acid substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

(PNG)

pbio.3003645.s005.png (370.7KB, png)
S6 Fig. Library quality control per amino acid substitution after Golden Gate assembly of the full-length PDR1 sequence (43 fragments).

Heatmap displays the barcode diversity associated with each possible amino acid substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts, including low values in red. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

(PNG)

pbio.3003645.s006.png (373.3KB, png)
S7 Fig. Library quality control per NNK codon substitution after Gibson assembly of the full-length PDR1 sequence (43 fragments).

Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

(PNG)

pbio.3003645.s007.png (426.4KB, png)
S8 Fig. Library quality control per NNK codon substitution after Golden Gate assembly of the full-length PDR1 sequence (43 fragments).

Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts, including low values in red. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

(PNG)

pbio.3003645.s008.png (411.1KB, png)
S9 Fig. Distribution of barcode counts per amino‑acid mutation after (A) Gibson and (B) Golden Gate assemblies for Pdr1 F13 and F43.

Histograms show the number of unique barcodes associated with each amino‑acid substitution in fragments F13 (gray) and F43 (blue). For each fragment, the x‑axis indicates the number of barcodes linked to a given amino‑acid substitution, and the y‑axis shows the number of substitutions observed at each barcode count. The numerical data underlying these graphs is provided in S2 and S3 Data.

(TIF)

pbio.3003645.s009.tif (2.3MB, tif)
S10 Fig. Assessment of library uniformity after each cloning step (Gibson and Golden Gate assemblies) using the Gini coefficient.

Boxplots represent the distribution of Gini coefficients (ranging from 0 to 1) calculated for each fragment, where lower values indicate more uniform coverage and higher values indicate increased inequality in representation. The numerical data underlying this graph is provided in S4 Data.

(TIF)

pbio.3003645.s010.tif (202.6KB, tif)
S11 Fig. Assessment of library uniformity after each cloning step (Gibson and Golden Gate assemblies) using the uniformity score.

Boxplots show the distribution of uniformity scores for each fragment, defined as the log difference between the 90th and 10th percentiles of mutant read counts. Lower scores indicate more uniform representation, while higher scores indicate greater inequality. The numerical data underlying this graph is provided in S4 Data.

(TIF)

pbio.3003645.s011.tif (220.2KB, tif)
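
The two uniformity metrics used in S10 and S11 Figs can be illustrated with a short sketch. This is not the code used in the study; the function names are hypothetical, and the base-10 logarithm and the linear-interpolation percentile are assumptions, since the captions specify only a "log difference between the 90th and 10th percentiles".

```python
import math


def gini(counts):
    """Gini coefficient of non-negative read counts (0 = perfectly uniform)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted sum over the sorted counts (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks (assumption)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low, high = math.floor(pos), math.ceil(pos)
    frac = pos - low
    return ordered[low] * (1 - frac) + ordered[high] * frac


def uniformity_score(counts):
    """log10(90th percentile) - log10(10th percentile) of read counts.

    Base 10 is an assumption. Both percentiles must be > 0; otherwise
    math.log10 raises ValueError.
    """
    return math.log10(percentile(counts, 90)) - math.log10(percentile(counts, 10))
```

For a finite sample of n variants, the Gini coefficient computed this way reaches at most (n - 1)/n, which approaches 1 for large libraries. A perfectly even library gives 0 for both metrics.
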
S12 Fig. Construction and validation of a S. cerevisiae strain to measure the impact of PDR1 mutations on drug resistance.

A) Inducible control of genomic PDR1 expression in Saccharomyces cerevisiae using a doxycycline-repressible promoter (TetO7). Created in BioRender. Barff, T. (2025) https://BioRender.com/zo8wlgo. B) Validation of genomic PDR1 repression in the ScPDR1-DOX strain. PDR1 expression level was assessed by flow cytometry using a GFP fusion as a reporter. Fluorescence intensity is shown for the strain expressing GFP (left panel: ScPDR1-GFP-DOX) and the negative control strain without GFP (right panel: ScPDR1-DOX) in media with (red) or without (blue) DOX. GFP intensity was analyzed on the main population of morphologically normal cells (FSC-H<15000 and FSC-A<15000). 5,000 events were recorded per replicate. Three independent biological replicates were analyzed per condition. The numerical data underlying this graph is provided in S8 Data. C) in ScPDR1-DOX by plasmid-expression of ScPDR1 (pRS31N-ScPDR1 WT) along with the control strain (pRS31N-empty). Growth conditions: DMSO (control) or Itraconazole (ITR) (antifungal).

(TIF)

pbio.3003645.s012.tif (4.6MB, tif)
S13 Fig. Correlation between the selection coefficients measured with direct sequencing of the mutated PDR1 F13 coding region and DNA-barcode inference, using all usable barcodes.

The strong correlation (Pearson r = 0.97) indicates that barcode-based estimates accurately recapitulate the directly measured fitness effects. The number of barcodes per mutation does not affect the correlation. The numerical data underlying this graph is provided in S5 and S6 Data.

(TIF)

pbio.3003645.s013.tif (828.1KB, tif)
S14 Fig. Correlation between the selection coefficients measured with direct sequencing of the mutated PDR1 F13 coding region and DNA-barcode inference, as a function of the number of barcodes per variant considered.

Curves correspond to variants with exactly 5 (n = 40), 10 (n = 25), or 15 (n = 17) barcodes. Means and confidence intervals were calculated from 100 random barcode subsamplings per variant. The high correlation observed even for variants with only five barcodes indicates that the overall correlation is not driven only by highly diversified variants. The numerical data underlying this graph is provided in S9 Data.

(TIF)

pbio.3003645.s014.tif (707.2KB, tif)
S1 Table. oPool design.

(XLSX)

pbio.3003645.s015.xlsx (81KB, xlsx)
S2 Table. Key resources.

(XLSX)

pbio.3003645.s016.xlsx (17.3KB, xlsx)
S3 Table. Oligonucleotide sequences.

(XLSX)

pbio.3003645.s017.xlsx (17.4KB, xlsx)
S4 Table. PCR protocols.

(XLSX)

pbio.3003645.s018.xlsx (21.3KB, xlsx)
S5 Table. Gibson sequencing sample description.

(XLSX)

pbio.3003645.s019.xlsx (21.6KB, xlsx)
S6 Table. Golden Gate sequencing sample description.

(XLSX)

pbio.3003645.s020.xlsx (19.4KB, xlsx)
S7 Table. Comparison of library uniformity (TP0) between two DMS plasmid library construction strategies.

This table presents a quantitative comparison of the variant uniformity achieved by our scalable barcoded strategy (PDR1 DMS TP0 F13 library) and by a reference DMS plasmid library, CaERG11, previously generated in our laboratory [5]. The CaERG11 library was constructed using a highly precise, low-throughput method: DNA fragments synthesized by Twist Bioscience were cloned codon-by-codon into a yeast expression vector, and the individual codon libraries were then pooled proportionally to the number of mutants. This meticulous process achieves near-perfect uniformity of variant representation and serves here as a gold-standard reference. The comparison indicates that the uniformity achieved by our barcoded strategy is lower than that of the CaERG11 reference but remains acceptable for robust DMS experiments (Gini coefficient around 0.52). Crucially, the data show that our method achieves this uniformity while remaining substantially more cost-effective and scalable. The metrics computed include the Gini coefficient (0 = perfect uniformity, 1 = maximum bias) and the Uniformity Score.

(XLSX)

pbio.3003645.s021.xlsx (72.6KB, xlsx)
S8 Table. Summary of coverage statistics for the two cloning steps for all libraries of the complete PDR1 gene.

(XLSX)

pbio.3003645.s022.xlsx (135.4KB, xlsx)
S9 Table. Estimated experimental timeline to construct the complete plasmid mutant PDR1 library.

(XLSX)

pbio.3003645.s023.xlsx (67KB, xlsx)
S10 Table. DMS sample description.

(XLSX)

pbio.3003645.s024.xlsx (23.8KB, xlsx)
S1 Data. Barcode diversity and mutation coverage analysis per transformants.

(XLSX)

pbio.3003645.s025.xlsx (167.9KB, xlsx)
S2 Data. Barcode diversity after Gibson assembly.

(XLSX)

pbio.3003645.s026.xlsx (1.9MB, xlsx)
S3 Data. Barcode diversity after Golden Gate assembly.

(XLSX)

pbio.3003645.s027.xlsx (1.3MB, xlsx)
S4 Data. Uniformity scores after Gibson and Golden Gate assemblies.

(XLSX)

pbio.3003645.s028.xlsx (11KB, xlsx)
S5 Data. Gyoza all scores based on PDR1 F13 region direct sequencing.

(XLSX)

pbio.3003645.s029.xlsx (663.9KB, xlsx)
S6 Data. Gyoza all scores based on DNA barcodes sequencing.

(XLSX)

pbio.3003645.s030.xlsx (2.1MB, xlsx)
S7 Data. Correlation of selection coefficients between PDR1 F13 direct sequencing and DNA barcodes sequencing from barcode subsampling.

(XLSX)

pbio.3003645.s031.xlsx (30.7KB, xlsx)
S8 Data. Individual flow cytometry event data used to generate the PDR1-DOX and PDR1-GFP-DOX graphs.

This table contains FSC, SSC, and GFP fluorescence measurements, including log10-transformed GFP values, for all events contributing to the summarized distributions shown in S12 Fig.

(XLSX)

pbio.3003645.s032.xlsx (1.2MB, xlsx)
S9 Data. Correlation of selection coefficients between PDR1 F13 direct sequencing and DNA barcodes sequencing from barcode subsampling for variants with 5, 10, or 15 barcodes.

(XLSX)

pbio.3003645.s033.xlsx (107.4KB, xlsx)

Acknowledgments

We thank Dan Evans-Yamamoto, Philippe Després, and other Landrylab members for discussions and feedback on the design of the cloning strategy.

Abbreviations

AMP, ampicillin

bp, base pair

DMS, deep mutational scanning

DMSO, dimethyl sulfoxide

DOX, doxycycline

G418, geneticin

ITR, itraconazole

mEGFP, monomeric enhanced Green Fluorescent Protein

NAT, nourseothricin

PDR1, Pleiotropic Drug Resistance 1

POSA, posaconazole

SC, synthetic complete

WT, wild-type

YFG, Your Favorite Gene

Data Availability

All data are available in the main text or in the Supporting information files. The numerical data underlying all figures are provided in the files S1 Data (corresponding to Fig 1B), S2 Data (corresponding to Figs 1D, S2, S4, S6, S8, and S9), S3 Data (corresponding to Figs 1C, S1, S3, S5, S7, and S9), S4 Data (corresponding to S10 and S11 Figs), S5 Data (corresponding to Figs 2B and S13), S6 Data (corresponding to Figs 2B and S13), S7 Data (corresponding to Fig 2C), S8 Data (corresponding to S12B Fig) and S9 Data (corresponding to S14 Fig). Sequencing data are available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1269095. All code and processed data to reproduce the analyses, results, and figures are available at https://github.com/Landrylab/Jann_et_al_2025 and archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.18342596).

Funding Statement

This work was supported by a Genome Québec and Genome Canada grant (6569), a Canadian Institutes of Health Research (CIHR) Foundation grant (387697), a CIHR project grant (202409PJT), and a Natural Sciences and Engineering Research Council of Canada (NSERC) CREATE grant (EvoFunPath) to CRL. JJ and RD are supported by postdoctoral fellowships from NSERC. JJ is also supported by an FRQS postdoctoral fellowship (343898). CRL holds the Canada Research Chair in Cellular Systems and Synthetic Biology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801–7. doi: 10.1038/nmeth.3027
2. Wei H, Li X. Deep mutational scanning: a versatile tool in systematically mapping genotypes to phenotypes. Front Genet. 2023;14:1087267. doi: 10.3389/fgene.2023.1087267
3. Escobedo A, Voigt G, Faure AJ, Lehner B. Genetics, energetics, and allostery in proteins with randomized cores and surfaces. Science. 2025;389(6758):eadq3948. doi: 10.1126/science.adq3948
4. Tabet DR, Coté AG, Lancaster MC, Weile J, Rayhan A, Fotiadou I. The functional landscape of coding variation in the familial hypercholesterolemia gene LDLR. Science. 2025. doi: 10.1126/science.ady7186
5. Bédard C, Gagnon-Arsenault I, Boisvert J, Plante S, Dubé AK, Pageau A, et al. Most azole resistance mutations in the Candida albicans drug target confer cross-resistance without intrinsic fitness cost. Nat Microbiol. 2024;9(11):3025–40. doi: 10.1038/s41564-024-01819-2
6. Kitzman JO, Starita LM, Lo RS, Fields S, Shendure J. Massively parallel single-amino-acid mutagenesis. Nat Methods. 2015;12(3):203–6, 4 p following 206. doi: 10.1038/nmeth.3223
7. Wrenbeck EE, Klesmith JR, Stapleton JA, Adeniran A, Tyo KEJ, Whitehead TA. Plasmid-based one-pot saturation mutagenesis. Nat Methods. 2016;13(11):928–30. doi: 10.1038/nmeth.4029
8. Stiffler MA, Hekstra DR, Ranganathan R. Evolvability as a function of purifying selection in TEM-1 β-lactamase. Cell. 2015;160(5):882–92. doi: 10.1016/j.cell.2015.01.035
9. Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, et al. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010;7(9):741–6. doi: 10.1038/nmeth.1492
10. Matreyek KA, Starita LM, Stephany JJ, Martin B, Chiasson MA, Gray VE, et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet. 2018;50(6):874–82. doi: 10.1038/s41588-018-0122-z
11. Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP, Gasperini M, et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562(7726):217–22. doi: 10.1038/s41586-018-0461-z
12. Sadhu MJ, Bloom JS, Day L, Siegel JJ, Kosuri S, Kruglyak L. Highly parallel genome variant engineering with CRISPR-Cas9. Nat Genet. 2018;50(4):510–4. doi: 10.1038/s41588-018-0087-y
13. Howard CJ, Abell NS, Osuna BA, Jones EM, Chan LY, Chan H, et al. High resolution deep mutational scanning of the melanocortin-4 receptor enables target characterization for drug discovery. eLife Sciences Publications; 2025. doi: 10.7554/elife.104725.2
14. Dadonaite B, Brown J, McMahon TE, Farrell AG, Figgins MD, Asarnow D, et al. Spike deep mutational scanning helps predict success of SARS-CoV-2 clades. Nature. 2024;631(8021):617–26. doi: 10.1038/s41586-024-07636-1
15. Shepherdson JL, Granas DM, Li J, Shariff Z, Plassmeyer SP, Holehouse AS, et al. Mutational scanning of CRX classifies clinical variants and reveals biochemical properties of the transcriptional effector domain. Genome Res. 2024;34(10):1540–52. doi: 10.1101/gr.279415.124
16. Després PC, Cisneros AF, Alexander EMM, Sonigara R, Gagné-Thivierge C, Dubé AK, et al. Asymmetrical dose responses shape the evolutionary trade-off between antifungal resistance and nutrient use. Nat Ecol Evol. 2022;6(10):1501–15. doi: 10.1038/s41559-022-01846-4
17. Romanowicz KJ, Resnick C, Hinton SR, Plesa C. Exploring antibiotic resistance in diverse homologs of the dihydrofolate reductase protein family through broad mutational scanning. Sci Adv. 2025;11(33):eadw9178. doi: 10.1126/sciadv.adw9178
18. Kinsler G, Schmidlin K, Newell D, Eder R, Apodaca S, Lam G, et al. Extreme sensitivity of fitness to environmental conditions: lessons from #1BigBatch. J Mol Evol. 2023;91(3):293–310. doi: 10.1007/s00239-023-10114-3
19. Maes S, Deploey N, Peelman F, Eyckerman S. Deep mutational scanning of proteins in mammalian cells. Cell Rep Methods. 2023;3(11):100641. doi: 10.1016/j.crmeth.2023.100641
20. Fowler DM, Adams DJ, Gloyn AL, Hahn WC, Marks DS, Muffley LA, et al. An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biol. 2023;24(1):147. doi: 10.1186/s13059-023-02986-x
21. Wong ASL, Choi GCG, Cui CH, Pregernig G, Milani P, Adam M, et al. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc Natl Acad Sci U S A. 2016;113(9):2544–9. doi: 10.1073/pnas.1517883113
22. Maciá Valero A, Prins RC, de Vroet T, Billerbeck S. Combining oligo pools and golden gate cloning to create protein variant libraries or guide RNA libraries for CRISPR applications. Methods Mol Biol. 2025;2850:265–95. doi: 10.1007/978-1-0716-4220-7_15
23. Kuiper BP, Prins RC, Billerbeck S. Oligo pools as an affordable source of synthetic DNA for cost-effective library construction in protein- and metabolic pathway engineering. Chembiochem. 2022;23(7):e202100507. doi: 10.1002/cbic.202100507
24. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell. 2020;182(5):1295-1310.e20. doi: 10.1016/j.cell.2020.08.012
25. Weile J, Ferra G, Boyle G, Pendyala S, Amorosi C, Yeh C-L, et al. Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries. Bioinformatics. 2024;40(4):btae182. doi: 10.1093/bioinformatics/btae182
26. Jones EM, Lubock NB, Venkatakrishnan AJ, Wang J, Tseng AM, Paggi JM, et al. Structural and functional characterization of G protein-coupled receptors with deep mutational scanning. Elife. 2020;9:e54895. doi: 10.7554/eLife.54895
27. Howard CJ, Abell NS, Osuna BA, Jones EM, Chan LY, Chan H, et al. High-resolution deep mutational scanning of the melanocortin-4 receptor enables target characterization for drug discovery. Elife. 2025;13:RP104725. doi: 10.7554/eLife.104725
28. Mighell TL, Toledano I, Lehner B. SUNi mutagenesis: scalable and uniform nicking for efficient generation of variant libraries. PLoS One. 2023;18(7):e0288158. doi: 10.1371/journal.pone.0288158
29. Macdonald CB, Nedrud D, Grimes PR, Trinidad D, Fraser JS, Coyote-Maestas W. DIMPLE: deep insertion, deletion, and missense mutation libraries for exploring protein variation in evolution, disease, and biology. Genome Biol. 2023;24(1):36. doi: 10.1186/s13059-023-02880-6
30. Aubé S, Dubé AK, Landry CR. Adaptation is less accessible through mutations in promoters than in coding sequences when large effect sizes are needed. bioRxiv. 2025. doi: 10.1101/2025.09.09.675213
31. Mateyko N, de Boer CG. Culture wars: empirically determining the best approach for plasmid library amplification. ACS Synth Biol. 2024;13(8):2328–34. doi: 10.1021/acssynbio.4c00377
32. Fisher MC, Alastruey-Izquierdo A, Berman J, Bicanic T, Bignell EM, Bowyer P, et al. Tackling the emerging threat of antifungal resistance to human health. Nat Rev Microbiol. 2022;20(9):557–71. doi: 10.1038/s41579-022-00720-1
33. Berman J, Krysan DJ. Drug resistance and tolerance in fungi. Nat Rev Microbiol. 2020;18(6):319–31. doi: 10.1038/s41579-019-0322-2
34. Langner S, Staber PB, Neumeister P. Posaconazole in the management of refractory invasive fungal infections. Ther Clin Risk Manag. 2008;4(4):747–58. doi: 10.2147/tcrm.s3329
35. Boutin C-A, Luong M-L. Update on therapeutic approaches for invasive fungal infections in adults. Ther Adv Infect Dis. 2024;11:20499361231224980. doi: 10.1177/20499361231224980
36. Maquera-Afaray J, Luna-Vilchez M, Salazar-Mesones B, Portillo-Alvarez D, Uribe-Ramirez L, Taipe-Sedano G, et al. Antifungal prophylaxis with posaconazole in immunocompromised children younger than 13 years. J Pediatr Pharmacol Ther. 2022;27(1):57–62. doi: 10.5863/1551-6776-27.1.57
37. Pardiwala A, Datarkar AN, Manekar V, Daware S. Posaconazole as single or adjunctive antifungal in mucormycosis of the maxilla: systematic review and meta-analysis. J Oral Med Oral Surg. 2023;29(4):41. doi: 10.1051/mbcb/2023046
38. Bédard C, Pageau A, Fijarczyk A, Mendoza-Salido D, Alcañiz AJ, Després PC, et al. FungAMR: a comprehensive database for investigating fungal mutations associated with antimicrobial resistance. Nat Microbiol. 2025;10(9):2338–52. doi: 10.1038/s41564-025-02084-7
39. Gale AN, Pavesic MW, Nickels TJ, Xu Z, Cormack BP, Cunningham KW. Redefining pleiotropic drug resistance in a pathogenic yeast: Pdr1 functions as a sensor of cellular stresses in Candida glabrata. mSphere. 2023;8(4):e0025423. doi: 10.1128/msphere.00254-23
40. Ferrari S, Ischer F, Calabrese D, Posteraro B, Sanguinetti M, Fadda G, et al. Gain of function mutations in CgPDR1 of Candida glabrata not only mediate antifungal resistance but also enhance virulence. PLoS Pathog. 2009;5(1):e1000268. doi: 10.1371/journal.ppat.1000268
41. Mizoguchi H, Watanabe M, Nishimura A. Characterization of a PDR1 mutant allele from a clotrimazole-resistant sake yeast mutant with improved fermentative activity. J Biosci Bioeng. 1999;88(1):20–5. doi: 10.1016/s1389-1723(99)80169-8
42. Carvajal E, van den Hazel HB, Cybularz-Kolaczkowska A, Balzi E, Goffeau A. Molecular and phenotypic characterization of yeast PDR1 mutants that show hyperactive transcription of various ABC multidrug transporter genes. Mol Gen Genet. 1997;256(4):406–15. doi: 10.1007/s004380050584
43. Bautista C, Gagnon-Arsenault I, Utrobina M, Fijarczyk A, Bendixsen DP, Stelkens R, et al. Hybrid adaptation is hampered by Haldane’s sieve. Nat Commun. 2024;15(1):10319. doi: 10.1038/s41467-024-54105-4
44. Conway TP, Vu BG, Beattie SR, Krysan DJ, Moye-Rowley WS. Similarities and distinctions in the activation of the Candida glabrata Pdr1 regulatory pathway by azole and non-azole drugs. bioRxiv. 2024:2024.09.19.613905. doi: 10.1101/2024.09.19.613905
45. Cisneros AF, Gagnon-Arsenault I, Dubé AK, Després PC, Kumar P, Lafontaine K, et al. Epistasis between promoter activity and coding mutations shapes gene evolvability. Sci Adv. 2023;9(5):eadd9109. doi: 10.1126/sciadv.add9109
46. Dibyachintan S, Dubé AK, Bradley D, Lemieux P, Dionne U, Landry CR. Cryptic genetic variation shapes the fate of gene duplicates in a protein interaction network. Nat Commun. 2025;16(1):1530. doi: 10.1038/s41467-025-56597-0
47. Domingo J, Diss G, Lehner B. Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature. 2018;558(7708):117–21. doi: 10.1038/s41586-018-0170-7
48. Weile J, Sun S, Cote AG, Knapp J, Verby M, Mellor JC, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol. 2017;13(12):957. doi: 10.15252/msb.20177908
49. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, et al. Functional discovery via a compendium of expression profiles. Cell. 2000;102(1):109–26. doi: 10.1016/s0092-8674(00)00015-5
50. New England Biolabs. High Efficiency Transformation Protocol for 96-tube format (C2987U). [cited 9 Nov 2025]. Available from: https://www.neb.com/en-ca/protocols/2015/11/24/high-efficiency-transformation-protocol-for-96-tube-format-c2987u
51. Gibson DG, Young L, Chuang R-Y, Venter JC, Hutchison CA 3rd, Smith HO. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 2009;6(5):343–5. doi: 10.1038/nmeth.1318
52. Mnaimneh S, Davierwala AP, Haynes J, Moffat J, Peng W-T, Zhang W, et al. Exploration of essential gene functions via titratable promoter alleles. Cell. 2004;118(1):31–44. doi: 10.1016/j.cell.2004.06.013
53. Gietz RD, Schiestl RH. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc. 2007;2(1):31–4. doi: 10.1038/nprot.2007.13
54. Durand R, Pageau A, Landry CR. gyōza: a Snakemake workflow for modular analysis of deep-mutational scanning data. Genetics. 2026;232(1):iyaf199. doi: 10.1093/genetics/iyaf199
55. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. doi: 10.1038/s41586-024-07487-w
56. Meng EC, Goddard TD, Pettersen EF, Couch GS, Pearson ZJ, Morris JH, et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 2023;32(11):e4792. doi: 10.1002/pro.4792

Decision Letter 0

Richard Hodge

20 Jun 2025

Dear Dr Landry,

Thank you for submitting your manuscript entitled "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length" for consideration as a Methods and Resources article by PLOS Biology. Please accept my apologies for the delay in getting back to you, as we consulted with an academic editor about your submission.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please Login to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Jun 22 2025 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

PLOS

Empowering researchers to transform science

Carlyle House, Carlyle Road, Cambridge, CB4 3DN, United Kingdom

California (U.S.) corporation #C2354500, based in San Francisco

Decision Letter 1

Richard Hodge

24 Jul 2025

Dear Dr Landry,

Thank you for your patience while your manuscript "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length" was peer-reviewed at PLOS Biology as a Methods and Resources Article. Please accept my sincere apologies for the delays that you have experienced during the peer review process. Your manuscript has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

As you will see, the reviewers are generally positive about your library construction approach and think it will be useful for the DMS field. Reviewers #1 and #3 raise some overlapping concerns with the claim that the method can be applied to whole genes, and Reviewer #3 asks that a direct demonstration of its application to a sizable sequence length be provided, as well as a quantification of variant representation biases. In addition, Reviewer #1 notes that a more thorough evaluation and comparison to the current literature should be provided to contextualize the methodological advance. Finally, Reviewers #2 and #4 are very positive and provide a few minor comments to strengthen the reporting and presentation.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Best regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------

REVIEWS:

Reviewer #1: Jann et al. present a method for constructing saturation mutagenesis libraries coupled to nucleic acid barcodes. They demonstrate the efficacy of this method by performing construction of two barcoded NNK tiles (one 75-nt, one 54-nt - (Tile 43, Supporting Tables)), and performing a deep mutational scanning experiment on one 75-nt tile of the PDR1 sequence. The motivation behind this method is the need for cost-effective deep mutational experiments of full-length genes (for example, the coding sequence for PDR1 is 3204 bp). Two major strategies exist - one can either mutate each tile separately and perform separate selections and then sequencing ('tile-seq'). Alternatively, one can couple a nucleic acid barcode to a specific variant of interest and sequence the barcode (for an early example see Sarkisyan, Karen S., et al. "Local fitness landscape of the green fluorescent protein." Nature 533.7603 (2016): 397-401.). The Sarkisyan paper used a molecular biology 'hack' to connect the barcode to a variant using short read sequencing; there are other - conceptually similar - short read coupling strategies as well (Petersen …Whitehead Nat Comms 2024). Alternatively, there are methods to pre-assign a barcode to a gene variant, avoiding barcode-variant haplotyping altogether (Ranganathan Nat Comms 2019). More commonly, the barcode is coupled to the variant of interest using long read sequencing by PacBio or Oxford Nanopore. The disadvantage of the long read strategy is the sequencing cost and throughput. In the authors' strategy, they use a two-step method where a barcode is covalently fused to a 75-nt gene tile. In step 1, a Gibson assembly is performed. In step 2, the full plasmid is reconstituted via Golden Gate assembly using the 3' portion of the remaining gene. The advantage of the Jann method is resolving the barcode to the gene variant after step 1 via short read sequencing.
The major disadvantage of their method is in the need for n separate long-length PCRs to generate the linearized DNA for Gibson assembly in step 1 (here n is the number of tiles). The paper flows well and has well-described and executed experiments. However - as described below - it is unclear on reading whether this method represents an improvement over current methods for scanning long full-length genes. A more thorough evaluation and comparison to the current literature, as well as additional analysis would improve the quality of this contribution.
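The tile numbers quoted in this report can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes non-overlapping tiles of 25 codons (75 nt) laid end to end over the 3204-bp coding sequence, and the function name `tile_layout` is hypothetical rather than taken from the manuscript.

```python
import math

def tile_layout(cds_length_nt, tile_length_nt=75):
    """Split a coding sequence into consecutive, non-overlapping tiles.

    Returns (number_of_tiles, length_of_last_tile_nt).
    Assumption: both lengths are whole numbers of codons, so every
    tile starts and ends on a codon boundary.
    """
    if cds_length_nt % 3 or tile_length_nt % 3:
        raise ValueError("lengths must be whole numbers of codons")
    n_tiles = math.ceil(cds_length_nt / tile_length_nt)
    last_tile = cds_length_nt - (n_tiles - 1) * tile_length_nt
    return n_tiles, last_tile

n_tiles, last_tile = tile_layout(3204)
print(n_tiles, last_tile)  # prints: 43 54
```

Under these assumptions the layout gives 43 tiles with a 54-nt final tile, which is consistent with the 43 sub-libraries and the 54-nt Tile 43 mentioned in the report.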

Major concerns

1. The method is specifically designed to address mutagenesis of very large genes. However, the paper does not appreciably contextualize similarities/differences between this method and previous methods, and the introduction does not place the present work within the current literature on technical improvements for large genes. For example, it appears that the method presented here is very similar to the method employed by Sri Kosuri et al in two papers, one of which is cited by the authors of the present manuscript: [Jones et al., 2020, eLife] and [Howard et al., 2025, eLife]. The authors should specify if indeed the method is the same, or if not, how it differs. What are the distinguishing features? What are the strengths of this short read haplotyping strategy vs. other approaches that exist in the literature? Apt comparisons would be with nicking mutagenesis (Mighell et al., 2023, PLoS One) or DIMPLE (Macdonald et al., 2023, Genome Biology). These methods are much more widely used, can incorporate NNK encoded nucleotides, and are alternatives for generating the library when coupled with barcodes and long read haplotyping.

2. The authors list 3 parameters key for satmut libraries: informative barcodes, mutation coverage, and barcode diversity. However, one parameter is missing from this list: uniformity of variants. This is important for limiting sequencing costs: two libraries with the same percent of informative barcodes, mutation coverage, and barcode diversity could have very different uniformity, leading to more sequencing reads required to assess the less uniform one. Metrics exist for evaluating uniformity in tiled gene libraries (e.g., Mighell et al., 2023, PLoS One), and I would encourage an analysis of tile 13 and 43 libraries using this or similar metrics.
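To illustrate what a uniformity metric could look like, the sketch below computes two generic measures from per-variant read counts: a Gini coefficient and a 90th/10th percentile fold range. These are common choices for quantifying skew and are offered as assumptions; they are not necessarily the metric used in the work cited by the reviewer, and the function names are hypothetical.

```python
def gini(counts):
    """Gini coefficient of read counts: 0 is perfectly uniform, values near 1 are highly skewed."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted formula applied to the sorted counts.
    weighted = sum((rank + 1) * v for rank, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n

def fold_range(counts, low=0.1, high=0.9):
    """Ratio between the counts found at the `high` and `low` quantiles."""
    values = sorted(counts)
    n = len(values)
    low_value = values[int(low * (n - 1))]
    high_value = values[int(high * (n - 1))]
    if low_value == 0:
        return float("inf")
    return high_value / low_value

# Two toy libraries with identical coverage (every variant observed)
# but different evenness of representation.
even_library = [100] * 20
skewed_library = [10] * 15 + [1000] * 5
for name, library in (("even", even_library), ("skewed", skewed_library)):
    print(name, round(gini(library), 3), fold_range(library))
```

Both toy libraries have full mutation coverage, yet the skewed one would need far more reads to sample its rarest variants adequately, which is the cost argument made in the comment above.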

3. My suggestion is to remove the cost comparison or to greatly strengthen the rigor behind the claims (Supp Fig 1; Table S1). First, the synthesis argument obscures the relevant difference between sat mutagenesis & barcoded oPools. No one outside of a handful of academic labs and in industry would consider gene synthesis at scale like this. Second, the cost comparison presented here omits time and labor requirements of the method, which appear to be quite substantial. Specifically, the example gene PDR1 requires 43 sub-libraries to be constructed, each of which requires long-range PCR amplification of the original wild-type vector. These PCRs can be very tricky and often require multiple rounds of optimization of conditions. 43 of these reactions is quite a molecular biology load, and would scale unfavorably as you increase the length of YFG. Third, the claims behind the numbers are not well sourced and cannot be independently verified. In my opinion, the quality of the paper does not rest on an economic argument. In such cases, it's best to remove a cost comparison.

4. What is claimed in the paper (a method for size independent barcoded sat mut library generation) vs. what is demonstrated (barcoded libraries for two short tiles) is not reflected in the title, abstract, or introduction. Accurate description of what is demonstrated in the method, along with a discussion of strengths and limitations (n PCRs, n synthesized gene fragments of YFG, n parallel Gibson Assemblies, n parallel Golden Gate assemblies) in the discussion would improve the readability of the manuscript. Specifically, the authors did not demonstrate the method on multiple genes; did not create a barcoded full length saturation NNK library of their gene of interest; and did not perform a deep mutational scan on the full-length gene. While I don't think these experiments are necessary for publication of the method, they are necessary if the abstract and title remain unchanged.

Smaller comments

5. Figure 1C and 1D: it appears that the number of barcodes per variant is clipped at 10. This should either be mentioned explicitly, or better the raw values should be shown (this is better because the reader would get a better sense of the uniformity of the generated libraries).

6. Figure 2C: when evaluating the effect of number of barcodes on replicate correlation, was the number of short-read sequencing reads accounted for? Variants with a higher number of barcodes will also tend to be more abundant in the library and their fitness will be better estimated simply due to increased short read coverage.

7. Figure 2 - what is the internal reproducibility of the assay (barcode vs. barcode or tile vs. tile)? And, how does this compare with the Figure 2 graph shown comparing barcode vs. tile?

8. The barcoding scheme uses position specific sequences along with degenerate sequences to tag each variant at a given position. Do the authors have any evidence that this scheme leads to higher fidelity variant-barcode associations compared with fully random barcodes?

Reviewer #2 (Olivier Tenaillon, signs review): In this manuscript entitled "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length", Jessica Jann and co-authors propose a cost-efficient method to do DMS in long genes, coupled with barcodes.

The method is astute and straightforward and proposes an intermediate coupling with barcodes, which is truly relevant, as alternative methods require long reads or emulsion PCR. Here the match between barcodes and sequence is extremely good.

I think the method will be useful for the community, including our lab.

I have just very minor comments:

twice it is mentioned:

-"multiple reactions and transformations were performed for each fragment" to reach the good numbers... It would be good to give a few examples of what multiple typically means... 3, 6, 20?

-The association from barcode to sequence is in theory imposed in the design. Here, a careful step of sequencing is done to validate it. Would it be possible to know the amount of differences between the initial coupling and the one observed? In other words, what would be the error rate if this "association sequencing" was not done and we just trusted the oPools sequences?

Overall it is a nice experimental protocol that will be valuable for the community. Some alternatives, maybe cheaper, may exist, but they do not include the coupling to barcodes and end up having a very large fraction of wild-type sequences, which in the end makes them much less cost-competitive.

Olivier Tenaillon, I always sign my reviews.

Reviewer #3 (Guillaume Cambray, identifies himself): The manuscript "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length" by Jann et al. describes an ingenious method to generate high-coverage variant libraries from a relatively small pool of synthetic oligonucleotides. The authors present a proof-of-concept application, screening variants of yeast's PDR1 protein for increased antifungal resistance.

The method consists in using pools of degenerate oligonucleotides associating NNK random variations at each codon with partially random barcodes to generate trackable diversity through subsequent barcode-to-mutation mapping, and barcode frequency analysis upon functional screening. This enables substantial economy of scale in terms of the number of oligonucleotides to synthesize. The authors further describe a two-step cloning strategy that enables targeting sequences that exceed the size of available oligonucleotide pools. Combined, these two aspects provide a convincing method to cost-efficiently generate large libraries of mutants. Although the authors focus on protein coding sequences, variations of the approach could be extensible to any kind of sequences of interest.

The method is conceptually sound and its efficiency is demonstrated experimentally for two small and separate regions of PDR1. In my opinion, the manuscript nevertheless fails to demonstrate the applicability of the method to "genes of any length", as claimed in the title and repeatedly in the text. In effect, the manuscript only provides data for one sub-library covering 75 nucleotides of a 3,204-nucleotide gene (and in a more limited manner on another sub-library of 54 nucleotides).

While the core concept is appropriately demonstrated and it is implicitly stated that it can be easily scaled up, I do not think it can. I happen to be working with a similar strategy for constructing variants of a 6 kb genome (which yielded 45 sub-libraries), and I know for a fact that mixing these sub-libraries into a full library to perform pervasive screens leads to all sorts of unforeseen issues. I therefore recommend that the manuscript demonstrate application to a sizable sequence length to substantiate one of the main claims of the paper. This might not need to be on the full extent of the PDR1 gene, but at least on a substantial length that exceeds pure synthesis capacity (say 1 kb).

I understand and respect that the authors might want to valorize their perhaps more extensive work with several publications, with the present manuscript being more method-oriented. I seem to understand that the authors chose to focus on a well-known region where mutants are already known and want to keep extensive descriptions of screening results on other regions for another functionally-oriented paper. There might be ways to present these data without focusing on (or even disclosing) their biological meaning, just showing that many sub-libraries can be constructed, pooled sufficiently uniformly to be assayed together and quality functional data be obtained over large sequences.

This being a method paper, I would also recommend strengthening the precision of the materials and methods section.

I have listed below a number of additional comments that may help to improve the manuscript:

Figure 1:

Panel A: some aspects of the construction scheme could be depicted with more precision, e.g. including elements that are brought by oligonucleotides during PCR on the oligos. Some elements are also at odds with the description in the method section (see details below).

Panel B: Reading the main text, it was particularly unclear to me where the data shown here came from. I seem to understand from the previous section that these are derived by cumulating sequencing data obtained separately on independent small-scale transformations. This should be stated clearly in the text. Points should be added to the graphs in addition to lines, since it is difficult to tell where the straight lines break.

The denomination "% barcode diversity" was somewhat difficult to grasp. I seem to understand that this quantity represents the percentage of sequences associated with at least 4 and at least 9 informative barcodes? Maybe something like #barcode>4 and #barcode>9 could be clearer (this also applies to panel CD).

l102: "Means and confidence intervals obtained for 100 random draws of transformation reactions": I am not sure I understand what this means exactly.

Panel CD: the thermometer should show discretized colors and the color scheme should make it easier to tell exactly how many barcodes are identified in each position. Data on the rightmost panel would be better shown as a barplot.

Figure 2:

Panel B: I understand that a heatmap conveys more information per se, but comparing data from direct and barcode-based mutant identification would be better conveyed with a scatter plot of the one vs the other. Points could be additionally colored by the number of barcodes from which the values are estimated for the barcode-based identification. The heatmap for direct sequencing is not necessary and could either be removed or kept if space allows.

More generally, an explanation of the few observed discrepancies between the two approaches could be discussed.

The NNK strategy can yield one stop codon at each position. The heatmap from direct sequencing shows occasional positive selection for these, while that from direct identification shows mildly negative values. I would expect highly negative selection for these mutations and think that the observed data deserve some explanation: is it expected that truncated proteins remain functional? Is it due to leakage from the endogenous gene that is normally dominated by the plasmid-borne variants? Is it indicative of a lack of measurement precision?
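The statement that NNK yields a single stop codon per position can be verified by enumerating the degenerate codons against the standard genetic code. The sketch below builds the standard codon table from the conventional TCAG ordering; the variable names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, and G or T (K) at position 3.
nnk_codons = [codon for codon in CODON_TABLE if codon[2] in "GT"]
stops = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]
encoded = {CODON_TABLE[codon] for codon in nnk_codons} - {"*"}

print(len(nnk_codons), stops, len(encoded))  # prints: 32 ['TAG'] 20
```

The 32 NNK codons therefore cover all 20 amino acids plus the TAG stop codon, so each mutagenized position is expected to carry exactly one nonsense variant that can serve as a loss-of-function reference.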

Panel C: l204-205: "Means and confidence intervals were calculated from 100 random subsamplings per variant." I do not see error bars here. Also, it is unclear how these data are generated: are variants with more or less than N barcodes left out from the correlation? If so, the correlation obtained by using all usable data should be communicated.

l148: "The performance of a DMS plasmid library depends on three key parameters": I do agree with all these parameters but would argue that a fourth parameter quantifying representation bias between variants is also key. Such bias can come from synthesis, construction, subsequent growth, etc. Significant distribution biases hinder cost-effectiveness, as much sequencing throughput is wasted on the most abundant variants to get data for the least abundant ones. A metric should be chosen to quantify this.

l218-219: "estimates from the barcodes or from the F13 coding region were strongly correlated (r = 0.95, 10 barcodes)": it would be better to highlight the correlation when all available barcodes are used.

l342-343: "Since oPools are synthesized in the 5' to 3' direction, this initial step minimizes the incorporation of truncated oligos (linked to oPool incomplete synthesis) into the final product": this is factually wrong, the phosphoramidite chemistry used for all chemical oligonucleotide synthesis actually proceeds in the 3' to 5' direction. This first PCR and associated purification seems rather useless.

l348-350: "This approach improved the overall quality of the oPools, ensuring sufficient insert quantity and diversity for subsequent steps of DMS library construction.": I don't think this claim is substantiated by data, please edit to something along the lines of "This approach is intended to improve..."

l357-359: "The resulting PCR product was incubated for 1h30 at 37 °C with 10 U of DpnI enzyme to remove parental plasmid DNA and then, purified using magnetic beads.": In this description the linear PCR product is directly used for Gibson assembly. This is not what is suggested in Figure 1A, where it appears as a circular intermediate obtained from transformation (which would have been better to avoid PCR-related mutation in the backbone, including part of YFV). Likewise for the second cloning step, the missing fragment would have better been cloned along with flanking BsaI sites and sequence verified before being used in golden-gate. I suggest the authors add such recommendations to the method section as potential improvements.

l363-365: "Following transformations, 5 mL of 2YT medium was added directly onto each agar plate": this is too imprecise for a methodological paper. What transformation procedure is used (chemical, electroporation)? As I understand, 2YT was added to the cells after the transformations. Were cells grown after that before plating? At what point are transformants counted?

l380-381: "To achieve this, multiple parallel Golden Gate reactions and transformations were performed for each fragment." This is surprising to me, a transformation efficiency of 10^5 is fairly easy to attain, especially with commercial competent cells...

l 389-… Plasmid library quality control by sequencing: Description of the sequencing library preparation comes after description of sequencing read processing. It would be clearer to have a dedicated section for the various flavors of sequencing library preparation.

More broadly, amplicon sequencing is notoriously biased, with random molecules sometimes getting way more amplified than others, which is a big problem for downstream quantitative analyses. A now common approach to mitigate this is to incorporate Unique Molecular Identifiers during PCR. The authors have not done this here, but could discuss this as a weakness and suggest it as an improvement.
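As a minimal sketch of the idea behind Unique Molecular Identifiers, the code below counts distinct UMIs per barcode instead of raw reads, so that PCR duplicates of the same starting molecule are collapsed. It assumes each read has already been parsed into a (barcode, UMI) pair and uses exact UMI matching with no correction for sequencing errors; the function name and the toy reads are hypothetical.

```python
from collections import defaultdict

def count_molecules(reads):
    """Summarize reads per barcode.

    `reads` is an iterable of (barcode, umi) pairs, one per sequencing read.
    Returns {barcode: (n_reads, n_unique_umis)}.
    """
    n_reads = defaultdict(int)
    umis = defaultdict(set)
    for barcode, umi in reads:
        n_reads[barcode] += 1
        umis[barcode].add(umi)
    return {barcode: (n_reads[barcode], len(umis[barcode])) for barcode in n_reads}

# Toy example: BC1 has one molecule amplified three times plus a second molecule.
toy_reads = [
    ("BC1", "AAAC"), ("BC1", "AAAC"), ("BC1", "AAAC"), ("BC1", "GGTA"),
    ("BC2", "CCAT"), ("BC2", "TTGA"),
]
print(count_molecules(toy_reads))
```

In this toy case the raw read counts suggest BC1 is twice as abundant as BC2, whereas the UMI counts indicate two starting molecules for each, which is the amplification bias the comment refers to.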

l446-....: "Variant library construction": most of this description is not matched by any main text. A few main text sentences describing that PDR1 is essential, has been placed under a dox-repressible promoter to enable turning its expression down and conditional complementation by a plasmid library would be welcome.

l535-536: "an initial PCR was carried out to amplify only the PDR1 from the plasmid": I understand that this is necessary for the direct sequencing of F13 variants, and it makes sense to use the same material for barcode library prep in the framework of comparing data derived from barcode sequencing and direct variant region sequencing. However, higher quality data on barcodes could be derived by directly targeting the PBS_i5 and PBS_i7 priming sites...

l563-...: "Selection coefficient and characterization of the variants". The authors have already established a pipeline to identify barcodes. As the very same pipeline can presumably be used to count barcodes, what exactly is the added value of using gyoza?

l577: "normalization by the number of mitotic generations": this needs to be explained.

l578: "subtraction of the median log2 fold-change of silent variant": likewise, this requires additional explanation. What is the fold-change distribution of silent variants, and where is the actual WT in this distribution? How does it compare with the distribution of variants?
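The two processing steps queried here can be made concrete with a short sketch. It shows one plausible reading of the quoted phrases, namely a log2 fold-change in variant frequency divided by the number of generations, then centered on the median of the silent variants. The order of operations, the pseudocount, and all names are assumptions for illustration and are not confirmed as the behavior of the authors' pipeline.

```python
import math
from statistics import median

def selection_coefficients(counts_t0, counts_t1, silent, generations, pseudocount=1):
    """Per-variant score: log2 fold-change per generation, centered on silent variants.

    counts_t0, counts_t1: {variant: read count} before and after selection.
    silent: collection of variant names treated as silent (synonymous) references.
    generations: number of mitotic generations between the two time points.
    pseudocount: added to each count so that absent variants do not give log2(0).
    """
    total_t0 = sum(counts_t0.values())
    total_t1 = sum(counts_t1.values())
    per_generation = {}
    for variant, count_t0 in counts_t0.items():
        count_t1 = counts_t1.get(variant, 0)
        freq_t0 = (count_t0 + pseudocount) / total_t0
        freq_t1 = (count_t1 + pseudocount) / total_t1
        per_generation[variant] = math.log2(freq_t1 / freq_t0) / generations
    # Centering: the median silent variant is defined to have a score of zero.
    reference = median([per_generation[v] for v in silent])
    return {variant: score - reference for variant, score in per_generation.items()}

scores = selection_coefficients(
    {"silent_1": 100, "A": 100, "B": 100},
    {"silent_1": 100, "A": 400, "B": 25},
    silent=["silent_1"],
    generations=2,
    pseudocount=0,
)
print(scores)
```

With this convention a score of zero means "behaves like a typical silent variant", which is why the reviewer's question about where the true wild type falls within the silent distribution matters for interpreting the sign of each score.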

l581-583: "Variants were assigned confidence scores based on their initial abundance at TP0 across replicates. A minimum threshold of five reads per mutation was used to define high-confidence variants": Does this mean that all replicates must have at least 5 reads? As far as I can tell, there has been no mention of experimental replicates until this point: what are they?

As much as the number of initial reads is important, this seems a little light to qualify as a confidence score... I guess the spread of values around barcode replicates and synonymous amino-acids could be used for that purpose, but neither is mentioned, despite means being considered (l584).

Finally, a potential weakness of the strategy could be discussed. In the construction scheme, each codon position is associated with a defined barcode core of 24 nts, while 6 barcode positions are degenerate. One can't really rule out a potential functional effect of the core barcode, which would then systematically impact all variants at that position. Are there ways to diagnose or mitigate such effects?

Reviewer #4: Systematic mutational scanning is a powerful technique that enables researchers to assess the functional impact of all possible point mutations in a gene within a single experiment, providing a comprehensive view of the relationship between protein sequence and function. However, generating libraries for these experiments is often prohibitively expensive and labor-intensive, with costs increasing proportionally to gene size.

The manuscript by Jann et al. introduces a cost-effective and accessible method for the construction of a barcoded library that addresses these challenges. Overall, the manuscript is clearly written and well-presented, and the authors' elegant approach to library design will be valuable for future DMS studies. I have just a few comments and questions that arose while reading the manuscript. Overall, I recommend this manuscript for publication in PLOS Biology.

1. It may be helpful to include a table summarizing coverage statistics for each of the two library fragments constructed in this study. This could include the number of transformants for the two cloning steps, sequencing reads, informative barcodes, mutation coverage, barcode diversity, and other relevant metrics.

2. While the oPool-based method is clearly more affordable than purchasing a commercial library, it appears potentially labor-intensive—particularly when synthesizing a large gene such as PDR1 in 43 separate fragments. Could the authors provide an estimate of how many days it would take to clone the entire PDR1 library using this strategy?

3. In Supplementary Figure 1A, the cost of cloning a commercially synthesized library is estimated at approximately $60 per amino acid position. I was curious how this number was calculated? Based on a rough back of the envelope estimate—assuming ~100,000 transformants per Gibson assembly reaction and thus approximately 10 reactions to cover a 300-aa barcoded library—the total cost in my mind would only be a few hundred dollars in total. Could the authors clarify the basis of the $60/aa cost?
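The back-of-the-envelope estimate in this comment can be reproduced with a small calculation. Only the 300 positions, the ~100,000 transformants per reaction, and the resulting ~10 reactions come from the comment; the 32 NNK codons per position, 10 barcodes per variant, and 10-fold oversampling are illustrative assumptions chosen to show how such a figure could arise, and the function name is hypothetical.

```python
import math

def reactions_needed(n_positions, codons_per_position, barcodes_per_variant,
                     fold_coverage, transformants_per_reaction):
    """Number of assembly/transformation reactions needed to cover a barcoded library."""
    library_size = n_positions * codons_per_position * barcodes_per_variant
    transformants_required = library_size * fold_coverage
    return math.ceil(transformants_required / transformants_per_reaction)

# 300 positions x 32 codons x 10 barcodes x 10-fold coverage = 960,000 transformants.
print(reactions_needed(300, 32, 10, 10, 100_000))  # prints: 10
```

Changing any of the assumed parameters shifts the answer proportionally, so the estimate is only as reliable as the per-reaction yield and the oversampling factor that are plugged in.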

Decision Letter 2

Richard Hodge

5 Jan 2026

Dear Christian,

Thank you for your continued patience while we considered your revised manuscript "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length" for publication as a Methods and Resources Article at PLOS Biology. Please accept my sincere apologies for the delays that you have experienced during this round of the peer review process. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and two of the original reviewers. In addition, I have provided some specific remaining comments from the Academic Editor below the reviewer reports (labelled 'Comments from the Academic Editor').

Based on the reviews, I am pleased to say that we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by Reviewers #1 and #3. After discussions with the academic editor, we will not make the request to include additional validation for the screening of pooled sub-libraries essential for the revision, but we encourage you to include this data if you have it to hand.

In addition, please make sure to address the following data and other policy-related requests that I have provided below (A-G):

(A) We routinely suggest changes to titles to ensure maximum accessibility for a broad, non-specialist readership. In this case, we would suggest the following edit to the title. Please ensure you change both the manuscript file and the online submission system, as they need to match for final acceptance:

“A cost-effective and scalable barcoded library construction method for deep mutational scanning studies”

(B) We note that the Abstract is currently 71 words and we would be grateful if it could be expanded at this stage to around 150 words to fully contextualize and summarize the methodological developments offered by the study.

(C) You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

-Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

-Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it:

Figure 1B-D, 2B-C, S1, S2, S3, S4, S5B, S6, S9

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

(D) I appreciate that the data requested above may be contained within your Github deposition (https://github.com/Landrylab/Jann_et_al_2025). If so, I would be grateful if you could clearly label where the underlying data can be found in the deposition folders.

(E) Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

(F) Please also ensure that each of the relevant figure legends in your manuscript include information on *WHERE THE UNDERLYING DATA CAN BE FOUND*, and ensure your supplemental data file/s has a legend.

(G) Please ensure that your Data Statement in the submission system accurately describes where your data can be found and is in final format, as it will be published as written there.

(H) Please note that per journal policy, the species studied should be clearly stated in the abstract of your manuscript.

------------------------------------------------------------------------

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you may need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly. If you do not receive a separate email within a few days, please assume that checks have been completed, and no additional changes are required.

We expect to receive your revised manuscript within three weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable, if not applicable please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Best regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: The authors have addressed most of my comments; they have added new experimental data scanning the entire gene. One very small note - in Figure 2B the PDR1 name is clipped by a white opaque box in the updated figure; this can be handled in the proof.

Reviewer #3 (Guillaume Cambray, identifies himself): This revised version of the manuscript "Making deep mutational scanning accessible: a cost-efficient approach to construct barcoded libraries for genes of any length" by Jessica Jann et al. largely addresses most of my previous questions and concerns, as well as those raised by the other reviewers.

A major comment—also raised by another reviewer—concerned the manuscript's emphasis on the gene-scale applicability of the approach, while only two short regions were presented. The authors have now provided library construction data for all 43 regions of the gene (Sup. Figs. 7 and 8), which addresses this comment only in part.

The innovative aspect of this work lies in coupling NNK mutagenesis of a relatively short region (i.e., one that can be sequenced using widely available and cost-efficient short-read sequencing such as PE150) with a partially randomized barcode. As the authors point out, different barcodes attached to the same mutation can be viewed as internal replicates that enhance measurement accuracy and statistical robustness. Nevertheless, direct sequencing of the mutated region remains the gold standard, as used by the authors in Figure 2BC and Sup. Figs. 6 and 9.

A major advantage of the barcode strategy is that multiple sub-libraries can be pooled and assayed simultaneously rather than separately, effectively performing one large experiment instead of 43 independent ones. Beyond the obvious cost efficiency, this pooling also improves data consistency and reduces the need for normalization. In my view, a complete validation of the proposed approach should demonstrate that a pool of sub-libraries can indeed be successfully screened. Without pooling, barcoding brings little added value: a single cloning of an NNK-mutagenized oPool, followed by sequencing of the mutated region—as performed here and in other studies (Tile-Seq)—would be sufficient.

In their response, the authors suggest that because they have successfully pooled sub-libraries previously, there is no need to demonstrate the effectiveness of pooling here. By that logic, however, there would have been no need to perform a screening experiment on a single barcoded sub-library, since barcoding itself is also a previously established strategy.

I emphasize this point because, in a very similar experimental context, pooling has unexpectedly caused issues in our own hands—even though we had previously pooled complex libraries successfully. This might be specific to our system, but it would be unfortunate if this manuscript did not address pooling experimentally to fully substantiate the method. I am not asking for a full gene-scale pool, but rather a small number of pooled regions (preferably contiguous, to highlight continuity in measurements across regions). This seems feasible considering all sub-libraries have already been constructed. Again, the goal is not an extended functional analysis—which could be published separately—but a demonstration that substantiates the method's core claim.

Aside from this major point, I have a few remaining comments and suggestions, detailed below:

* Lines 113-114:

"Means and confidence intervals obtained for 100 draws of randomly picked transformants."

Despite the revision, this statement remains unclear. To my understanding, a transformant refers to a single clone. The description in the Methods section is clearer. I suggest rephrasing along the lines of: "100 ordered arrangements of independent transformation experiments with 5,000 and 7,500 clones, respectively, were randomly drawn to construct this graph."

* Figure 1B:

Unless I am missing something, I do not understand what the statistics would indicate when the number of transformants per base pair is 0. I believe there may be an error in the x-axis labeling.

* Lines 198-200:

"~350 transformants per base pair are needed to optimize sufficient mutation coverage, barcode diversity, and informative barcodes (Figure 1B)."

This seems to apply only to F13. In contrast, F43 suggests that ~150 transformants would suffice. This sizable difference deserves discussion. What accounts for the discrepancy? Were similar analyses performed for other regions? If so, do additional patterns emerge? I am also uncertain how many transformants per bp were finally used for the functional screening of Figure 2.

* Figures 1C-D:

I had not realized previously (though it now seems obvious) that the number of barcodes was clipped at 10. As Reviewer 1 suggested, the non-clipped versions are now shown in Sup. Figs. 1 and 2. In my opinion, these non-clipped figures should appear in the main text, as the clipped versions obscure important information. The observation that only a few mutations have fewer than 5 barcodes can instead be described in the text, with reference to the clipped data as supplementary information (or very low barcode counts could be color-coded differently in the main figure).

The new Sup. Figs. 1 and 2 make it clear that systematic biases exist in barcode coverage. Much of this can be attributed to the number of synonymous codons generated by NNK mutagenesis. For example, Leucine, Arginine, and Serine have more barcodes because they are encoded by three possible codons. Valine, Alanine, Glycine, and Proline each have two codons, yet Proline and Alanine appear underrepresented in both F13 and F43. Since Sup. Figs. 7 and 8 are also clipped at 10, it is difficult to assess whether this pattern generalizes more broadly. I have no personal experience with NNK mutagenesis—perhaps these biases are known—but they merit some discussion, as they represent a minor limitation compared with more expensive strategies in which amino acid substitutions are preprogrammed in the oligo sequences.
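
The codon counts discussed in the comment above can be checked by enumerating the NNK codon set. The following is a minimal sketch, assuming the standard genetic code and the usual IUPAC meaning of NNK (N = A, C, G, or T; K = G or T). The function and variable names are illustrative and are not taken from the authors' code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return all codons matching NNK (N = any base, K = G or T)."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def nnk_codons_per_amino_acid():
    """Count how many NNK codons encode each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE[codon] for codon in nnk_codons())


if __name__ == "__main__":
    counts = nnk_codons_per_amino_acid()
    # Print amino acids from most to least represented in the NNK set.
    for aa, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(aa, n)
```

With equal representation of the 32 NNK codons, amino acids encoded by more codons would be expected to collect proportionally more barcodes. Under the standard code, the same enumeration also assigns two NNK codons to threonine, and a single stop codon (TAG) remains in the set.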

COMMENTS FROM THE ACADEMIC EDITOR

- The new Figure 2C is almost unreadable; at least the text size has to be increased.

- All code should be archived (e.g., on Zenodo) - the github links should be replaced with dois to the archived versions.

- In Figure 1 C and D it should say ">=10" in the heatmap legend.

- In the Supplement, I would have appreciated an actual distribution plot of the barcode numbers in addition to the heatmaps and the uniformity coefficients.

Decision Letter 3

Richard Hodge

26 Jan 2026

Dear Christian,

On behalf of my colleagues and the Academic Editor, Claudia Bank, I am pleased to say that we can accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study.

Best wishes,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Barcode diversity per amino acid substitution after Gibson assembly.

    Heatmaps show the total number of unique barcodes for each possible amino acid substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 25,000 transformants were recovered and analyzed. Covered mutations are mutations represented by #barcodes >0, while barcode diversity represented by #barcodes >4 or >9 per mutation demonstrates an increasing number of replicates for the same mutation. Gray squares represent WT amino acids. In the heatmaps, the number of barcodes per mutation is clipped at 10. The numerical data underlying this graph is provided in S2 Data.

    (TIF)

    pbio.3003645.s001.tif (2.3MB, tif)
    S2 Fig. Barcode diversity per amino acid substitution after Golden Gate assembly.

    Heatmaps show the total number of unique barcodes for each possible amino acid substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 100,000 transformants were recovered and analyzed. Covered mutations are mutations represented by #barcodes >0, while barcode diversity represented by #barcodes >4 or >9 per mutation demonstrates an increasing number of replicates for the same mutation. Gray squares represent WT amino acids. In the heatmaps, the number of barcodes per mutation is clipped at 10. The numerical data underlying this graph is provided in S3 Data.

    (TIF)

    pbio.3003645.s002.tif (2.4MB, tif)
    S3 Fig. Barcode diversity per NNK codon substitution after Gibson assembly.

    Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 25,000 transformants were recovered and analyzed. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

    (TIF)

    pbio.3003645.s003.tif (2.9MB, tif)
    S4 Fig. Barcode diversity per NNK codon substitution after Golden Gate assembly.

    Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position in Pdr1 fragments F13 and F43. For each fragment, a total of 100,000 transformants were recovered and analyzed. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

    (TIF)

    S5 Fig. Library quality control per amino acid substitution after Gibson assembly of the full-length PDR1 sequence (43 fragments).

    Heatmap displays the barcode diversity associated with each possible amino acid substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

    (PNG)

    pbio.3003645.s005.png (370.7KB, png)
    S6 Fig. Library quality control per amino acid substitution after Golden Gate assembly of the full-length PDR1 sequence (43 fragments).

    Heatmap displays the barcode diversity associated with each possible amino acid substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts, including low values in red. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

    (PNG)

    pbio.3003645.s006.png (373.3KB, png)
    S7 Fig. Library quality control per NNK codon substitution after Gibson assembly of the full-length PDR1 sequence (43 fragments).

    Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S2 Data.

    (PNG)

    pbio.3003645.s007.png (426.4KB, png)
    S8 Fig. Library quality control per NNK codon substitution after Golden Gate assembly of the full-length PDR1 sequence (43 fragments).

    Heatmaps show barcode diversity for each possible NNK codon substitution at each codon position. Barcode diversity is shown using an unclipped color scale, allowing visualization of the full range of barcode counts, including low values in red. Mutations covered by high barcode diversity (#barcodes ≥10 and ≥4) are represented by a blue and purple scale, respectively, while lower barcode diversity (#barcodes <4) is represented by a red scale. Gray squares represent WT amino acids. The numerical data underlying this graph is provided in S3 Data.

    (PNG)

    pbio.3003645.s008.png (411.1KB, png)
    S9 Fig. Distribution of barcode counts per amino‑acid mutation after (A) Gibson and (B) Golden Gate assemblies for Pdr1 F13 and F43.

    Histograms show the number of unique barcodes associated with each amino‑acid substitution in fragments F13 (gray) and F43 (blue). For each fragment, the x‑axis indicates the number of barcodes linked to a given amino‑acid substitution, and the y‑axis shows the number of substitutions observed at each barcode count. The numerical data underlying these graphs is provided in S2 and S3 Data.

    (TIF)

    pbio.3003645.s009.tif (2.3MB, tif)
    S10 Fig. Assessment of library uniformity after each cloning step (Gibson and Golden Gate assemblies) using the Gini coefficient.

    Boxplots represent the distribution of Gini coefficients (ranging from 0 to 1) calculated for each fragment, where lower values indicate more uniform coverage and higher values indicate increased inequality in representation. The numerical data underlying this graph is provided in S4 Data.

    (TIF)

    pbio.3003645.s010.tif (202.6KB, tif)
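
As an illustration of the Gini coefficient used in S10 Fig, the sketch below computes it from a list of per-variant counts with the standard rank-based formula. The exact input used by the authors (for example read counts or barcode counts per variant) is not specified here, so the input and the function name are assumptions.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly uniform).

    Uses the rank-based formula on ascending-sorted values x_1..x_n:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical per-variant counts, for illustration only.
    print(gini([5, 5, 5, 5]))    # perfectly even representation
    print(gini([0, 0, 0, 10]))   # one variant holds all counts
```

With this finite-sample formula the maximum value is (n - 1) / n, which approaches 1 as the number of variants grows.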
    S11 Fig. Assessment of library uniformity after each cloning step (Gibson and Golden Gate assemblies) using the uniformity score.

    Boxplots show the distribution of uniformity scores for each fragment, defined as the log difference between the 90th and 10th percentiles of mutant read counts. Lower scores indicate more uniform representation, while higher scores indicate greater inequality. The numerical data underlying this graph is provided in S4 Data.

    (TIF)

    pbio.3003645.s011.tif (220.2KB, tif)
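
The uniformity score described for S11 Fig can be sketched as follows. The legend does not state the base of the logarithm or the percentile interpolation rule, so log10 and linear interpolation are assumptions here, as are the function names and the example counts.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation; q is a fraction in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def uniformity_score(read_counts):
    """log10(90th percentile) - log10(10th percentile) of mutant read counts.

    The log base is an assumption; lower scores mean more uniform libraries.
    """
    p10 = percentile(read_counts, 0.10)
    p90 = percentile(read_counts, 0.90)
    if p10 <= 0:
        raise ValueError("10th percentile must be positive to take a log")
    return math.log10(p90) - math.log10(p10)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    print(uniformity_score([7] * 20))
    print(uniformity_score([1, 10, 50, 50, 50, 50, 50, 50, 50, 1000, 5000]))
```

A score of 0 means the 10th and 90th percentiles coincide, while a score of 2 under the log10 assumption corresponds to a 100-fold spread between them.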
    S12 Fig. Construction and validation of a S. cerevisiae strain to measure the impact of PDR1 mutations on drug resistance.

    A) Inducible control of genomic PDR1 expression in Saccharomyces cerevisiae using a doxycycline-repressible promoter (TetO7). Created in BioRender. Barff, T. (2025) https://BioRender.com/zo8wlgo. B) Validation of genomic PDR1 repression in the ScPDR1-DOX strain. PDR1 expression level was assessed by flow cytometry using a GFP fusion as a reporter. Fluorescence intensity is shown for the strain expressing GFP (left panel: ScPDR1-GFP-DOX) and the negative control strain without GFP (right panel: ScPDR1-DOX) in media with (red) or without (blue) DOX. GFP intensity was analyzed on the main population of morphologically normal cells (FSC-H<15000 and FSC-A<15000). 5,000 events were recorded per replicate. Three independent biological replicates were analyzed per condition. The numerical data underlying this graph is provided in S8 Data. C) in ScPDR1-DOX by plasmid-expression of ScPDR1 (pRS31N-ScPDR1 WT) along with the control strain (pRS31N-empty). Growth conditions: DMSO (control) or Itraconazole (ITR) (antifungal).

    (TIF)

    pbio.3003645.s012.tif (4.6MB, tif)
    S13 Fig. Correlation between the selection coefficients measured with direct sequencing of the mutated PDR1 F13 coding region and DNA-barcode inference, using all usable barcodes.

    The strong correlation (Pearson r = 0.97) indicates that barcode-based estimates accurately recapitulate the directly measured fitness effects. The number of barcodes per mutation does not affect the correlation. The numerical data underlying this graph is provided in S5 and S6 Data.

    (TIF)

    pbio.3003645.s013.tif (828.1KB, tif)
    S14 Fig. Correlation between the selection coefficients measured with direct sequencing of the mutated PDR1 F13 coding region and DNA-barcode inference, as a function of the number of barcodes per variant considered.

    Curves correspond to variants with exactly 5 (n = 40), 10 (n = 25), or 15 (n = 17) barcodes. Means and confidence intervals were calculated from 100 random barcode subsamplings per variant. The high correlation observed even for variants with only five barcodes indicates that the overall correlation is not driven only by highly diversified variants. The numerical data underlying this graph is provided in S9 Data.

    (TIF)

    pbio.3003645.s014.tif (707.2KB, tif)
    S1 Table. oPool design.

    (XLSX)

    pbio.3003645.s015.xlsx (81KB, xlsx)
    S2 Table. Key resources.

    (XLSX)

    pbio.3003645.s016.xlsx (17.3KB, xlsx)
    S3 Table. Oligonucleotide sequences.

    (XLSX)

    pbio.3003645.s017.xlsx (17.4KB, xlsx)
    S4 Table. PCR protocols.

    (XLSX)

    pbio.3003645.s018.xlsx (21.3KB, xlsx)
    S5 Table. Gibson sequencing sample description.

    (XLSX)

    pbio.3003645.s019.xlsx (21.6KB, xlsx)
    S6 Table. Golden Gate sequencing sample description.

    (XLSX)

    pbio.3003645.s020.xlsx (19.4KB, xlsx)
    S7 Table. Comparison of library uniformity (TP0) between two DMS plasmid library construction strategies.

    This table presents a quantitative comparison of the variant uniformity achieved by our scalable barcoded strategy (PDR1 DMS TP0 F13 library) and by a reference DMS plasmid library, CaERG11, previously generated in our laboratory [5]. The CaERG11 library was constructed using a highly precise, low-throughput method: DNA fragments synthesized by Twist Bioscience were cloned codon-by-codon into a yeast expression vector, and the individual codon libraries were then pooled proportionally based on the number of mutants. This meticulous process achieves near-perfect uniformity of variant representation, which serves here as a gold-standard reference. The comparison indicates that our barcoded strategy yields less uniform libraries than the CaERG11 reference, but that its uniformity remains acceptable for robust DMS experiments (Gini coefficient around 0.52). Crucially, our method achieves this acceptable uniformity while remaining substantially more cost-effective and scalable. The metrics computed include the Gini coefficient (0 = perfect uniformity, 1 = maximum bias) and the Uniformity Score.

    (XLSX)

    pbio.3003645.s021.xlsx (72.6KB, xlsx)
    S8 Table. Summary of coverage statistics for the two cloning steps for all libraries of the complete PDR1 gene.

    (XLSX)

    pbio.3003645.s022.xlsx (135.4KB, xlsx)
    S9 Table. Estimated experimental timeline to construct the complete plasmid mutant PDR1 library.

    (XLSX)

    pbio.3003645.s023.xlsx (67KB, xlsx)
    S10 Table. DMS sample description.

    (XLSX)

    pbio.3003645.s024.xlsx (23.8KB, xlsx)
    S1 Data. Barcode diversity and mutation coverage analysis per transformants.

    (XLSX)

    pbio.3003645.s025.xlsx (167.9KB, xlsx)
    S2 Data. Barcode diversity per after Gibson assembly.

    (XLSX)

    pbio.3003645.s026.xlsx (1.9MB, xlsx)
    S3 Data. Barcode diversity per after Golden Gate assembly.

    (XLSX)

    pbio.3003645.s027.xlsx (1.3MB, xlsx)
    S4 Data. Uniformity scores after Gibson and Golden Gate assemblies.

    (XLSX)

    pbio.3003645.s028.xlsx (11KB, xlsx)
    S5 Data. Gyoza all scores based on PDR1 F13 region direct sequencing.

    (XLSX)

    pbio.3003645.s029.xlsx (663.9KB, xlsx)
    S6 Data. Gyoza all scores based on DNA barcodes sequencing.

    (XLSX)

    pbio.3003645.s030.xlsx (2.1MB, xlsx)
    S7 Data. Correlation of selection coefficients between PDR1 F13 direct sequencing and DNA barcodes sequencing from barcode subsampling.

    (XLSX)

    pbio.3003645.s031.xlsx (30.7KB, xlsx)
    S8 Data. Individual flow cytometry event data used to generate the PDR1-DOX and PDR1-GFP-DOX graphs.

    This table contains FSC, SSC, and GFP fluorescence measurements, including log10-transformed GFP values, for all events contributing to the summarized distributions shown in S12 Fig.

    (XLSX)

    pbio.3003645.s032.xlsx (1.2MB, xlsx)
    S9 Data. Correlation of selection coefficients between PDR1 F13 direct sequencing and DNA barcodes sequencing from barcode subsampling for variants with 5, 10, or 15 barcodes.

    (XLSX)

    pbio.3003645.s033.xlsx (107.4KB, xlsx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pbio.3003645.s036.docx (4.7MB, docx)
    Attachment

    Submitted filename: Response_Round#2_Jannetal_PLOS_Biology.docx

    pbio.3003645.s037.docx (4.4MB, docx)

    Data Availability Statement

    All data are available in the main text or in the Supporting information files. The numerical data underlying all figures are provided in the files S1 Data (corresponding to Fig 1B), S2 Data (corresponding to Figs 1D, S2, S4, S6, S8, and S9), S3 Data (corresponding to Figs 1C, S1, S3, S5, S7, and S9), S4 Data (corresponding to S10 and S11 Figs), S5 Data (corresponding to Figs 2B and S13), S6 Data (corresponding to Figs 2B and S13), S7 Data (corresponding to Fig 2C), S8 Data (corresponding to S12B Fig) and S9 Data (corresponding to S14 Fig). Sequencing data are available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1269095. All code and processed data to reproduce the analyses, results, and figures are available at https://github.com/Landrylab/Jann_et_al_2025 and archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.18342596).


    Articles from PLOS Biology are provided here courtesy of PLOS