Summary
A major challenge in genetics is to identify genetic variants driving natural phenotypic variation. However, current methods of genetic mapping have limited resolution. To address this gap, we developed a CRISPR-Cas9-based high-throughput genome editing approach that can introduce thousands of specific genetic variants in a single experiment. This enabled us to study the fitness consequences of 16,006 natural genetic variants in yeast. We identified 572 variants with significant fitness differences in glucose media; these are highly enriched in promoters, particularly in transcription factor binding sites, while only 19.2% affect amino acid sequences. Strikingly, nearby variants nearly always favor the same parent’s alleles, suggesting that lineage-specific selection is often driven by multiple clustered variants. In sum, our genome editing approach reveals the genetic architecture of fitness variation at single-base resolution, and could be adapted to measure the effects of genome-wide genetic variation in any screen for cell survival or cell-sortable markers.
eTOC Blurb
A highly efficient Cas9-based method for high-throughput precise genome editing is developed and used to measure the fitness effects of thousands of natural genetic variants in yeast at single-base resolution.
Introduction
Traditionally, the phenotypic effects of natural genetic variation have been studied via quantitative trait locus (QTL) mapping (Mackay et al., 2009; Pickrell et al., 2010; Rockman and Kruglyak, 2006). In this approach, genetic crosses result in hundreds or even thousands of offspring, each with a distinct genotype that is measured at SNPs or other genetic variants spanning the genome. These genotypes are then correlated with a phenotype of interest to identify QTLs where genotype predicts phenotype, indicating a genomic region harboring one or more variants that affect the phenotype.
Although genetic mapping of phenotypic variation has been possible for over a century (Sturtevant, 1913), mapping resolution is limited by recombination events and few maps have been resolved to the level of causal genetic variants (Rockman, 2012). Genome-wide association studies (GWAS) offer improved mapping resolution, but are still only rarely able to unambiguously implicate specific causal variants (Schaid et al., 2018). As a result, we still do not know the variants, genes, and molecular mechanisms underlying phenotypic variation, even for most well-studied traits (Rockman, 2012).
Since pinpointing a causal variant requires laborious follow-up experiments, investigators usually prioritize which variants to test within a QTL. This has led to a well-known ascertainment bias: investigators prioritize QTLs with large effects for further study, and within these QTLs, they further prioritize missense variants within promising candidate genes (Rockman, 2012). Although each of these large-effect variants represents a fascinating example, these outliers may not be very informative for understanding the evolutionary process more generally, which is thought to be dominated by variants of much smaller effect (Pritchard et al., 2010). An unbiased method for systematically identifying causal variants is therefore required to fully understand the genes, molecular mechanisms, genetic architectures, and selection pressures underlying phenotypic variation.
In principle, QTL mapping—which can require years of effort, especially for slow-breeding species—could be circumvented if each individual genetic variant differing between the two parents could be introduced into the genome of a single reference strain and scored for the phenotype of interest. Indeed, CRISPR/Cas9 has recently transformed our ability to characterize individual genetic variants via genome editing (Cong et al., 2013; Dicarlo et al., 2013; Jinek et al., 2012; Mali et al., 2013). Following a Cas9-mediated double-strand break, precise edits can be engineered by homology-directed repair (HDR) utilizing a donor DNA containing the mutation of interest (Rouet et al., 1994). However, this process is typically low-throughput, and often has variable efficiency of successful edits (Doudna and Charpentier, 2014; Hsu et al., 2014; Liu et al., 2017).
In contrast, high-throughput genome editing has been performed by creating double-strand breaks in the absence of any donor DNA. These are repaired by non-homologous end-joining (NHEJ), often resulting in small insertions or deletions (indels) that lead to frameshifts when targeting exons. Unlike HDR edits, NHEJ mutations are heterogeneous and thus are not useful for studying specific variants of interest. In principle, a library of donor DNAs encoding the desired mutations could be introduced together with the sgRNAs. However, two limitations have prevented this. First, since donor DNAs are typically encoded on separate molecules from the sgRNAs, there would be no way to ensure that the corresponding sgRNA/donor DNA pairs (targeting the same site) would get into the same cells; the combinations would instead be random. Second, even if they could be delivered to the same cells, the efficiency of precise editing is often low and variable across different sites (Liu et al., 2017), introducing an unacceptable level of noise into any downstream phenotyping assay. Several methods have begun to explore high-throughput precision genome editing, but these involve HDR from high-copy plasmids that may limit their general applicability (Bao et al., 2018; Garst et al., 2017; Guo et al., 2018; Roy et al., 2018; Sadhu et al., 2018) (see Table S1).
To boost editing efficiency and enable high-throughput screens, we reasoned that generating a large number of potential donor DNA molecules within the nucleus would maximize the chance of HDR. Bacterial retrons may be ideal candidates for generating these donor DNAs. Retrons are natural DNA elements coding for a reverse transcriptase (RT), as well as a template on which the RT acts, to create a multi-copy single-stranded DNA (msDNA) product (Hsu et al., 1990, 1992; Shimamoto et al., 1993). These msDNAs are covalently tethered to their template RNA by the RT. Following their discovery in the 1980s, retrons were found to be active in yeast and mammalian cells, and any desired sequence can be reverse-transcribed into msDNA (Mirochnitchenko et al., 1994; Miyata et al., 1992). As we show below, coupling retrons with CRISPR/Cas9 enables precise genome editing with very high efficiency and throughput. This allows us to pinpoint the functional variants underlying variation in fitness—the only phenotype that matters for natural selection—in any given environment.
Results
High-efficiency genome editing via retron-generated donor DNA
To enable parallel, high-efficiency HDR editing, we devised a strategy employing a bacterial retron element for generating donor ssDNAs in vivo. We reasoned that retrons could generate many single-stranded donors that are more efficient in HDR than dsDNA donors (Richardson et al., 2016). We designed a chimeric RNA of E. coli Ec86 retron sequence joined with a single guide RNA (sgRNA) at the 3’-end (Figure 1a). A portion of the bacterial retron sequence was replaced with a 100 nt donor sequence that contained the desired mutation flanked by ~50 bp homology arms (Figure 1b). Southern blot analysis showed that msDNA was produced in yeast in an RT-dependent manner (Figure 1c). We refer to this method as Cas9 Retron precISe Parallel Editing via homologY (CRISPEY).
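The donor layout described above (a desired mutation flanked by roughly 50 bp homology arms) can be illustrated with a minimal sketch. The function name `design_donor`, its parameters, and the symmetric arm layout are assumptions made for illustration; the actual library design rules are given in the STAR Methods and may differ, for example in strand choice or arm lengths.

```python
def design_donor(reference: str, pos: int, ref_allele: str,
                 alt_allele: str, arm: int = 50) -> str:
    """Build a donor carrying alt_allele flanked by `arm` bases of homology.

    Hypothetical helper: `pos` is the 0-based start of ref_allele in
    `reference`. The symmetric-arm layout is an assumption, not the
    authors' exact design.
    """
    # Guard against coordinate errors before building the donor.
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence at pos")
    left_start = pos - arm
    right_end = pos + len(ref_allele) + arm
    if left_start < 0 or right_end > len(reference):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = reference[left_start:pos]
    right_arm = reference[pos + len(ref_allele):right_end]
    return left_arm + alt_allele + right_arm


# Toy example: a single A-to-G substitution in a synthetic 120-base sequence.
toy_reference = "ACGT" * 30
toy_donor = design_donor(toy_reference, pos=60, ref_allele="A", alt_allele="G")
```

For a SNP this yields a 101-base donor with the edited base at the center; an indel would shift the total length by the difference between the two alleles.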
Figure 1. CRISPEY is highly efficient and precise.
(a) Schematic for generation of msDNA (black) as single-stranded oligodeoxynucleotide repair donor in vivo through reverse transcription of a hybrid retron-guide RNA molecule. The components shown are: retron scaffold from Ec86 retron (orange), donor template to be reverse-transcribed (blue), guide sequence targeting genomic loci (dark blue), sgRNA scaffold RNA for SpCas9 binding (red). (b) CRISPEY construct for generating retron-guide RNA in yeast. The retron-guide RNA sequence is flanked by two self-cleaving ribozymes for removing the mRNA cap and poly(A) tail after pGAL7-driven Polymerase II transcription. (c) Southern blot analysis of msDNA species from galactose-induced yeast total RNA. Note the size shift after RNase treatment, indicating removal of retron RNA attached to msDNA. Band (2) indicates partial degradation of the retron-guide RNA. (d) Quantification of retron-mediated CRISPR/Cas9 editing in ADE2. Yeast were subjected to editing for 48 hours via induced expression of Cas9, RT, and the CRISPEY construct targeting ADE2. The fractions (±STD) of edited (ADE2 knockout phenotype expected from HDR repair) and non-edited (wildtype phenotype) are shown for combinations of ADE2 editing retron donor and ADE2-targeting guide (marked by +ADE2 Donor / +ADE2 Guide); or “non-functional” BFP retron donor and GFP-targeting guide (marked by − ADE2 Donor / − ADE2 Guide). (e) CRISPEY also allows insertion of a 765 bp sequence with low error rate. Top, schematic for long-insert editing of genomic DNA at the ADE1 locus. Bottom, diagram indicating outcome of editing, with 87.4% showing the intended edit, 4.6% showing a 1 bp error within the insertion (possibly due to RT-induced error) and 8% showing no edit. See also Table S1, Table S2, Table S3 and Table S4.
We first tested CRISPEY at a single locus in haploid Saccharomyces cerevisiae. To visualize editing efficiency, we introduced a nonsense mutation into ADE2 that results in pink-colored colonies. S. pyogenes Cas9 (SpCas9) and Ec86 reverse transcriptase (Ec86-RT) were integrated in the genome, and the retron-sgRNA was encoded on a low-copy plasmid (Figure 1b). Remarkably, editing efficiency reached 100% after 48 hours (Figure 1d; Table S2), with 0/95 sequenced colonies having undesired mutations in ADE2 (Table S3). As expected, editing was dependent on expression of all components: pairing of ADE2 targeting retron-donor sequence and sgRNA, as well as expression of the RT are required for high-efficiency (Figure 1d). In addition, CRISPEY can insert a 765 nt sequence with 92% efficiency and <0.01% mutation rate within the insertion (Figure 1e; Table S3 and S4). In summary, we found that CRISPEY is highly efficient and precise.
Genome-wide measurement of the fitness effects of natural genetic variants
We next turned our attention to a long-standing question in evolutionary biology: which of the genetic differences between strains, populations, or species are selectively advantageous in any given environment? In principle, CRISPEY should allow us to introduce individual variants, one at a time, onto a common genetic background, and enable comparisons even between reproductively incompatible lines. As a first test of this approach we investigated fitness differences between two well-studied S. cerevisiae strains: BY (a common lab strain) and RM (a vineyard isolate), in which thousands of QTLs have been mapped for a wide variety of traits (Albert et al., 2014a; Bloom et al., 2013; Ehrenreich et al., 2009), but few causal variants have been identified (Fay, 2013).
We designed 32,000 guide/donor pairs targeting 16,006 SNPs and short indels, with edits designed to introduce a single RM allele into each BY cell (Table S5). After transforming our plasmid library into BY and editing for 48 hours, the edited yeast pools competed in log-phase growth for ~25 generations in minimal media with 2% glucose. We sequenced guide/donor oligomers from six replicate flasks at each of 11 time points and estimated the abundance of each edited strain (Figure 2a, Figure S1). Abundance measurements at similar time points were highly correlated between replicates (Pearson’s R = 0.952–0.992; Figure S1f,j). We then estimated the fitness effect of each edit by measuring the change in abundance of guide/donor oligomers introducing each RM allele.
Figure 2. CRISPEY screen for fitness effects of natural variants.
(a) Schematic for experiment to assay phenotypic effects of CRISPEY libraries. (b) Example of one edit increasing fitness (red), one decreasing fitness (blue), and random guides representing wildtype BY (black). Each guide/donor pair is shown with abundances from six biological replicate cultures. (c) Irreproducible discovery rate analysis, based on agreement between independently edited biological replicates. The greater number of positive effects (RM allele fitter) is likely due to asymmetric power. (d) Edits without a reproducible fitness effect (blue) show little reproducibility between independent guides, as expected if their fitness effect estimates are dominated by noise or random drift. In contrast, edits with reproducible effects (orange) are more highly correlated. (e) We selected 14 edits for validation: 12 with positive fitness effects, 1 with no effect, and 1 with negative effect. Our estimates of log2 fold change per generation from pooled competition were slightly smaller than those from the validation experiments, suggesting that our fitness effects are not over-estimated due to the “winner’s curse”. Data are represented as mean ± SEM. See also Figure S1, Table S1, Table S2, Table S3, Table S4 and Table S5.
We observed a large variation in fitness across strains (i.e. cells containing a specific guide/donor oligomer), with some strains changing >30-fold in frequency during the competition (Figure 2b; Figure S1c). Fitting a linear model to each strain’s log2 abundance over time in generations, we estimated the relative fitness effect of each edit (approximated by the log2 fold change in abundance per generation; Figure S1a-e; STAR Methods for details) (Ritchie et al., 2015).
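As a rough illustration of the fitness estimate described above, the sketch below fits an ordinary least-squares line to one strain's log2 relative abundance over generations and returns the slope. This is a simplified stand-in: the text cites Ritchie et al. (2015) for the actual model, and the function name, the pseudocount, and the normalization by total reads per sample are assumptions introduced here.

```python
import math


def log2_fold_change_per_generation(generations, counts, totals,
                                    pseudocount=0.5):
    """Slope of log2 relative abundance versus generations (plain OLS).

    Hypothetical helper: `counts` are reads for one guide/donor oligomer,
    `totals` are total reads per sample. The pseudocount (an assumption)
    avoids log2(0) for strains that drop out.
    """
    log_abundance = [math.log2((c + pseudocount) / t)
                     for c, t in zip(counts, totals)]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(log_abundance) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(generations, log_abundance))
    return sxy / sxx


# Toy example: a strain whose share of reads doubles every 10 generations.
toy_slope = log2_fold_change_per_generation(
    generations=[0, 10, 20],
    counts=[100, 200, 400],
    totals=[1_000_000, 1_000_000, 1_000_000],
    pseudocount=0.0,
)
```

In this toy case the slope is 0.1 log2 units per generation. A real analysis would also pool replicates and model count-dependent variance, which this sketch omits.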
Fitness estimates from two independent editing experiments were only moderately correlated (Pearson’s R = 0.5; Figure S1g-h), as expected if most genetic variants have no measurable fitness effect and thus no correlation between replicates. We therefore calculated the Irreproducible Discovery Rate (IDR) (Li et al., 2011) to define a set of reproducible effects (Figure 2c, Figure S1h). Positive fitness effects were more easily detectable, since the starting abundance of each strain leaves more dynamic range to increase than to decrease (especially for strains with relatively low starting abundance), and lower abundances are more affected by random drift and low sequencing depth. Nevertheless, both positive and negative effects were highly reproducible across independently edited biological replicates, with 572 variants below a 5% IDR (Li et al., 2011) (Figure 2c, Figure S1i).
Our library design included 8,902 variants with multiple gRNAs creating the same edit. As another test of reproducibility, we compared the fitness effects estimated for these independent gRNAs. As expected, edits with minor fitness effects showed little correlation (Pearson’s R = 0.10, p-value < 10^−10 for edits with IDR > 0.05); however, significant edits were far more reproducible between guides (Pearson’s R = 0.59, p-value < 10^−34 for edits with IDR < 0.05; Figure 2d).
As a validation, we further tested 14 edits with varying fitness effect estimates. We performed CRISPEY to create these 14 individual (non-pooled) strains, and sequence-verified each edit (at these loci we observed 96% editing efficiency and 0% NHEJ; Table S3). Competing each edited strain against its BY parent, we found that all 14 validated in both direction and approximate fitness effect, though the fitness effects estimated from pooled competition tended to be slightly smaller than those from individual competitions (Figure 2e, Table S2). This orthogonal validation lends further support to the accuracy of our genome-wide CRISPEY results.
Characterizing Cas9 target mismatch sensitivity
Next, we investigated the target specificity of Cas9 by examining CRISPEY toxicity during the editing phase. Our guides were filtered to not have highly similar secondary targets elsewhere in the genome (STAR Methods). However, both the edited target site and donor sequence on the plasmid represent possible off-target sites, differing from the intended on-target site by just one base in the case of a SNP edit. At these two potential off-target sites, the edit that is introduced by the donor will determine the mismatch between the guide and the off-target. We reasoned that since each Cas9 cut creates a fitness burden (note that plasmid loss is lethal in our growth media), guide/donor oligomers that cut their target and off-target (i.e. the donor) more efficiently may have lower fitness during the 48-hour editing phase. Therefore, examining the toxicity of CRISPEY during the editing phase as a function of guide-donor mismatches can inform us about Cas9 target sensitivity. For this purpose, we defined the difference between the fitness (log2 fold change per generation) of cells with a specific guide/donor oligomer and the median fitness of strains with random guides (that do not promote cutting) as a proxy for the off-target cutting level.
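The proxy defined above is a simple difference from a baseline, which can be sketched as follows. The function name and the dictionary-based data layout are assumptions for illustration only; the fitness values in the example are made up.

```python
from statistics import median


def off_target_cutting_proxy(editing_fitness, random_guide_ids):
    """Fitness during editing minus the median fitness of random guides.

    Hypothetical helper: `editing_fitness` maps a guide/donor ID to its
    log2 fold change per generation during the editing phase, and
    `random_guide_ids` is the set of non-cutting control IDs. More negative
    values indicate more toxicity, i.e. more inferred cutting.
    """
    baseline = median(editing_fitness[g] for g in random_guide_ids)
    return {guide: fitness - baseline
            for guide, fitness in editing_fitness.items()
            if guide not in random_guide_ids}


# Toy example with made-up values.
toy_fitness = {"g1": -0.10, "g2": -0.02, "r1": 0.00, "r2": 0.02, "r3": -0.01}
toy_proxy = off_target_cutting_proxy(toy_fitness, {"r1", "r2", "r3"})
```

Using the median rather than the mean keeps the baseline robust to a few control guides with unexpected behavior.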
We found that the fitness effect of each guide/donor during editing was highly reproducible (Pearson’s R = 0.97 for guide/donor oligomer abundance log fold change during editing), with guide/donors that edit the PAM or “seed” positions (−7:−1) in the guide having reduced toxicity, suggesting lower levels of off-target cutting (Figure 3a, Figure S2a-d). Interestingly, the sensitivity of the guide to a mismatch depends on the base change induced by the edit, with lower sensitivity (and thus higher rates of cutting) for guanine to adenine mismatches in the seed region (Figure 3b, Figure S2a-c). These results are consistent with trends previously observed for in vitro target specificity for four guides (Figure S3) (Pattanayak et al., 2013).
Figure 3. Cas9 is sensitive to mismatches in the seed region and depends on the mismatched nucleotides.
(a) Each Cas9-induced double strand break (DSB) is potentially toxic, so edits that prevent repeated cutting (either of the genome or of the donor sequence on the plasmid) are expected to show higher fitness during the 48-hour editing phase of our experiment. We observed the lowest toxicity for edits in the PAM or seed region (positions −1 to −7 in the guide), consistent with previous results on Cas9 mismatch tolerance (Doench et al., 2016; Fu et al., 2016). For comparison, “None” shows the fitness of random guides that are not expected to cut anywhere. Whiskers show the distribution range (excluding outliers). (b) Off-target cutting level as a function of the edit position in the guide and the nucleotides in the guide and in the off-target sense strand. Off-target cutting level is defined as the difference between the fitness of cells with a specific guide/donor and the median fitness of strains with random DNA guide/donor (i.e. guide/donor that do not promote any cutting; “None” in (a)). Note the lower sensitivity to guanine to adenine mismatches in the seed relative to other mismatches in this region. (c) and (d) show agreement of the off-target cutting level shown in (b) with in vitro measurements of dCas9 binding equilibrium (Pearson’s R = 0.85 and 0.65, p-value = 1.7×10^−8 and 2×10^−4 for the guides marked by red and green, respectively) and binding rate (Pearson’s R = −0.027 and −0.12, p-value = 0.89 and 0.52 for the guides marked by red and green, respectively) from Boyle et al. (2017). See also Figure S2, Figure S3 and Table S5.
To investigate the patterns of mismatch tolerance, we compared our results to a study of dCas9 binding in vitro. We found a strong agreement between our off-target cutting level and equilibrium dCas9 binding to the same mismatch positions/nucleotides, but no association with dCas9 binding rates for these same mismatched guides (Figure 3c-d) (Boyle et al., 2017). This suggests that dissociation before cutting may be common at off-target sites in vivo, consistent with results observed in vitro and in human cell lines (Farasat and Salis, 2016; Gong et al., 2018; Ma et al., 2016a).
Overall, our results are in agreement with biochemical (Boyle et al., 2017; Fu et al., 2016) and in vivo (Fu et al., 2016; Morgens et al., 2017; Tsai et al., 2017) characterization of Cas9 target mismatch sensitivity (Figure S3), and extend them by testing over ten thousand on-target/off-target pairs. Note that Cas9 toxicity should not affect the growth competition since Cas9 and the gRNAs are not expressed in glucose media; consistent with this, the guide score and edit position in the guide were not correlated with fitness during growth competition (Figure S2e-g).
Variants affecting fitness are enriched in cis-regulatory regions
We then characterized the variants with significant fitness effect during growth competition (Figure 4a, Figure S4a-b). Many of these variants had surprisingly large effect sizes: 171 had >1% fitness difference per generation, and 17 were >5%. This suggests that many of these variants may involve fitness tradeoffs—where each allele confers a selective advantage in some environments—since unconditional fitness effects of this magnitude would quickly lead to fixation of the fitter allele (see Discussion).
Figure 4. Characterizing variants affecting fitness.
(a) Distribution of significant fitness effect variants throughout the genome. “Disrupting coding gene category” includes out-of-frame indels and gain or loss of stop codon. Positive = RM-fitter, negative = BY-fitter. (b) Quantile-quantile plot showing the extent of significant RM-fitter hits for the major annotation categories. Values larger than 60 were set to 60 and marked by triangles (see Figure S4c for non-truncated version). (c) Variants with strong fitness effects (RM-fitter alleles, IDR < 0.05) are enriched in promoters, and depleted in coding regions. (d) Locations of significant variants (RM-fitter alleles, IDR < 0.05) with respect to TSSs (all ORFs were scaled to the same length for visualization; 900 bp of flanking regions are also shown). (e) Variants in or near transcription factor binding sites are most likely to affect fitness. The blue line shows the result over all intergenic variants. (f) Missense variants with IDR < 0.05 are enriched for smaller BLOSUM62 scores, indicating less conservative amino acid changes. See also Figure S4 and S5.
We next asked what types of genomic regions harbor these significant variants. Most previously identified variants affecting trait variation between species are within protein-coding regions (Fay, 2013; Hoekstra and Coyne, 2007; Stern and Orgogozo, 2008), though these may be influenced by ascertainment biases (Hoekstra and Coyne, 2007; Rockman, 2012). Strikingly, we found that our top 23 hits were all in promoter regions, despite promoters accounting for only 29.8% of the edits (hypergeometric p-value < 10^−11). Divergent promoters (those upstream of two divergently transcribed genes) were the most enriched for significant positive effects, followed by unidirectional promoters (2.66 and 1.81 fold enrichment, binomial p-value < 10^−15 and 10^−7, respectively; Figure 4b-c, Figure S4c,f). This pattern persisted when controlling for chromatin accessibility, which could potentially affect editing efficiency (Figure S4d). When significant variants were visualized as a function of their position within genes or flanking noncoding regions (Figure 4d, Figure S4e), the distribution was similar to the positions of gene expression QTLs (eQTLs) mapped in diverse S. cerevisiae strains (Kita et al., 2017), suggesting that cis-regulation may underlie the majority of these effects.
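To give a sense of scale for the promoter enrichment among the top 23 hits, the sketch below computes a hypergeometric upper-tail probability. The function name is hypothetical, and the inputs are assumptions: it treats the 16,006 variants as the population and takes 29.8% of them (rounded to 4,770) as promoter variants, whereas the original test may have counted edits differently.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement.

    Hypothetical helper using exact integer arithmetic: `successes` is the
    number of items of interest among `population` items in total.
    """
    upper = min(successes, draws)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, upper + 1))
    return favorable / comb(population, draws)


# Assumed inputs: 16,006 variants, of which about 29.8% lie in promoters.
n_variants = 16006
n_promoter = round(0.298 * n_variants)  # 4770 (an assumption, see above)
p_all_top_hits = hypergeom_upper_tail(23, n_variants, n_promoter, 23)
```

Under these assumed counts the probability is below 10^−11, in line with the reported bound, though the exact value depends on how the background set is defined.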
To investigate the promoter variants further, we tested whether they were enriched in transcription factor binding sites (TFBSs), which are key sequence elements in cis-regulation. We found a striking pattern: 33% of variants within known TFBSs had significant fitness effects, compared to only 9.6% of all tested promoter variants (hypergeometric p-value < 10^−12; Figure 4e). Interestingly, the fraction of significant hits showed a strong dependence on distance from the nearest TFBS, with variants within 10 bp still highly enriched (consistent with results from human studies; Farh et al., 2015; Tehranchi et al., 2016), while those >40 bp from a TFBS were depleted of fitness effects (Figure 4e, Figure S4g-h). The most highly enriched binding sites were for Reb1 (9 sites, 1.94 fold, hypergeometric p-value = 0.011), Abf1 (8 sites, 2.02 fold, hypergeometric p-value = 0.013) and Rap1 (8 sites, 7 of which are within ribosomal protein promoters, 2.42 fold, hypergeometric p-value = 0.002).
To test whether these promoter variants impact cis-regulation, we examined the allele-specific expression of their neighboring genes in a BY/RM hybrid (Albert et al., 2014b). We found that genes with significant upstream variants were slightly more likely to show significant allelic imbalance in mRNA levels (Wilcoxon p-value = 0.00068) and translation (Wilcoxon p-value = 0.00069) in a BY/RM hybrid (Figure S5a-d). We also validated the effects of two variants on the mRNA levels of RPL19B (42% increase, unpaired t-test with Welch’s correction p = 0.0082) and RPL26A (21% increase, unpaired t-test with Welch’s correction p = 0.0024) using RT-qPCR (Figure S5e-g).
Altogether, these results strongly support a predominance of cis-regulatory variants affecting BY/RM fitness variation, in contrast with the predominance of coding variants impacting phenotypes that has been observed across single-locus studies in yeast, plants, and metazoans (Fay, 2013; Hoekstra and Coyne, 2007; Stern and Orgogozo, 2008). This difference may be due to the absence of ascertainment bias in our CRISPEY screen, consistent with trends observed in similarly systematic genome-wide association studies in humans (Hindorff et al., 2009).
Synonymous and missense variants have similar impacts on fitness
Although coding variants were under-represented among significant variants (Figure 4c), we identified 156 synonymous and 95 missense variants at IDR < 0.05. Missense variants that change protein sequences were about equally likely to affect fitness as synonymous variants (Figure 4c, Figure S4j). While this may seem counterintuitive, this does not suggest that random missense and synonymous mutations will have similar effects, since deleterious variants have already been filtered by natural selection. We hypothesized that missense variants may affect fitness via changes to proteins, while synonymous variants may affect translation via codon usage. Consistent with this, we found that significant missense variants were more likely to cause nonconservative amino acid changes (Figure 4f, Figure S4k), and that significant synonymous variants were more likely to be present in genes with strong codon usage bias (Figure S4l). It is also likely that some coding variants affect fitness via cis-regulation of transcription or mRNA decay, considering the abundance of eQTLs within yeast protein-coding regions (Kita et al., 2017). Our finding of widespread fitness effects of synonymous variants casts doubt on the common assumption that synonymous mutations are neutral, consistent with several other lines of evidence (Chamary et al., 2006; Lawrie et al., 2013; Plotkin and Kudla, 2011; She and Jarosz, 2018).
Polygenic adaptation of ribosomal gene promoters
We then asked whether we could detect lineage-specific selection from the patterns of fitness effects. Because BY and RM have evolved in different environments, it is likely that they have each adapted in distinct ways, and that variants with large fitness effects (e.g. those above 5% in Figure 2c) represent local adaptations to each strain’s typical environment. By searching over all gene ontologies (Eden et al., 2009), we found that “cytoplasmic translation” (mostly ribosomal) genes were strongly enriched for having significant RM-fitter variants in their promoters (FDR = 5.5×10^−5).
To test whether this enrichment was driven by selection, we applied a sign test (Fraser, 2011; Orr, 1998), which tests a null model that under neutral evolution, any gene set (e.g. pathways or protein complexes) should show a similar ratio of BY-fitter / RM-fitter alleles as the rest of the genome. In contrast, if a pathway has been under different selection pressures in the BY and RM lineages, then we may observe a bias of fitness effects in that pathway favoring one strain. We found that 25 “cytoplasmic translation” (mostly ribosomal) genes have significantly RM-fitter variants, compared to just 2 with BY-fitter variants (binomial p-value < 5×10^−9; Figure 5a-b), which is not consistent with neutral evolution and suggests polygenic adaptation of the ribosome.
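The logic of the sign test can be sketched as a one-sided binomial tail: given a null probability that any single significant variant is RM-fitter, how surprising is the observed excess in a gene set? The function name, the example counts, and the null probability of 0.5 below are all hypothetical. The test described above uses the genome-wide ratio of BY-fitter to RM-fitter alleles as its null, so this sketch is not expected to reproduce the reported p-value.

```python
from math import comb


def binomial_upper_tail(k, n, p_null):
    """P(X >= k) for X ~ Binomial(n, p_null).

    Hypothetical helper: k is the number of variants favoring one parent,
    n the total number of significant variants in the gene set, and p_null
    the expected fraction favoring that parent under neutrality.
    """
    return sum(comb(n, i) * p_null ** i * (1 - p_null) ** (n - i)
               for i in range(k, n + 1))


# Made-up example: 18 of 20 variants favor one parent, with a placeholder
# null of 0.5 (an assumption, not the genome-wide ratio used in the study).
toy_sign_p = binomial_upper_tail(18, 20, 0.5)
```

Replacing `p_null` with the observed genome-wide fraction of RM-fitter alleles would make the test conservative with respect to the overall excess of detectable positive effects.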
Figure 5. Detecting lineage-specific selection.
(a) Locations of significant variants (RM-fitter alleles, IDR < 0.05) with respect to TSSs for 25 cytoplasmic translation genes. Most RM-fitter variants are in the flanking noncoding regions. (b) Example of a ribosomal subunit with five significant variants clustered within its promoter; all five variants affect fitness in the same direction, and two (indicated by red dotted lines) are adjacent positions next to conserved Fhl1/Rap1 binding sites, both with validated fitness effects (Figure 2e). Larger dots indicate significant fitness effects. (c) Example of a divergent promoter with five significant variants clustered around Fhl1/Rap1 binding sites (Harbison et al., 2004); all five variants affect fitness in the same direction, and three (indicated by red dotted lines) have validated fitness effects (Figure 2e). (d) Variants at IDR < 0.05 favor the same parent 98% of the time when separated by up to 50 bp; the agreement decreases at larger distances, but is still greater than random (blue line) up to 1 kb. (e) Illustration of selection within a locus leading to reinforcing variant effect directions and a resulting composite QTL. See also Figure S4 and Figure S5.
To test whether these ribosomal promoter variants may affect the translation of other genes via their effects on the ribosome, we analyzed QTLs for protein levels (pQTLs) from a BY/RM cross (Albert et al., 2014a). We found that the 25 translation genes with RM-fitter variants were 2.3-fold enriched among pQTL loci whose RM alleles decrease protein levels of one or more other genes, without affecting mRNA levels (permutation p-value = 0.012; STAR Methods). No enrichment was observed for pQTLs whose RM alleles increase protein levels (permutation p-value = 0.81), suggesting that attenuated translation may underlie the fitness advantage of the RM alleles.
Among BY-fitter variants, promoters for genes involved in pseudohyphal growth were specifically enriched (enrichment FDR = 0.0057; binomial p-value = 0.026), suggesting another target of lineage-specific selection.
Lineage-specific selection leads to clusters of large-effect variants
With our screen’s variant-level resolution, we can also test for selection without the need for gene set annotations. In a previous application of the sign test to BY/RM, we found that pairs of cis and trans-acting eQTLs targeting the same gene were more likely to act in the same direction, providing evidence of selection (Fraser et al., 2010). Extending this logic to our CRISPEY results, we tested whether neighboring variants also tend to favor the same parental strain. Among intergenic variants with significant (IDR < 0.05) effects, we found that variants within 50 bp of one another nearly always (98%, 88 pairs, binomial p-value = 4.8×10^−20) favored the same parent (Figure 5c-d, Figure S4i). This bias decreased to near-background levels (65% agreement; 83 pairs, binomial p-value = 0.0655) at >200 bp. This trend suggests that most clusters of variants affecting fitness resulted from lineage-specific selection, rather than neutral evolution.
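The neighbor-agreement analysis can be sketched as a count over variant pairs within a distance window. The function name, the tuple layout, and the choice to consider all pairs (rather than, say, only adjacent variants) are assumptions made for illustration; the example data are made up.

```python
from itertools import combinations


def same_direction_fraction(variants, max_distance, min_distance=0):
    """Fraction of variant pairs in a distance window that share effect sign.

    Hypothetical helper: `variants` is a list of (chromosome, position,
    fitness_effect) tuples for significant variants. Pairs are counted when
    min_distance < distance <= max_distance on the same chromosome.
    Returns (fraction, number_of_pairs).
    """
    agree = 0
    total = 0
    for (chr_a, pos_a, eff_a), (chr_b, pos_b, eff_b) in combinations(variants, 2):
        if chr_a != chr_b:
            continue
        distance = abs(pos_a - pos_b)
        if min_distance < distance <= max_distance:
            total += 1
            agree += (eff_a > 0) == (eff_b > 0)
    fraction = agree / total if total else float("nan")
    return fraction, total


# Toy example with made-up positions and effects.
toy_variants = [
    ("chrI", 100, 0.02),
    ("chrI", 130, 0.01),
    ("chrI", 140, -0.03),
    ("chrII", 120, 0.05),
    ("chrI", 900, 0.02),
]
toy_fraction, toy_pairs = same_direction_fraction(toy_variants, max_distance=50)
```

Running the same function with successive distance windows (for example 0 to 50 bp, 50 to 200 bp, and beyond 200 bp) would trace out the decay of agreement with distance.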
This finding has important implications for the interpretation of QTLs: it has long been recognized that closely linked variants acting in the same direction could create strong QTLs (Stam and Laurie, 1996; Visscher and Haley, 1996) (Figure 5e), but the prevalence of these “composite QTLs” is unknown. If selection on closely linked variants is also common in other species, composite QTLs are likely to be quite frequent, and thus genetic architectures may be far more complex than can be inferred from QTLs alone.
Discussion
In sum, our results show that genotype-phenotype relationships can now be assayed at single-base resolution in parallel, greatly boosting our ability to identify causal variants for polygenic traits. We observed that yeast genetically-encoded fitness differences are mainly cis-regulatory, and often involve multiple tightly clustered causal variants per locus affecting fitness in the same direction.
Our results raise several important questions. First, why are such large fitness effects observed among segregating variants? Theory predicts that selection should be very efficient in large populations such as S. cerevisiae, leading to rapid fixation or loss of alleles with fitness effects greater than ~4/Ne (where Ne is the effective population size), which is likely to be several orders of magnitude smaller than the 1% effects we observed for 171 variants. One plausible explanation is that BY and RM are adapted to different environments, and the large fitness effects reflect these lineage-specific adaptations. This would predict that CRISPEY screens performed in environments more closely resembling RM's historical environment would lead to greater fitness advantages for RM alleles (and vice versa for BY). Alternatively, other forms of selection could maintain large-effect variants even without local adaptation, such as balancing selection or seasonally fluctuating selection (Sellis et al., 2016; Wittmann et al., 2017).
Another question is to what extent the patterns we observed—such as the predominance of noncoding variants and clustered variants acting in the same direction—will be seen in other growth conditions, strains, and species. While there is no doubt that the quantitative fitness effects of many variants will change substantially depending on the environment and genetic background, we hypothesize that the more general patterns are unlikely to be environment- or strain-specific. This reasoning is supported by a similar predominance of noncoding variants in human phenotypic variation (Hindorff et al., 2009) and recent adaptation (Enard et al., 2014; Fraser, 2013). Nevertheless, it will be fascinating to compare results of CRISPEY screens across diverse environments to explore gene-by-environment interactions with single-variant resolution.
Our work has several limitations. First, as noted above, any pooled screen will have more power to detect strains with increasing than with decreasing abundance. However, this effect could be minimized in the future by generating fewer edits, increasing the population size, sequencing more deeply, including more replicates, and/or performing lineage tracking with unique molecular identifier (UMI) barcodes (Michlits et al., 2017). These and other refinements in our approach should allow the detection of even smaller fitness effects, and more symmetric power to detect effects in both directions. A second limitation is that our estimated fitness effects could be underestimates if editing efficiencies at some loci are imperfect. Indeed, we observed ~4% unedited colonies when sequencing 14 loci (Figure 2e). However, these editing efficiencies are high enough that they should only introduce a small amount of error into fitness estimates (as seen in the validation set, Figure 2e). Moreover, we have not seen a correlation between the guide quality score or the edit position in the guide and the measured effect on fitness (Figure S2e-g). However, we cannot exclude the possibility of some false negatives due to low guide efficiency. We also note that in experiments where the average fitness significantly increases over time, the analysis should account for this. Third, our method is limited by the requirement of a PAM sequence near the SNP. Approximately 70% of BY/RM SNPs can be targeted by a Cas9 protein that recognizes the “NGG” PAM sequence; performing CRISPEY with modified Cas9 proteins with different PAM sequences will increase this coverage further (Hirano et al., 2016; Hu et al., 2018). Finally, if CRISPEY editing has any unintended effects (e.g. epigenetic changes) that are heritable over many generations, this could also affect our results as well as other CRISPR-based screens.
Our approach has several advantages over other recent methods of high-throughput editing in yeast (summarized in Table S1). First, our method does not rely on high-copy plasmids as dsDNA donors, allowing adaptation to species that do not propagate plasmids. Second, ssDNA is known to be a more efficient HDR template than dsDNA (Richardson et al., 2016), which may be critical for adapting our editing system to model systems with low HDR efficiency, such as mammalian cells. Third, our approach produces covalent tethering of the donor template to the guide RNA. This was shown to considerably increase the editing efficiency in both human and yeast cells (Aird et al., 2018; Lee et al., 2017; Roy et al., 2018). Fourth, retron-mediated editing is highly efficient and does not require additional genetic manipulations, such as NEJ1 knockout or linearized vectors, which limit the general applicability of editing systems. Lastly, although Adeno-Associated Virus (AAV)-based genome editing approaches are very efficient in human cells, they are not suitable for pooled screens due to high multiplicity of infection (MOI) (Bak et al., 2017; Dever et al., 2016; Gaj et al., 2017; Nishiyama et al., 2017; Suzuki et al., 2016; Yang et al., 2016). We expect that transient expression of the retron msDNA using low MOI vectors may circumvent this limitation and enable highly parallel precise genome editing in human cells. To conclude, retron-mediated HDR is highly efficient and potentially adaptable to many species.
In addition to measuring fitness across diverse conditions or strain backgrounds, CRISPEY screens could easily be adapted to any trait that can lead to differential strain or allele abundance (including cell sorting based on expression of fluorescent reporter genes). Looking ahead, implementing CRISPEY in other species—including mammalian cells, in which retrons are functional (Mirochnitchenko et al., 1994)—could potentially allow rapid, base-pair level investigation of a wide range of traits and diseases.
STAR ★ Methods
Contact for Reagent and Resource Sharing
Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author, Hunter Fraser (hbfraser@stanford.edu).
Experimental Model and Subject Details
All strains used in this study were derivatives of S. cerevisiae BY4742 (Brachmann et al., 1998). Construction of strains with integrated SpCas9 and Ec86 Reverse Transcriptase is described below.
Method Details
Construction of strains and editing vector
CRISPEY yeast strain ZRS111, used for the pooled screen and validation, was created by integrating plasmid pZS157 into BY4742. pZS157 is a derivative of pRS406 (Stratagene) containing the S. cerevisiae HIS3 gene, pGAL1::SpCas9::tCYC and pGAL10::Ec86RT::tGAL10. The SpCas9 sequence was subcloned from p414-TEF1p-Cas9-CYC1t (Addgene #43802). The Ec86 reverse transcriptase (RT) DNA fragment was yeast codon optimized and synthesized by IDT as a gBlock DNA fragment. pZS157 was linearized by KpnI digestion and integrated by homologous recombination at the his3 locus. The CRISPEY editing vector pZS165 was derived from the yeast centromere plasmid (YCp) pRS416 (Sikorski and Hieter, 1989) and contains pGAL7::Hammerhead ribozyme::Ec86retron::NotI::sgRNA::HDV ribozyme::tGAL7. The retron sequence from Ec86 was synthesized by IDT as a gBlock DNA fragment.
Media
Expression measurements and competition experiments were performed in glucose (SD -HIS/-URA: 6.7 g/L yeast nitrogen base (YNB), 1.92g/L Drop-out Mix Synthetic Minus Histidine, Uracil, Adenine Rich w/o Yeast Nitrogen Base, 2% glucose). Editing was performed in raffinose (YNB -HIS/-URA 2% raffinose: 6.7 g/L YNB, 1.92g/L Drop-out Mix Synthetic Minus Histidine, Uracil, Adenine Rich w/o Yeast Nitrogen Base, 2% raffinose) and galactose (YNB -HIS/-URA 2% galactose: 6.7 g/L YNB, 1.92g/L Drop-out Mix Synthetic Minus Histidine, Uracil, Adenine Rich w/o Yeast Nitrogen Base, 2% galactose) media. ADE2 editing efficiencies were assayed on glucose low adenine (SD - URA low-ADE: 6.7 g/L YNB, 1.92g/L Drop-out Mix Synthetic Minus Uracil w/o Yeast Nitrogen Base, 2% glucose, 2% agar) plates.
Evaluating editing efficiency
Two strains were derived from the parental strain BY4742 through plasmid integration: (1) ZRS81 contained pGAL1::SpCas9::tCYC1, and (2) ZRS82 contained both pGAL1::SpCas9::tCYC1 and pGAL10::Ec86RT::tGAL10. The parental strain BY4742, ZRS81, and ZRS82 were then transformed with pZS160 (a plasmid derived from pZS165 in which the Hammerhead ribozyme was replaced with an HDV ribozyme) carrying one of 3 different guide/donor inserts: ade2hdr (retron donor introducing a nonsense mutation into the ADE2 gene, facilitating ADE2 knockout)-sgAde2 (guide sequence targeting ADE2), gfp2bfp (retron donor introducing a point mutation to convert EGFP to BFP, irrelevant to ADE2)-sgAde2, or ade2hdr-sgGFP (guide sequence targeting EGFP) (Table S4). The oligo inserts were assembled with NotI-digested pZS160 by Gibson assembly. Nine editing experiments were conducted, covering all combinations of the three strains and three plasmids. Single colonies were inoculated into pre-editing media (YNB -HIS/-URA 2% raffinose) for 24 hours, then sub-cultured twice into editing media (YNB -HIS/-URA 2% galactose) for 24 hours each, for a total of 48 hours of editing under galactose induction. Each culture was diluted and plated on YNB –URA low-ADE plates in triplicate. After 3 days of incubation at 30C, pink (ADE2 knockout phenotype) and white (parental phenotype) colonies were scored to estimate the editing efficiency of ADE2 knockout.
To determine the mode of editing (HDR or NHEJ) and evaluate the rate of local off-target effects, we sequenced the ADE2 locus from edited, pink colonies from an ADE2 knockout editing assay using the ZRS111 strain containing pZS165 with the ade2hdr-sgAde2 guide/donor insert. Strains were grown in raffinose, then edited in galactose. After 48 hours of editing, cultures were plated on SD -URA low-ADE plates and incubated for 3 days at 30C. Genomic DNA from 96 pink colonies was prepared and the ADE2 locus was amplified by PCR (Table S4). All reactions were Sanger sequenced and one failed reaction was excluded (Table S3).
Long insert integration
To test the efficiency and accuracy of CRISPEY-mediated editing when introducing large insert sequences, a guide/donor plasmid (pZS163) was generated to insert the EGFP sequence into the ADE1 locus. pZS163 was derived from pZS160 with a 902 nt donor containing ADE1 homology arms flanking the EGFP coding sequence plus linker (765 nt) (Table S4). Similar to ADE2 knockout, cells with ADE1 knockout exhibit pink color when grown on low adenine media, indicative of editing. ZRS111 transformants carrying pZS163 were grown in YNB -HIS/-URA 2% raffinose for 18-24 hours and YNB -HIS/-URA 2% galactose for 48 hours with 1:30 dilution every 24 hrs. After 48 hours of editing, cultures were diluted, plated, and incubated at 30C until colonies were visible. Genomic DNA from 40 pink colonies was prepared and the ADE1 locus was PCR amplified for Sanger sequencing (Table S3 and Table S4).
Southern blot analysis of msDNA
A guide/donor oligonucleotide containing the gfp2bfp-sgGFP oligo was cloned into pZS160 (Table S4). The plasmid was transformed into 3 yeast strains: BY4742, ZRS81 and ZRS82 as described above. Yeast transformants were grown in YNB -HIS/-URA 2% raffinose for 18-24 hours and YNB -HIS/-URA 2% galactose for 48 hours with 1:30 dilution every 24 hrs. Cell pellets were homogenized with glass beads in Trizol (Invitrogen) and total RNA was extracted following the manufacturer’s instructions. Total RNA was further treated with RQ1 DNase (Promega) or RNase A (Invitrogen) before ethanol precipitation. Untreated and nuclease-treated total RNA were loaded onto Novex 10% Urea-TBE gels (Thermo Fisher Scientific) for size separation, and bands were visualized by staining with SYBR GOLD before transfer to Hybond N+ nylon membranes (GE Healthcare) for blotting and UV-crosslinking. For Southern blot analysis, a 5’-Digoxigenin-labelled DNA probe (Integrated DNA Technologies) targeting msDNA (homology to BFP) was hybridized to the membrane at 55C in DIG Easy Hyb buffer (Roche) (Table S4). Membranes were then washed twice with 2x SSC, 0.5% SDS at 55C, incubated with anti-Digoxigenin-AP antibody (Roche) and washed 3 times in 1x PBS-T. Antibody localization was visualized by addition of 1-Step NBT/BCIP substrate (Thermo Fisher Scientific).
Variant Editing Library Design
Our library contains 31,870 20 bp guides targeting 16,006 genetic differences (variants, which we refer to as SNPs, although they also include small insertions or deletions) between the BY4742 and RM11-1a strains of Saccharomyces cerevisiae. To enrich for fitness differences, we focused on 121 fitness QTLs previously mapped between these strains in eight conditions (Bloom et al., 2013) (YNB, YPD, paraquat, ethanol, neomycin, diamide, cobalt-chloride, and tunicamycin). For each variant, the library contained all possible guides that do not have exact match off-targets. Off-targets were identified using Bowtie2 (Langmead and Salzberg, 2012) with the following parameters: “ -f -D 20 -R 3 -N 1 -L 10 -I S,1,0.50 --gbar 3 --end-to-end -k 30 --no-head -t --rdg 10,6 --rfg 10,6”. Guides that had an off-target with 0-3 mismatches were excluded from all analysis of the growth competition. Guide quality scores were calculated using Azimuth (Doench et al., 2016). For each guide, we designed one 100 bp donor that did not contain a 10 bp homopolymer (this limitation is imposed by the oligo synthesis technology). In most cases the donor is centered around the Cas9 cut site; however, to avoid 10 bp homopolymers some donor sequences were shifted up/downstream. In these cases, at least 30 bp on each side of the Cas9 cut site were required, otherwise the variant was not targeted. We refer to a combination of a guide and a donor as an “oligo” (see oligo design in Figure 1b). The sequences were designed using UCSC’s sacCer3 genome (Kent et al., 2002). We also designed 50 pairs of oligos targeting essential genes (genes annotated as “inviable” in SGD, 5/12/2016). In each pair, one oligo knocks out the gene by introducing a nonsense mutation to a cysteine or a tyrosine codon, while the other oligo introduces a synonymous mutation in the same codon. Lastly, we designed 30 guides and donors with random sequence (referred to as “Random DNA guides”). In total, we designed 32,000 guide/donor oligos.
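The donor placement rule described above (a 100 bp window centered on the cut site, shifted only as needed to avoid 10 bp homopolymers, with at least 30 bp on each side) can be sketched as follows. This is an illustrative sketch, not the authors' design code: the function name, the coordinate convention, and the order in which shifts are tried are assumptions, and the step that writes the variant allele into the donor is omitted.

```python
import re
from typing import Optional, Tuple

DONOR_LEN = 100   # donor length in bp (from the text)
MIN_FLANK = 30    # minimum bp on each side of the cut site (from the text)
# Runs of 10 or more identical bases.
HOMOPOLYMER = re.compile(r"A{10,}|C{10,}|G{10,}|T{10,}")

def pick_donor_window(seq: str, cut: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a donor window in `seq`, or None if none qualifies.

    `cut` is the 0-based index of the first base right of the Cas9 cut site
    (an assumed convention). The centered window is tried first, then windows
    shifted by 1, 2, ... bp in either direction (assumed search order).
    """
    centered = cut - DONOR_LEN // 2
    max_shift = DONOR_LEN // 2 - MIN_FLANK   # 20 bp keeps >= 30 bp per side
    shifts = [0] + [s for d in range(1, max_shift + 1) for s in (-d, d)]
    for shift in shifts:
        start = centered + shift
        end = start + DONOR_LEN
        if start < 0 or end > len(seq):
            continue   # window would run off the reference sequence
        if HOMOPOLYMER.search(seq[start:end]) is None:
            return start, end
    return None   # variant cannot be targeted under these constraints
```

A return value of `None` corresponds to the case in which the variant was not targeted.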
Oligonucleotide libraries were manufactured by Twist Biosciences (Table S5).
Yeast Library Preparation
Oligonucleotide libraries were amplified using primers 310 and 313 with Q5 hot-start DNA polymerase (New England Biolabs) following the manufacturer’s instructions, producing amplicons with flanking 20 bp sequences homologous to plasmid pZS165 (Table S4). PCR-amplified double-stranded DNA was cloned into NotI-digested pZS165 using the NEBuilder HiFi DNA Assembly Cloning Kit (New England Biolabs). Assembly was performed with a molar ratio of vector:insert = 1:5. The assembled plasmids were ethanol precipitated and diluted to a concentration of 80 ng/ul.
Libraries were subsequently transformed via electroporation into Endura electrocompetent cells (Lucigen). We performed 8 electroporations, each with 25ul of Endura cells and 80 ng of library plasmid DNA. Electroporations were conducted in 0.1cm-gap cuvettes in a GenePulser (Bio-Rad). Transformed bacteria were allowed to recover in recovery media (supplied with Endura competent cells) for one hour, and then plated on LB/Carbenicillin plates for incubation at 37C overnight. Dilutions were also plated to estimate cfu/transformation.
Plasmids were extracted using EZNA Plasmid Maxi Kit (Omega Biotek). Plasmids were eluted in 3 mL of kit-provided elution buffer. Samples were subsequently digested with NotI-HF and CIP (New England Biolabs) in order to linearize empty vectors. Plasmids were then ethanol precipitated to concentrate library at approximately 0.8 ug/uL.
Yeast transformation was performed via electroporation of plasmid library into yeast strain ZRS111. Electrocompetent yeast were prepared by modifying previously described methods (Benatuil et al., 2010). Yeast transformants were selected by plating on SD -HIS/-URA 2% agar plates supplemented with 1M sorbitol. Transformation efficiency was estimated by plating diluted transformation recovery culture on both YPD and SD -HIS/-URA 2% agar plates. Yeast transformants were scraped off plates using cell spreaders, resuspended in SD -HIS/-URA and frozen in 15% glycerol stocks at −80C for subsequent editing experiments. 6 million transformants were generated, with an average representation of 176 cells per guide/donor oligonucleotide.
To assess whether each cell contained only one guide/donor oligo, we performed colony PCR followed by Sanger sequencing for 48 yeast transformant clones to detect multi-vector transformation. DNA was extracted with lithium acetate/SDS and PCR was performed using Q5 hot-start DNA polymerase (New England Biolabs) following manufacturer’s instructions (Lõoke et al., 2011). Out of 48 clones, 45 were successfully sequenced and all of them contained only a single sequencing trace, while the remaining 3 clones had failed sequencing reactions, showing no evidence of multiple plasmids (Table S3).
Pooled editing and growth competition
For pooled genomic editing, transformed yeast libraries were inoculated in 200 mL YNB -HIS/-URA 2% raffinose at OD600 = 0.3, for 24 hours at 30C (approximately 5 generations). This is equivalent to 1.8×10^9 cells, representing coverage of ~56,000 cells per guide/donor. The pre-editing cultures were then inoculated into 200 mL YNB -HIS/-URA 2% galactose in two replicates, starting from OD600 = 0.3 for two consecutive 24-hour growth periods at 30C (total of 48 hours under galactose induction, approximately 12 generations). Each editing replicate of the edited yeast libraries was briefly amplified in SD -HIS/-URA media for 6 hours starting from OD600 = 0.3 and collected at OD600 = 0.8, for 3 biological replicates of growth competition at 30C. The competition culture was diluted with fresh SD -HIS/-URA media roughly every 3 hours to let cell density oscillate between OD600 = 0.2 and 0.8 for 5 cycles and then every 6 hours oscillating between OD600 = 0.1 and 0.8 for 5 cycles, and collected at every dilution for sequencing library preparation (Figure S1e and Table S5). Our library contained 32,000 guide/donor pairs; therefore, 200 mL of culture at OD600 = 0.1 contains an average representation of 18,750 cells per guide/donor.
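The per-oligo coverage figures quoted above follow from the culture volume, the optical density and the library size. The short check below is a sketch that assumes a conversion factor of 3×10^7 cells per mL per OD600 unit, as implied by the Figure S1 legend; the function name is hypothetical.

```python
CELLS_PER_ML_PER_OD = 3e7   # assumed conversion, per the Figure S1 legend
N_OLIGOS = 32_000           # guide/donor oligos in the library

def cells_per_oligo(od600: float, volume_ml: float) -> float:
    """Average number of cells carrying each oligo in a culture."""
    total_cells = od600 * volume_ml * CELLS_PER_ML_PER_OD
    return total_cells / N_OLIGOS

coverage_inoculum = cells_per_oligo(0.3, 200)    # pre-editing inoculation
coverage_bottleneck = cells_per_oligo(0.1, 200)  # lowest density in competition
print(round(coverage_inoculum), round(coverage_bottleneck))
```

Under this assumption the two values come to about 56,250 and 18,750 cells per oligo, in line with the rounded figures in the text.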
Library sequencing
2×10^9 cells were collected from each time point and digested with 250 ug/mL Zymolyase 20T in 50 mM KPi pH 7.4 buffer at 37C for 1 hour. Plasmids were extracted from yeast cells using the Plasmid Plus MAXI kit (Qiagen) following the manufacturer’s instructions, starting from the P2 Lysis Buffer addition step. Plasmids were further concentrated with the DNA Clean and Concentrator kit (Zymo Research). The guide/donor oligonucleotide libraries were amplified using primer 73 and 8 staggered reverse primers with homology to vector sequence and partial Illumina adaptor sequences (primers 443-450), using Q5 hot-start DNA polymerase (New England Biolabs) following the manufacturer’s instructions (Table S4). Staggered primers add 1-8 nucleotides to mitigate low complexity sequences at the 3’ end for read 2. Illumina sequencing adaptors with custom barcode indices (P5: 122-124; P7: 54-71, 455-460) were added to amplicons by PCR, followed by size selection, allowing multiplexed pooling of 24 samples per sequencing lane. The libraries were sequenced using 3 lanes of Illumina HiSeq 4000, 150bp paired-end (24 samples per lane), dual index workflow with a custom read 1 primer (75) (Table S4).
Estimating edited variant effect on fitness
Paired-end sequencing reads were aligned to the oligo library and the full plasmid sequence using the STAR aligner (Dobin et al., 2013) with parameters “--outFilterMultimapNmax 1” and “--outFilterMismatchNmax 0” (Table S5). Only reads that had a fragment length of 173-175, mapped to position 1 of an oligo, and did not contain any mismatches, insertions, or deletions were used as counts for oligo abundance. At least one paired-end read mapped to 60-70% of the designed oligos in each sample from the growth competition phase (Figure S1a-b). To account for the dependence of the measurement variance on the mean, each oligo’s effect on growth rate was estimated using limma-voom (Ritchie et al., 2015) (for the variance versus mean plot see Figure S1e). We note that DESeq2 (Love et al., 2014), which uses a similar model to MAGeCK (Li et al., 2014), yielded highly similar results to limma-voom but with smaller p-values for the significant effects; we therefore used limma-voom to be conservative. Guide/donor oligomer editing effects on fitness were estimated as the effect of time in “generations” (culture doublings, estimated from the optical density measurements) on the log fraction of paired-end reads that mapped to each oligo out of the total mapped reads in the sample (see reproducibility in Figure 2 and Figure S1). The model also contains an effect for replicate ID (which controls for different fractions of the oligo at the first time point in different replicates) and for sequencing lane. Thus, the full linear model was
$$\log_2(f_{i,t}) = \beta_{0,i} + \beta_{1,i}\,\mathrm{replicate} + \beta_{2,i}\,\mathrm{lane} + \beta_{3,i}\,t + \varepsilon_{i,t}$$
where f_{i,t} is the fraction of reads mapped to oligomer i at time t out of all mapped reads in this sample, and t is the time in generations from the start of the growth competition. The goal was to estimate β_{3,i}, the fitness effect (log2 fold change) of oligomer i. Note that this modeling approach approximates the relative fitness by assuming a fixed average fitness over time. This is only an approximation, as the average fitness is expected to increase to some extent over time. However, (a) according to optical density measurements, the growth rate is very stable over time (Figure S1d); and (b) assuming that the significant effects we detected are real, the abundance measurements can be used to estimate how the average fitness increases over the experiment. By this estimate, the change in average fitness is about 5% of the relative fitness of the variant with the smallest significantly detected effect. In conclusion, assuming fixed average fitness should not have a large effect on our estimates. The set of reproducible effects was detected using IDR (Li et al., 2011) with parameters similar to those of the ENCODE project (mu=0.1, sigma=1.0, rho=0.2, p=0.5, eps=10^−6, max.ite=3000) applied to the −log10(p-values). We removed seven variants whose replicates did not agree on the direction of the effect. The effects of multiple oligos per variant were combined by taking a weighted average, with the inverse of the variance used as weights. The combined p-value was calculated using Fisher’s method, separately for p-values from a one-tailed positive test and a one-tailed negative test, and taking the more significant of the two. The above data can be found in Table S5.
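The step that combines several oligos targeting the same variant can be sketched as follows. This is a minimal illustration rather than the authors' code: function names are hypothetical, the per-oligo effects, variances and one-tailed p-values are taken as given inputs, and p-values must be greater than zero.

```python
from math import exp, log

def inverse_variance_mean(effects, variances):
    """Weighted average of effects, each weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k d.f. under the null.

    For even degrees of freedom the chi-square survival function has a closed
    form, exp(-x/2) * sum_{j<k} (x/2)^j / j!, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)   # equals statistic / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return exp(-half_stat) * total

def combine_oligos(effects, variances, p_positive, p_negative):
    """Combine oligos for one variant; p_* are per-oligo one-tailed p-values."""
    effect = inverse_variance_mean(effects, variances)
    # Combine each direction separately and keep the more significant result.
    p_value = min(fisher_combined_p(p_positive), fisher_combined_p(p_negative))
    return effect, p_value
```

With a single oligo, `fisher_combined_p` returns that oligo's p-value unchanged, so variants targeted by one guide pass through this step unaltered.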
Genomic annotation, GO and TFBS analyses
Yeast BY-RM variants were annotated using Variant Effect Predictor (VEP) (McLaren et al., 2016). The annotations were parsed to retrieve the closest annotation and the genes on both sides of the variant. For intergenic variants, the annotations were used to call whether the variant is in a divergent promoter, a unidirectional promoter or a convergent intergenic region (between two gene ends). Dubious genes were filtered out of the analysis. GO enrichments were calculated using GOrilla (Eden et al., 2009). Three different sets of transcription factor binding sites were analyzed for overlap with variants: Pachkov et al. (Pachkov et al., 2013) was downloaded from SwissRegulon and filtered for score>0.95, while Harbison et al. (Harbison et al., 2004) and ORegAnno (Griffith et al., 2008) were downloaded from the UCSC genome browser (sacCer3 genome version). Gene coordinates were obtained using the biomaRt (Durinck et al., 2009) R package from the “scerevisiae_gene_ensembl” dataset of Ensembl.
Validation experiments
14 guide/donor oligos were selected for validation, resynthesized (IDT), and cloned into pZS165. The ADE2 gene was knocked out in ZRS111 to create ZRS112. ZRS112 colonies are pink when grown on low adenine media. Individual plasmids containing the validation guide/donor oligos, as well as a non-editing control plasmid (pZS165 containing the ade2hdr-sgGFP oligo), were transformed into both ZRS111 and ZRS112. The transformants were grown in YNB -HIS/-URA 2% raffinose for 18-24 hours and YNB -HIS/-URA 2% galactose for 48 hours with 1:30 dilution every 24 hrs. Individual clones were genotyped by PCR amplification of target loci and Sanger sequencing (106 clones in total, 102 edited, 4 non-edited, see Table S3 and Table S4). Clones with the correct edit were selected for the growth competition experiment. Equal amounts of control strain and edited strain were mixed at a starting optical density at 600 nm (OD600) of 0.025, in a volume of 1 mL of SD -HIS/-URA media. Importantly, the competition was set up in a reciprocal manner, with ZRS111 and ZRS112 alternating as control and edited strain. This controls for any genetic interactions between the edited variant and the ADE2 knockout allele in ZRS112. The cultures were grown for 5 hours shaking at 30C (3 generations) and plated on SD -HIS/-URA low-ADE plates in triplicate. The cultures were then diluted 1:32 by volume every 8 hours for 40 hours, spanning approximately 25 generations. The cultures at 40 hours were plated on SD -HIS/-URA low-ADE plates in triplicate. The plates were incubated at 30C for 2 days and moved to room temperature for 2 days. The number of white and pink colonies on each plate was recorded to determine the relative fitness of control and edited strains (Table S2).
Estimating editing effect in validation strains
In the validation experiments described above, the growth of a control strain (BY) is compared to that of a single strain edited by a specific guide/donor oligo, by growing both in the same well. The number of cells of each of the two strains at the start and end of the growth competition is estimated by plating cells on solid media and counting colonies: one strain’s colonies are pink while the other strain’s colonies are white. The effect on growth of a guide/donor edit in these validation experiments was estimated using the following scheme. Consider M competition experiments, in which a control strain c is competing against one of the 14 validated strains a. Let $n_0^m$ be the total number of colonies collected at the start (t = 0) and $a_0^m$ be the number of colonies out of $n_0^m$ that are of the edited strain. Similarly let $n_1^m$ and $a_1^m$ be the total and strain a number of colonies collected at the end (t = t1) of the m’th experiment. Let $C_0^m$, $A_0^m$, $C_1^m$, $A_1^m$ be the number of cells in the medium of the control and the edited strain at the start and end of the m’th experiment. Let βA and βC be the exponential growth rate coefficients of the edited and control strains and let Δβ = βA – βC. We assume that βC is independent of βA and is thus constant across strain competitions. Our goal is to estimate Δβ, which is expected to be equal to the log2 fold change in the pooled experiment. Finally, let $\theta^m = \ln(A_0^m / C_0^m)$. Then: $A_0^m / (A_0^m + C_0^m) = \sigma(\theta^m)$. Under our model of deterministic / non-stochastic growth, $A_1^m = A_0^m\,2^{\beta_A t_1}$ and $C_1^m = C_0^m\,2^{\beta_C t_1}$, so that
$$\frac{A_1^m}{A_1^m + C_1^m} = \sigma\left(\theta^m + \ln(2)\,\Delta\beta\,t_1\right)$$
where σ is the sigmoid function. Using this we can write
$$a_0^m \sim \mathrm{Binomial}\left(n_0^m,\ \sigma(\theta^m)\right)$$
and
$$a_1^m \sim \mathrm{Binomial}\left(n_1^m,\ \sigma\left(\theta^m + \ln(2)\,\Delta\beta\,t_1\right)\right)$$
We estimate Δβ by fitting the data using the statsmodels package in Python.
The selection coefficient (relative to the BY strain) is s = 2^Δβ − 1. The validation experiment estimates the fitness of the edited strain relative to the BY strain, while the pooled experiment estimates fitness relative to the average fitness of the pool. Therefore, we subtracted the BY fitness estimated by the pooled experiment (the median fitness of the “Random DNA guides”) from the pooled fitness estimate of the edited strain.
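Under exponential growth of both strains, the log-odds of the edited strain's frequency changes linearly with the number of generations, with a slope of ln(2)·Δβ. The sketch below turns the colony counts from a single competition into a point estimate of Δβ and of the selection coefficient s = 2^Δβ − 1. It is a simplified illustration with hypothetical function names and counts, not the authors' statsmodels fit, which pools experiments. It is undefined when a colony count is zero or equals the total.

```python
from math import log

def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return log(p / (1.0 - p))

def delta_beta(a0: int, n0: int, a1: int, n1: int, generations: float) -> float:
    """Point estimate of the growth-rate difference (log2 units per generation).

    a0/n0 and a1/n1 are edited-strain colonies over total colonies at the
    start and end of one competition; `generations` is the elapsed number of
    culture doublings.
    """
    return (logit(a1 / n1) - logit(a0 / n0)) / (generations * log(2))

def selection_coefficient(d_beta: float) -> float:
    """s = 2**delta_beta - 1, relative to the control strain."""
    return 2.0 ** d_beta - 1.0

# Hypothetical counts: 50 of 100 colonies edited at the start,
# 80 of 100 after 20 generations of competition.
d = delta_beta(50, 100, 80, 100, generations=20)
s = selection_coefficient(d)
print(d, s)
```

In this toy case the odds rise from 1 to 4, which is two doublings of the odds over 20 generations, so Δβ is about 0.1 and s is about 0.072.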
Measuring gene expression in validation strains
To test the effect of editing on expression of nearby genes in the validation strains, 3 edited clones per strain were harvested during log-phase growth and RNA was extracted with the YeaStar RNA kit (Zymo Research). Yeast total RNA was treated with RQ1 DNase (Promega) and ethanol precipitated, followed by cDNA synthesis using the SuperScript First-strand synthesis system (Invitrogen). Gene expression was assayed by Real-Time quantitative PCR (RT-qPCR), amplified using iQ SYBR Green Supermix (Bio-Rad) and custom qPCR primers (Table S4) on the CFX384 Real-Time PCR detection System (Bio-Rad). Expression of target genes was calculated by normalizing to mRNA expression of the reference gene ALG9. Each qPCR experiment was performed in three replicate wells. Statistical comparisons between validation strains (RM allele) and the control strain (BY allele, labeled as WT) were performed using GraphPad Prism software, with an unpaired t-test with Welch’s correction.
Testing enrichment of ribosomal pQTLs
We sought to test the hypothesis that the fitness effects of variants in promoters of “cytoplasmic translation” genes (hereafter referred to as ribosomal) might act via changes in the ribosome--e.g. total ribosomal abundance, or differential abundance of particular subunits that affect translation of specific target genes (Shi et al., 2017). In principle, this could be tested by asking whether the presence of BY vs. RM alleles at each ribosomal gene predicts the protein levels of other genes in trans, more than expected by chance. QTLs for protein abundance have previously been mapped in a BY/RM cross (Albert et al., 2014a); while some of these are likely to reflect translation variation, others agree in direction with eQTLs (QTLs for mRNA levels) for the same QTL/target gene pairs, suggesting that some pQTLs are a result of changes in mRNA. To define a conservative set of likely “post-transcriptional pQTLs”, we considered only pQTLs where the eQTL signal (Brem and Kruglyak, 2005) was in the opposite direction—e.g. the BY allele increasing protein level but showing no increase in mRNA level (note this does not require the eQTL to be significant, only different in direction). We further split the analysis by allelic direction, testing pQTLs where the BY allele increases protein levels separately from those where the BY allele decreases levels. Out of 1025 total reported pQTLs (Albert et al., 2014a), we classified 339 as likely post-transcriptional (with 187 BY-upregulating and 152 RM-upregulating).
Out of these 187 BY-upregulating pQTLs, we found 79 whose 2-LOD confidence intervals overlapped with one of the 25 ribosomal gene promoters with RM-fitter variants (i.e., an average of ~3.2 target proteins per ribosomal gene), compared to only 20 RM-upregulating. To determine whether this was different from what is expected by chance, we tested two randomization approaches. First, we chose 10,000 groups of 25 genes at random, and performed the same pQTL overlap analysis. Second, we chose 10,000 groups of 25 ribosomal genes at random (to control for the possibility that ribosomal genes are generally enriched for post-transcriptional pQTLs). We found that only 1.2% of random ribosomal gene groups and 1.0% of genome-wide gene groups had at least 79 BY-upregulating post-transcriptional pQTLs, suggesting that this enrichment is unlikely to occur by chance. In contrast, we found most random gene groups had at least 20 RM-upregulating post-transcriptional pQTLs (81% of ribosomal gene groups and 79% of genome-wide groups). Therefore, we hypothesize that the RM-fitter variants in 25 ribosomal gene promoters may result in lower translation of some genes. Since only 160 genes were tested as potential pQTL targets (Albert et al., 2014a), we had little power to find functional enrichments among the affected genes, and did not identify any enriched GO categories when accounting for the 160-gene background set.
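The randomization procedure described above can be sketched as a generic permutation test. This is an illustration under stated assumptions: the function name, the toy gene names and the per-gene pQTL counts are hypothetical, and summing per-gene counts is a simplification (a pQTL whose confidence interval overlaps two sampled genes would be counted twice here).

```python
import random

def permutation_p_value(observed, gene_pool, group_size, count_overlaps,
                        n_perm=10_000, seed=0):
    """Fraction of random gene groups with at least `observed` overlapping pQTLs.

    `gene_pool` is a list of candidate genes (all genes, or only ribosomal
    genes for the second control); `count_overlaps` maps a list of genes to
    the number of overlapping pQTLs.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        group = rng.sample(gene_pool, group_size)   # sample without replacement
        if count_overlaps(group) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical toy input: pQTLs overlapping each of 200 made-up genes.
pqtl_counts = {f"gene{i}": i % 4 for i in range(200)}
pool = sorted(pqtl_counts)
p = permutation_p_value(
    observed=50, gene_pool=pool, group_size=25,
    count_overlaps=lambda genes: sum(pqtl_counts[g] for g in genes),
    n_perm=1000, seed=1,
)
print(p)
```

Restricting `gene_pool` to ribosomal genes gives the second, more conservative control described in the text.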
Quantification and Statistical Analysis
Guide/donor oligo abundances were measured using Illumina HiSeq 4000, 150bp paired-end reads as described in the “Library sequencing” section. Sequencing reads were mapped to the 32,000 guide/donor oligos using STAR as described in the “Estimating edited variant effect on fitness” section to obtain guide/donor oligo counts in each sample (see Snakemake workflow in https://github.com/eilon-s/CRISPEY). The counts of 11 samples from each of 6 growth competition replicates (3 from each of the 2 editing replicates) were used to estimate the editing effect on fitness using limma-voom as described in the “Estimating edited variant effect on fitness” section. Effect sizes were also estimated separately for each set of 3 growth competition replicates following each editing phase replicate. These were used for the IDR analysis. Estimation of editing effects on fitness in the validation experiment is described in the “Estimating editing effect in validation strains” section. Analysis of transcription factor binding sites and gene ontology enrichment is described in the “Genomic annotation, GO and TFBS analyses” section. RT-qPCR was performed with 3 edited clones per strain or control clones as biological replicates and triplicate wells per strain for technical replication. Statistical analysis for RT-qPCR is described in the “Measuring gene expression in validation strains” section.
Data and Software Availability
All raw sequencing data have been deposited in the NCBI Sequence Read Archive (SRP126558). All software and code used for the library design and for obtaining oligo counts are available at https://github.com/eilon-s/CRISPEY. Code used to create specific figures is available upon request. Sequences of oligonucleotides used in this study can be found in Table S4. Raw Sanger sequencing reads can be found at: http://dx.doi.org/10.17632/jtbnrrmstt.1.
Supplementary Material
Figure S1. Reproducibility of experimental estimation of relative fitness effects, related to Figure 2. (a) and (b) show strain coverage during growth competition. (a) Each line shows the distribution of read pairs mapping to each oligo in one sample during growth competition. (b) Each line shows the distribution of estimated cells containing each oligo in one sample during growth competition right before the dilution. The number of cells was calculated by multiplying the fraction of reads mapping to each oligo by the total number of cells. The total number of cells before dilution was estimated to be 0.8 OD600 × 200 ml × 3 × 10^7 cells. The culture goes through a series of 4.5, 4 and 8-fold dilutions, so to get the size of the bottlenecks caused by the dilutions, divide the cell count prior to dilution by these numbers. (c) Abundance of strains across timepoints. The heatmap shows the log2 fold change during growth competition for strains which had reproducible (IDR<0.05) effects on growth rate across two sets of three growth competition experiments, each starting from an independent editing experiment. (d) Culture doublings (which we refer to as generations) during growth competition, based on OD measurements (y-axis), as a function of time. Colors mark the sample editing and growth competition replicate number. Black line shows a linear fit to the data. (e) Variance-mean trend plot produced by limma-voom (Ritchie et al., 2015). Limma-voom was used to model the effects of each guide/donor on growth rate during growth competition. (f) The heatmap shows clustering based on the Pearson correlation between raw read counts for each pair of samples. X and Y labels state the generation from growth competition start, replicate editing phase number and replicate growth competition number. Dendrogram colors match the time of the sample (i.e. sample number and dilution iteration). Notice that the samples cluster according to sampling time.
(g) Correlation between relative fitness estimates using a single replicate. Note that this correlation includes both significant effects that should be correlated and many more neutral or nearly neutral effects that should not show high correlation. (h) Comparing relative fitness estimates using three growth competition replicates. Each set of three replicates follows an independent editing experiment (Pearson’s R = 0.5). An IDR analysis of the log10(p-values) was used to identify a set of highly reproducible effects (IDR<0.05, orange). Non-significant effects (IDR>0.05, blue) show far weaker correlation, as expected. (i) IDR analysis of combined −log10(p-value) of each RM variant’s effect on growth, multiplied by the direction of the effect. Guide/donor p-values were combined using Fisher’s method. Orange dots show values with IDR<0.05. (j) A summary of the reproducibility analysis.
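The combination step in panel (i) can be illustrated with a short sketch of Fisher’s method. This is not the code used to produce the figure: the function names are hypothetical, and the way a single effect direction is assigned to a variant targeted by several guide/donor oligos is not specified here, so the sign is simply passed in by the caller. Fisher’s statistic, −2 Σ ln(p_i), follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its survival function has a closed form that needs only the standard library.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    # Half of Fisher's statistic X = -2 * sum(ln p_i).
    half = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def signed_score(p_values, direction):
    """-log10 of the combined p-value, multiplied by the effect direction (+1 or -1)."""
    return -math.log10(fisher_combine(p_values)) * direction
```

With a single p-value the combined value equals the input, which is a quick sanity check on the closed form.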
Figure S2. CRISPR-Cas9 editing toxicity is sensitive to the target and edited target sequence, related to Figure 3. Panels (a), (b) and (c) y-axes show the difference between the mean log2 fold change per generation of oligos that contain a random DNA guide (and therefore will not cut the target) and the oligo log2 fold change per generation during editing. Higher values reflect higher toxicity and therefore more cutting events. (a) shows the guide (target) nucleotide in the edited position. (b) shows the donor (edited target) nucleotide in the edited position. (c) shows both the guide (target) and the donor (edited target) nucleotide in the edited position. Bar colors represent a specific change from guide (target) nucleotide to donor (edited target) nucleotide, as indicated in the inset. (d) shows log2 fold changes per generation during 48h editing by guide Azimuth score (Doench et al., 2016). Panels (a)-(d) show results only for oligos that edit a synonymous mutation by 1bp replacement. In comparison to the above, panels (e)-(g) show results for changes in guide/donor oligo abundance during the growth competition (in which Cas9 is not expressed). (e) and (g) both show the log2 fold change per generation and (f) shows the fraction of guide/donor oligos with significant log2 fold change per generation, as a function of the edit position in the guide ((e) and (f)) and of guide quality score (Doench et al., 2016) (g). Whiskers in panels (a), (b), (c) and (e) show the distribution range (excluding outliers).
Figure S3. CRISPR-Cas9 off-target cutting level is correlated with previous in vitro and in vivo measurements of Cas9 mismatch sensitivity, related to Figure 3. (a) and (b) show agreement between CRISPR-Cas9 off-target cutting level and previous measurements of Cas9 sensitivity to mismatches in vivo (Fu et al., 2016) and in vitro (Fu et al., 2016; Pattanayak et al., 2013). Previous measurements were done using few guides and therefore may be affected by the specific guide-target context. Nevertheless, data from Pattanayak et al. agree with our observation of lower sensitivity to mismatches with adenine when the matching guide nucleotide is guanine (c) and to having guanine in the guide when the mismatch is with adenine (d). Data are represented as mean ± SEM.
Figure S4. Characterizing variants affecting fitness, related to Figure 4 and Figure 5. (a) Manhattan plot showing the relative fitness of editing a site from the BY variant to the RM variant. Larger dots mark IDR<0.05. (b) Similar to (a), showing only site effects with IDR<0.05. Horizontal gray lines in (a) and (b) mark +/−1% selection coefficient (2^(log2 fold change per generation) − 1). (c) Quantile-quantile plot of fitness effect p-values by annotation category. Positive = RM-fitter, negative = BY-fitter. (d) Enrichment of fitness effects in promoters is not caused by higher chromatin accessibility. Y-axis shows the fraction of variants with significant effect in each bin of nucleosome coverage under exponential growth in YPD (Schep et al., 2015), by genomic annotation. (e) Alignment of the fraction of significantly negative effect variants relative to TSS. Shown is the percent of variants with significant effect in each bin of variants with similar position relative to the translation start site. Bins outside the TSS are 50bp wide. Variants in the coding region were assigned a value according to their relative position in the coding sequence: (variant position in CDS / CDS length) * 1600 bp, and then were binned into non-overlapping 50 bp bins. (f) Annotation enrichment among variants with significantly negative effect (IDR<0.05). Enrichments are weak due to weaker signal overall. (g) and (h) show the percent of variants with significant effect as a function of their distance from the nearest transcription factor binding site. (g) shows the results for transcription factor binding sites from ORegAnno (Griffith et al., 2008). (h) shows the results for transcription factor binding sites from Harbison et al. (Harbison et al., 2004). The blue lines show the result over all non-genic variants. (i) Nearby sites with significant effect on fitness (IDR < 0.05) favor the same parent strain. This panel shows the same analysis as Figure 5d except with finer bin resolution.
Bin 351-400bp does not contain data points. (j) Quantile-quantile plot showing the p-value distributions of the effects of synonymous and missense mutations. (k) Non-conservative amino acid changes are enriched in missense variants with significant effects. The plot shows that missense variants with negative BLOSUM62 scores are enriched in the set of missense variants with significant effect (IDR < 0.05; Binomial p-value < 10^−3). (l) Synonymous variants with significant effect on growth tend to be in genes with high codon bias. Whiskers show the distribution range (excluding outliers). Codon bias was calculated by CodonW, and downloaded from SGD. The difference between significant (IDR<0.05) variants and the rest was significant for synonymous variants (Kolmogorov-Smirnov, p-value = 0.017). No difference was observed for missense variants (Kolmogorov-Smirnov, p-value = 0.50), suggesting that the difference is not due to increased importance/fitness effects of high codon-bias genes overall.
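The ±1% selection-coefficient lines in panels (a) and (b) can be related to log2 fold change per generation with a short conversion. This is a minimal sketch that assumes the expression in the legend reads s = 2^(log2 fold change per generation) − 1; the function names are illustrative and are not taken from the CRISPEY repository.

```python
import math

def selection_coefficient(log2_fc_per_generation):
    """Selection coefficient s from a per-generation log2 fold change (assumed s = 2^x - 1)."""
    return 2.0 ** log2_fc_per_generation - 1.0

def log2_fc_for_selection(s):
    """Inverse conversion: per-generation log2 fold change corresponding to s."""
    return math.log2(1.0 + s)

# Positions of the +1% and -1% lines on a log2 fold change per generation axis.
upper_line = log2_fc_for_selection(0.01)
lower_line = log2_fc_for_selection(-0.01)
```

Under this reading, the two lines sit at roughly +0.0144 and −0.0145 log2 units per generation, so they are nearly but not exactly symmetric around zero.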
Figure S5. The effect of BY-RM variants on gene expression, related to Figure 4. Panels (a)-(d) show that genes with significant upstream variants are more likely to exhibit allele-specific expression. (a) and (b) Scatter plots showing adjusted p-value of the most significant variant upstream of the gene versus allele-specific mRNA expression (a) and ribosome footprints (b) from Albert et al. (Albert et al., 2014b). (c) and (d) compare allele-specific mRNA expression (c) and ribosome footprints (d) between genes with and without a significant variant upstream (significant is defined as adjusted p-value < 10^−4 and variants are considered up to 1000 bp upstream). Whiskers in panels (c) and (d) show the distribution range (excluding outliers). Panels (e)-(g) show that RM-allele replacement alters expression of translation-related genes. (e) Editing the chrII:168064 site at the YBL028C-RPL19B promoter to the RM allele increases expression of RPL19B, a ribosomal subunit, by 42% (unpaired t-test, Welch’s corrected p-value = 0.0082). (f) Editing the chrXII:818996 site at the RPL26A promoter to the RM allele reduces expression of RPL26A, a ribosomal subunit, by 21% (unpaired t-test, Welch’s corrected p-value = 0.0024). (g) Editing the chrII:477248 site at the TKL2-TEF2 promoter locus to the RM allele reduces expression of TEF2, a eukaryotic translation elongation factor, by 15% (unpaired t-test, Welch’s corrected p-value = 0.0267). Strains that were edited to the RM allele and control strains (WT, BY genotype) were individually grown to log-phase and assayed with RT-qPCR. Data are represented as mean ± SEM. Expression changes of edited strain relative to WT for each gene were detected using unpaired t-test between RM-allele and BY-allele strains with Welch’s correction (*: p < 0.05; **: p < 0.01).
Table S2. Spot assay data for ADE2 and 14 validation strains, related to Figure 1 and Figure 2. This table contains (1) Pink/white colony counts for estimating ADE2 knockout editing efficiency shown in Figure 1d (ADE2 knockout phenotype is pink); (2) Sample data and colony counts for pairwise experiment of 14 validation strains shown in Figure 2e.
Table S3. Sanger sequencing of edited targets and library plasmids, related to Figure 1 and Figure 2. This table contains Sanger sequencing information for (1) Genotyping of ADE2 knockout edits and EGFP insertion into ADE1 (long insert) for editing efficiency estimation, shown in Figure 1d and Figure 1e; (2) Quality control of 45 pooled library transformants for estimating whether each transformant carried only a single plasmid; (3) Genotyping of target loci in validation strain clones.
Table S4. Oligonucleotides, PCR and sequencing primer description, related to Figure 1 and Figure 2. This table contains the sequences and description of all primers/oligonucleotides that were used in this study.
Table S5. Pooled growth competition sample information, library guide/donor oligonucleotide annotations, mapped read count and growth rate estimates, related to Figure 2 and Figure 3. This table contains information for: (1) Annotations for all guide/donor oligonucleotides synthesized in the CRISPEY oligonucleotide library; (2) Sample information for pooled editing and growth competition; (3) Counts of sequencing reads that map to each guide/donor oligonucleotide; (4) Estimated growth coefficient for each guide/donor oligonucleotide; and (5) Off-target (plasmid) cutting level during editing, related to Figure 3b.
Highlights
CRISPEY – a method for highly efficient, parallel precise genome editing
Applied to measure fitness effects of thousands of natural genetic variants in yeast
Variants affecting fitness are enriched in promoters and TF binding sites
Nearby variants mostly favor the same strain’s alleles, indicating natural selection
Acknowledgments
We thank M.C. Bassik, D.A. Petrov, D.F. Jarosz, L. Qi, E.A. Boyle, A. Harpak, D.A. Knowles, N. Sinnott-Armstrong, J. Lee, Atish Agarwala and members of the Fraser and Pritchard labs for helpful discussions. ES was partially funded by EMBO LTF 646-2014. SAC was partially funded by NIH 5T32GM007276-42. This work was supported by NIH grant 2R01GM097171-05A1. We dedicate this work to the memory of Zachery R. Smith.
Declaration of Interests
HBF is a co-inventor of a patent application describing the CRISPEY approach.
References
- Aird EJ, Lovendahl KN, St. Martin A, Harris RS, and Gordon WR (2018). Increasing Cas9-mediated homology-directed repair efficiency through covalent tethering of DNA repair template. Commun. Biol. 1, 54.
- Albert FW, Treusch S, Shockley AH, Bloom JS, and Kruglyak L (2014a). Genetics of single-cell protein abundance variation in large yeast populations. Nature 506, 494–497.
- Albert FW, Muzzey D, Weissman JS, and Kruglyak L (2014b). Genetic influences on translation in yeast. PLoS Genet. 10, e1004692.
- Bak RO, Dever DP, Reinisch A, Hernandez DC, Majeti R, and Porteus MH (2017). Multiplexed genetic engineering of human hematopoietic stem and progenitor cells using CRISPR/Cas9 and AAV6. eLife 6, e27873.
- Bao Z, HamediRad M, Xue P, Xiao H, Tasan I, Chao R, Liang J, and Zhao H (2018). Genome-scale engineering of Saccharomyces cerevisiae with single-nucleotide precision. Nat. Biotechnol. 36, 505–508.
- Benatuil L, Perez JM, Belk J, and Hsieh C-M (2010). An improved yeast transformation method for the generation of very large human antibody libraries. Protein Eng. Des. Sel. 23, 155–159.
- Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, and Kruglyak L (2013). Finding the sources of missing heritability in a yeast cross. Nature 494, 234–237.
- Boyle EA, Andreasson JOL, Chircus LM, Sternberg SH, Wu MJ, Guegler CK, Doudna JA, and Greenleaf WJ (2017). High-throughput biochemical profiling reveals sequence determinants of dCas9 off-target binding and unbinding. Proc. Natl. Acad. Sci. U. S. A. 114, 5461–5466.
- Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, and Boeke JD (1998). Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115–132.
- Brem RB, and Kruglyak L (2005). The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. U. S. A. 102, 1572–1577.
- Chadwick AC, Wang X, and Musunuru K (2017). In Vivo Base Editing of PCSK9 (Proprotein Convertase Subtilisin/Kexin Type 9) as a Therapeutic Alternative to Genome Editing. Arterioscler. Thromb. Vasc. Biol. 37, 1741–1747.
- Chamary JV, Parmley JL, and Hurst LD (2006). Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 7, 98–108.
- Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, et al. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823.
- Dever DP, Bak RO, Reinisch A, Camarena J, Washington G, Nicolas CE, Pavel-Dinu M, Saxena N, Wilkens AB, Mantri S, et al. (2016). CRISPR/Cas9 β-globin gene targeting in human haematopoietic stem cells. Nature 539, 384–389.
- DiCarlo JE, Norville JE, Mali P, Rios X, Aach J, and Church GM (2013). Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. 41, 4336–4343.
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
- Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191.
- Doudna JA, and Charpentier E (2014). Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096.
- Durinck S, Spellman PT, Birney E, and Huber W (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191.
- Eden E, Navon R, Steinfeld I, Lipson D, and Yakhini Z (2009). GOrilla: A tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48.
- Ehrenreich IM, Gerke JP, and Kruglyak L (2009). Genetic Dissection of Complex Traits in Yeast: Insights from Studies of Gene Expression and Other Phenotypes in the BYxRM Cross. Cold Spring Harb. Symp. Quant. Biol. 74, 145–153.
- Enard D, Messer PW, and Petrov DA (2014). Genome-wide signals of positive selection in human evolution. Genome Res. 24, 885–895.
- Farasat I, and Salis HM (2016). A Biophysical Model of CRISPR/Cas9 Activity for Rational Design of Genome Editing and Gene Regulation. PLoS Comput. Biol. 12, e1004724.
- Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJH, Shishkin AA, et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343.
- Fay JC (2013). The molecular basis of phenotypic variation in yeast. Curr. Opin. Genet. Dev. 23, 672–677.
- Fraser HB (2011). Genome-wide approaches to the study of adaptive gene expression evolution. BioEssays 33, 469–477.
- Fraser HB (2013). Gene expression drives local adaptation in humans. Genome Res. 23, 1089–1096.
- Fraser HB, Moses AM, and Schadt EE (2010). Evidence for widespread adaptive evolution of gene expression in budding yeast. Proc. Natl. Acad. Sci. U. S. A. 107, 2977–2982.
- Fu BXH, St Onge RP, Fire AZ, and Smith JD (2016). Distinct patterns of Cas9 mismatch tolerance in vitro and in vivo. Nucleic Acids Res. 44, 5365–5377.
- Gaj T, Staahl BT, Rodrigues GMC, Limsirichai P, Ekman FK, Doudna JA, and Schaffer DV (2017). Targeted gene knock-in by homology-directed genome editing using Cas9 ribonucleoprotein and AAV donor delivery. Nucleic Acids Res. 45, e98.
- Garst AD, Bassalo MC, Pines G, Lynch SA, Halweg-Edwards AL, Liu R, Liang L, Wang Z, Zeitoun R, Alexander WG, et al. (2017). Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat. Biotechnol. 35, 48–55.
- Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, and Liu DR (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471.
- Gong S, Yu HH, Johnson KA, and Taylor DW (2018). DNA Unwinding Is the Primary Determinant of CRISPR-Cas9 Activity. Cell Rep. 22, 359–371.
- Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M, Haeussler M, et al. (2008). ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 36, D107–13.
- Guo X, Chavez A, Tung A, Chan Y, Kaas C, Yin Y, Cecchi R, Garnier SL, Kelsic ED, Schubert M, et al. (2018). High-throughput creation and functional profiling of DNA sequence variant libraries using CRISPR-Cas9 in yeast. Nat. Biotechnol. 36, 540–546.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne J-B, Reynolds DB, Yoo J, et al. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104.
- Hess GT, Frésard L, Han K, Lee CH, Li A, Cimprich KA, Montgomery SB, and Bassik MC (2016). Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat. Methods 13, 1036–1042.
- Hess GT, Tycko J, Yao D, and Bassik MC (2017). Methods and Applications of CRISPR-Mediated Base Editing in Eukaryotic Genomes. Mol. Cell 68, 26–43.
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and Manolio TA (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U. S. A. 106, 9362–9367.
- Hirano S, Nishimasu H, Ishitani R, and Nureki O (2016). Structural Basis for the Altered PAM Specificities of Engineered CRISPR-Cas9. Mol. Cell 61, 886–894.
- Hoekstra HE, and Coyne JA (2007). The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995–1016.
- Hsu MY, Inouye M, and Inouye S (1990). Retron for the 67-base multicopy single-stranded DNA from Escherichia coli: a potential transposable element encoding both reverse transcriptase and Dam methylase functions. Proc. Natl. Acad. Sci. U. S. A. 87, 9454–9458.
- Hsu MY, Eagle SG, Inouye M, and Inouye S (1992). Cell-free synthesis of the branched RNA-linked msDNA from retron-Ec67 of Escherichia coli. J. Biol. Chem. 267, 13823–13829.
- Hsu PD, Lander ES, and Zhang F (2014). Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278.
- Hu JH, Miller SM, Geurts MH, Tang W, Chen L, Sun N, Zeina CM, Gao X, Rees HA, Lin Z, et al. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63.
- Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, and Charpentier E (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, and Haussler D (2002). The Human Genome Browser at UCSC. Genome Res. 12, 996–1006.
- Kim D, Lim K, Kim ST, Yoon SH, Kim K, Ryu SM, and Kim JS (2017a). Genome-wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35, 475–480.
- Kim K, Ryu SM, Kim ST, Baek G, Kim D, Lim K, Chung E, Kim S, and Kim JS (2017b). Highly efficient RNA-guided base editing in mouse embryos. Nat. Biotechnol. 35, 435–437.
- Kim YB, Komor AC, Levy JM, Packer MS, Zhao KT, and Liu DR (2017c). Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376.
- Kita R, Venkataram S, Zhou Y, and Fraser HB (2017). High-resolution mapping of cis-regulatory variation in budding yeast. Proc. Natl. Acad. Sci. U. S. A. 114, E10736–E10744.
- Komor AC, Kim YB, Packer MS, Zuris JA, and Liu DR (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.
- Langmead B, and Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
- Lawrie DS, Messer PW, Hershberg R, and Petrov DA (2013). Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 9, e1003527.
- Lee K, Mackley VA, Rao A, Chong AT, Dewitt MA, Corn JE, and Murthy N (2017). Synthetically modified guide RNA and donor DNA are a versatile platform for CRISPR-Cas9 engineering. eLife 6, e25312.
- Li Q, Brown JB, Huang H, and Bickel PJ (2011). Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779.
- Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, and Liu XS (2014). MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554.
- Liang P, Sun H, Sun Y, Zhang X, Xie X, Zhang J, Zhang Z, Chen Y, Ding C, Xiong Y, et al. (2017). Effective gene editing by high-fidelity base editor 2 in mouse zygotes. Protein Cell 8, 601–611.
- Liu Z, Liang Y, Ang EL, and Zhao H (2017). A New Era of Genome Integration - Simply Cut and Paste! ACS Synth. Biol. 6, 601–609.
- Lõoke M, Kristjuhan K, and Kristjuhan A (2011). Extraction of genomic DNA from yeasts for PCR-based applications. Biotechniques 50, 325–328.
- Love MI, Huber W, and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
- Ma H, Tu L-C, Naseri A, Huisman M, Zhang S, Grunwald D, and Pederson T (2016a). CRISPR-Cas9 nuclear dynamics and target recognition in living cells. J. Cell Biol. 214, 529–537.
- Ma Y, Zhang J, Yin W, Zhang Z, Song Y, and Chang X (2016b). Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells. Nat. Methods 13, 1029–1035.
- Mackay TFC, Stone EA, and Ayroles JF (2009). The genetics of quantitative traits: challenges and prospects. Nat. Rev. Genet. 10, 565–577.
- Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, and Church GM (2013). RNA-guided human genome engineering via Cas9. Science 339, 823–826. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, and Cunningham F (2016). The Ensembl Variant Effect Predictor. Genome Biol 17, 122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Michlits G, Hubmann M, Wu S-H, Vainorius G, Budusan E, Zhuk S, Burkard TR, Novatchkova M, Aichinger M, Lu Y, et al. (2017). CRISPR-UMI: single-cell lineage tracing of pooled CRISPR-Cas9 screens. Nat. Methods 14, 1191–1197.
- Mirochnitchenko O, Inouye S, and Inouye M (1994). Production of single-stranded DNA in mammalian cells by means of a bacterial retron. J. Biol. Chem 269, 2380–2383.
- Miyata S, Ohshima A, and Inouye S (1992). In vivo production of a stable single-stranded cDNA in Saccharomyces cerevisiae by means of a bacterial retron. Proc. Natl. Acad. Sci. U. S. A 89, 5735–5739.
- Morgens DW, Wainberg M, Boyle EA, Ursu O, Araya CL, Tsui CK, Haney MS, Hess GT, Han K, Jeng EE, et al. (2017). Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens. Nat. Commun 8, 15178.
- Nishida K, Arazoe T, Yachie N, Banno S, Kakimoto M, Tabata M, Mochizuki M, Miyabe A, Araki M, Hara KY, et al. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729.
- Nishiyama J, Mikuni T, and Yasuda R (2017). Virus-Mediated Genome Editing via Homology-Directed Repair in Mitotic and Postmitotic Cells in Mammalian Brain. Neuron 96, 755–768.e5.
- Orr HA (1998). Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics 149, 2099–2104.
- Pachkov M, Balwierz PJ, Arnold P, Ozonov E, and van Nimwegen E (2013). SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res 41, D214–20.
- Pattanayak V, Lin S, Guilinger JP, Ma E, Doudna JA, and Liu DR (2013). High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol 31, 839–843.
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras J-B, Stephens M, Gilad Y, and Pritchard JK (2010). Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772.
- Plotkin JB, and Kudla G (2011). Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet 12, 32–42.
- Pritchard JK, Pickrell JK, and Coop G (2010). The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol 20, R208–15.
- Rees HA, Komor AC, Yeh W-H, Caetano-Lopes J, Warman M, Edge ASB, and Liu DR (2017). Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun 8, 15790.
- Richardson CD, Ray GJ, DeWitt MA, Curie GL, and Corn JE (2016). Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat. Biotechnol 34, 339–344.
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, and Smyth GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47.
- Rockman MV (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66, 1–17.
- Rockman MV, and Kruglyak L (2006). Genetics of global gene expression. Nat. Rev. Genet 7, 862–872.
- Rouet P, Smih F, and Jasin M (1994). Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol 14, 8096–8106.
- Roy KR, Smith JD, Vonesch SC, Lin G, Tu CS, Lederer AR, Chu A, Suresh S, Nguyen M, Horecka J, et al. (2018). Multiplexed precision genome editing with trackable genomic barcodes in yeast. Nat. Biotechnol 36, 512–520.
- Sadhu MJ, Bloom JS, Day L, Siegel JJ, Kosuri S, and Kruglyak L (2018). Highly parallel genome variant engineering with CRISPR-Cas9. Nat. Genet. 50, 510–514.
- Satomura A, Nishioka R, Mori H, Sato K, Kuroda K, and Ueda M (2017). Precise genome-wide base editing by the CRISPR Nickase system in yeast. Sci. Rep 7, 2095.
- Schaid DJ, Chen W, and Larson NB (2018). From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet 19, 491–504.
- Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, and Greenleaf WJ (2015). Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res 25, 1757–1770.
- Sellis D, Kvitek DJ, Dunn B, Sherlock G, and Petrov DA (2016). Heterozygote Advantage Is a Common Outcome of Adaptation in Saccharomyces cerevisiae. Genetics 203, 1401–1413.
- She R, and Jarosz DF (2018). Mapping Causal Variants with Single-Nucleotide Resolution Reveals Biochemical Drivers of Phenotypic Change. Cell 172, 478–490.e15.
- Shi Z, Fujii K, Kovary KM, Genuth NR, Röst HL, Teruel MN, and Barna M (2017). Heterogeneous Ribosomes Preferentially Translate Distinct Subpools of mRNAs Genome-wide. Mol. Cell 67, 71–83.e7.
- Shimamoto T, Hsu MY, Inouye S, and Inouye M (1993). Reverse transcriptases from bacterial retrons require specific secondary structures at the 5’-end of the template for the cDNA priming reaction. J. Biol. Chem 268, 2684–2692.
- Sikorski RS, and Hieter P (1989). A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19–27.
- Stam LF, and Laurie CC (1996). Molecular dissection of a major gene effect on a quantitative trait: the level of alcohol dehydrogenase expression in Drosophila melanogaster. Genetics 144, 1559–1564.
- Stern DL, and Orgogozo V (2008). The loci of evolution: how predictable is genetic evolution? Evolution (N. Y) 62, 2155–2177.
- Sturtevant AH (1913). The linear arrangement of six sex linked factors in Drosophila, as shown by their mode of association. J. Exp. Zool. 14, 43–59.
- Suzuki K, Tsunekawa Y, Hernandez-Benitez R, Wu J, Zhu J, Kim EJ, Hatanaka F, Yamamoto M, Araoka T, Li Z, et al. (2016). In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144–149.
- Tehranchi AK, Myrthil M, Martin T, Hie BL, Golan D, and Fraser HB (2016). Pooled ChIP-Seq Links Variation in Transcription Factor Binding to Complex Disease Risk. Cell 165, 730–741.
- Tsai SQ, Nguyen NT, Malagon-Lopez J, Topkar VV, Aryee MJ, and Joung JK (2017). CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607–614.
- Visscher PM, and Haley CS (1996). Detection of putative quantitative trait loci in line crosses under infinitesimal genetic models. Theor. Appl. Genet. 93, 691–702.
- Wittmann MJ, Bergland AO, Feldman MW, Schmidt PS, and Petrov DA (2017). Seasonally fluctuating selection can maintain polymorphism at many loci via segregation lift. Proc. Natl. Acad. Sci. U. S. A 114, E9932–E9941.
- Yang Y, Wang L, Bell P, McMenamin D, He Z, White J, Yu H, Xu C, Morizono H, Musunuru K, et al. (2016). A dual AAV system enables the Cas9-mediated correction of a metabolic liver disease in newborn mice. Nat. Biotechnol 34, 334–338.
- Zhang Y, Qin W, Lu X, Xu J, Huang H, Bai H, Li S, and Lin S (2017). Programmable base editing of zebrafish genome using a modified CRISPR-Cas9 system. Nat. Commun 8, 118.
Associated Data
Supplementary Materials
Figure S1. Reproducibility of experimental estimation of relative fitness effects, related to Figure 2. (a) and (b) show strain coverage during growth competition. (a) Each line shows the distribution of read pairs mapping to each oligo in one sample during growth competition. (b) Each line shows the distribution of the estimated number of cells containing each oligo in one sample during growth competition, right before the dilution. The number of cells was calculated by multiplying the fraction of reads mapping to each oligo by the total number of cells. The total number of cells before dilution was estimated to be 0.8 OD600 × 200 ml × 3 × 10^7 cells. The culture goes through a series of 4.5-, 4- and 8-fold dilutions, so the size of the bottleneck caused by each dilution is the cell count prior to dilution divided by the corresponding fold. (c) Abundance of strains across timepoints. The heatmap shows the log2 fold change during growth competition for strains that had reproducible (IDR<0.05) effects on growth rate across two sets of three growth competition experiments, each set starting from an independent editing experiment. (d) Culture doublings (which we refer to as generations) during growth competition, based on OD measurements (y-axis), as a function of time. Colors mark the editing and growth competition replicate number of each sample. The black line shows a linear fit to the data. (e) Variance-mean trend plot produced by limma-voom (Ritchie et al., 2015). Limma-voom was used to model the effects of each guide/donor on growth rate during growth competition. (f) The heatmap shows clustering based on the Pearson correlation between raw read counts for each pair of samples. X and Y labels state the generation from growth competition start, the editing replicate number and the growth competition replicate number. Dendrogram colors match the time of the sample (i.e. sample number and dilution iteration). Note that the samples cluster according to sampling time. (g) Correlation between relative fitness estimates using a single replicate. Note that this correlation includes both significant effects that should be correlated and many more neutral or nearly neutral effects that should not show high correlation. (h) Comparison of relative fitness estimates using three growth competition replicates. Each set of three replicates follows an independent editing experiment (Pearson’s R = 0.5). An IDR analysis of the log10(p-values) was used to identify a set of highly reproducible effects (IDR<0.05, orange). Non-significant effects (IDR>0.05, blue) show far weaker correlation, as expected. (i) IDR analysis of the combined −log10(p-value) of each RM variant’s effect on growth, multiplied by the direction of the effect. Guide/donor p-values were combined using Fisher’s method. Orange dots show values with IDR<0.05. (j) A summary of the reproducibility analysis.
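The cell-count arithmetic described for Figure S1(b) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the factor 3 × 10^7 is assumed to mean cells per ml per OD600 unit, which the legend does not state explicitly.

```python
# Illustrative sketch of the Figure S1(b) cell-count arithmetic.
# All names are hypothetical; the constants are taken from the legend.
OD600_BEFORE_DILUTION = 0.8
CULTURE_VOLUME_ML = 200
CELLS_PER_ML_PER_OD = 3e7  # assumption: cells per ml per OD600 unit
DILUTION_FOLDS = (4.5, 4, 8)


def total_cells():
    """Estimated total number of cells in the culture right before dilution."""
    return OD600_BEFORE_DILUTION * CULTURE_VOLUME_ML * CELLS_PER_ML_PER_OD


def cells_per_oligo(read_counts):
    """Fraction of reads mapping to each oligo times the total cell count."""
    total_reads = sum(read_counts.values())
    n_cells = total_cells()
    return {oligo: n_cells * count / total_reads
            for oligo, count in read_counts.items()}


def bottleneck_sizes(cells_before_dilution):
    """Cell count carried through each of the 4.5-, 4- and 8-fold dilutions."""
    return [cells_before_dilution / fold for fold in DILUTION_FOLDS]
```

Under these assumptions the whole culture holds about 4.8 × 10^9 cells before dilution, so an oligo accounting for one read in 10,000 would correspond to roughly 4.8 × 10^5 cells, and between 6 × 10^4 and about 1.1 × 10^5 of them would pass each bottleneck.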
Figure S2. CRISPR-Cas9 editing toxicity is sensitive to the target and edited target sequence, related to Figure 3. The y-axes of panels (a), (b) and (c) show the difference between the mean log2 fold change per generation of oligos that contain a random DNA guide (and therefore will not cut the target) and the log2 fold change per generation of each oligo during editing. Higher values reflect higher toxicity and therefore more cutting events. (a) shows the guide (target) nucleotide in the edited position. (b) shows the donor (edited target) nucleotide in the edited position. (c) shows both the guide (target) and the donor (edited target) nucleotide in the edited position. Bar colors represent a specific change from guide (target) nucleotide to donor (edited target) nucleotide, as indicated in the inset. (d) shows log2 fold changes per generation during 48 h of editing as a function of the guide Azimuth score (Doench et al., 2016). Panels (a)-(d) show results only for oligos that introduce a synonymous mutation by a 1 bp replacement. In contrast, panels (e)-(g) show results for changes in guide/donor oligo abundance during the growth competition (in which Cas9 is not expressed). (e) and (g) both show the log2 fold change per generation, and (f) shows the fraction of guide/donor oligos with a significant log2 fold change per generation, as a function of the edit position in the guide ((e) and (f)) and of the guide quality score (Doench et al., 2016) (g). Whiskers in panels (a), (b), (c) and (e) show the distribution range (excluding outliers).
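The toxicity measure plotted on the y-axes of Figure S2(a)-(c) can be sketched as below. This is an illustration only, assuming per-oligo log2 fold changes per generation are already available; the function and argument names are hypothetical and are not taken from the CRISPEY code.

```python
from statistics import mean


def toxicity_scores(lfc_per_generation, is_random_guide):
    """Toxicity of each targeting oligo relative to random-guide controls.

    lfc_per_generation: dict mapping oligo id -> log2 fold change per
        generation during editing.
    is_random_guide: dict mapping oligo id -> True if the oligo carries a
        random (non-cutting) guide.
    Returns a dict mapping each targeting oligo id to
    (mean control log2 fold change) - (oligo log2 fold change),
    so higher values mean stronger depletion, i.e. higher toxicity.
    """
    baseline = mean(lfc for oligo, lfc in lfc_per_generation.items()
                    if is_random_guide[oligo])
    return {oligo: baseline - lfc
            for oligo, lfc in lfc_per_generation.items()
            if not is_random_guide[oligo]}
```

Subtracting each oligo from the control mean, rather than the reverse, keeps the sign convention of the legend: an oligo that drops out faster than the non-cutting controls receives a positive score.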
Figure S3. CRISPR-Cas9 off-target cutting level is correlated with previous in vitro and in vivo measurements of Cas9 mismatch sensitivity, related to Figure 3. (a) and (b) show agreement between the CRISPR-Cas9 off-target cutting level and previous measurements of Cas9 sensitivity to mismatches in vivo (Fu et al., 2016) and in vitro (Fu et al., 2016; Pattanayak et al., 2013). Previous measurements were done using few guides and may therefore be affected by the specific guide-target context. Nevertheless, data from Pattanayak et al. agree with our observation of lower sensitivity to mismatches with adenine when the matching guide nucleotide is guanine (c) and to having guanine in the guide when the mismatch is with adenine (d). Data are represented as mean ± SEM.
Figure S4. Characterizing variants affecting fitness, related to Figure 4 and Figure 5. (a) Manhattan plot showing the relative fitness effect of editing each site from the BY variant to the RM variant. Larger dots mark IDR<0.05. (b) Similar to (a), showing only site effects with IDR<0.05. Horizontal gray lines in (a) and (b) mark a ±1% selection coefficient (2^(log2 fold change per generation) − 1). (c) Quantile-quantile plot of fitness effect p-values by annotation category. Positive = RM-fitter, negative = BY-fitter. (d) Enrichment of fitness effects in promoters is not caused by higher chromatin accessibility. The y-axis shows the fraction of variants with a significant effect in each bin of nucleosome coverage under exponential growth in YPD (Schep et al., 2015), by genomic annotation. (e) Fraction of variants with a significantly negative effect, aligned relative to the TSS. The plot shows the percent of variants with a significant effect in each bin of variants with similar position relative to the translation start site. Bins outside the TSS are 50 bp wide. Variants in the coding region were assigned a value according to their relative position in the coding sequence: (variant position in CDS / CDS length) × 1600 bp, and were then binned into non-overlapping 50 bp bins. (f) Annotation enrichment among variants with a significantly negative effect (IDR<0.05). Enrichments are weak due to weaker signal overall. (g) and (h) show the percent of variants with a significant effect as a function of their distance from the nearest transcription factor binding site. (g) shows the results for transcription factor binding sites from ORegAnno (Griffith et al., 2008). (h) shows the results for transcription factor binding sites from Harbison et al. (2004). The blue lines show the result over all non-genic variants. (i) Nearby sites with a significant effect on fitness (IDR < 0.05) favor the same parent strain. This panel shows the same analysis as Figure 5d, except with finer bin resolution. The 351-400 bp bin does not contain data points. (j) Quantile-quantile plot showing the p-value distributions of the effects of synonymous and missense mutations. (k) Non-conservative amino acid changes are enriched among missense variants with significant effects. The plot shows that missense variants with negative BLOSUM62 scores are enriched in the set of missense variants with a significant effect (IDR < 0.05; binomial p-value < 10^−3). (l) Synonymous variants with a significant effect on growth tend to be in genes with high codon bias. Whiskers show the distribution range (excluding outliers). Codon bias was calculated by CodonW and downloaded from SGD. The difference between significant (IDR<0.05) variants and the rest was significant for synonymous variants (Kolmogorov-Smirnov, p-value = 0.017). No difference was observed for missense variants (Kolmogorov-Smirnov, p-value = 0.50), suggesting that the difference is not due to increased importance/fitness effects of high codon-bias genes overall.
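Two quantities defined in the Figure S4 legend, the selection coefficient marked by the gray lines in (a) and (b) and the rescaled coding-sequence position used for binning in (e), can be sketched as follows. The function names are hypothetical, and the 0-based position convention is an assumption made here so that every variant falls into one of 32 bins; the legend does not specify it.

```python
import math


def selection_coefficient(log2_fc_per_generation):
    """Selection coefficient s = 2^(log2 fold change per generation) - 1."""
    return 2.0 ** log2_fc_per_generation - 1.0


def cds_bin(pos_in_cds, cds_length, scaled_length=1600, bin_width=50):
    """Index of the non-overlapping bin for a variant inside a coding sequence.

    The position is rescaled as (pos_in_cds / cds_length) * scaled_length so
    that genes of different lengths share one 1600 bp axis, then assigned to
    a 50 bp bin. Assumes a 0-based position with 0 <= pos_in_cds < cds_length.
    """
    if not 0 <= pos_in_cds < cds_length:
        raise ValueError("pos_in_cds must satisfy 0 <= pos_in_cds < cds_length")
    scaled = pos_in_cds / cds_length * scaled_length
    return int(scaled // bin_width)


# The +1% gray line corresponds to a log2 fold change per generation of:
plus_one_percent_line = math.log2(1.01)
```

With this definition a 1% selection coefficient corresponds to a log2 fold change of about 0.0144 per generation, which shows how small the per-generation abundance changes behind the gray lines are.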
Figure S5. The effect of BY-RM variants on gene expression, related to Figure 4. Panels (a)-(d) show that genes with significant upstream variants are more likely to exhibit allele-specific expression. (a) and (b) Scatter plots showing the adjusted p-value of the most significant variant upstream of the gene versus allele-specific mRNA expression (a) and ribosome footprints (b) from Albert et al. (2014b). (c) and (d) compare allele-specific mRNA expression (c) and ribosome footprints (d) between genes with and without a significant upstream variant (significant is defined as adjusted p-value < 10^−4, and variants are considered up to 1000 bp upstream). Whiskers in panels (c) and (d) show the distribution range (excluding outliers). Panels (e)-(g) show that RM-allele replacement alters expression of translation-related genes. (e) Editing the chrII:168064 site at the YBL028C-RPL19B promoter to the RM allele increases expression of RPL19B, a ribosomal subunit, by 42% (unpaired t-test, Welch’s corrected p-value = 0.0082). (f) Editing the chrXII:818996 site at the RPL26A promoter to the RM allele reduces expression of RPL26A, a ribosomal subunit, by 21% (unpaired t-test, Welch’s corrected p-value = 0.0024). (g) Editing the chrII:477248 site at the TKL2-TEF2 promoter locus to the RM allele reduces expression of TEF2, a eukaryotic translation elongation factor, by 15% (unpaired t-test, Welch’s corrected p-value = 0.0267). Strains that were edited to the RM allele and control strains (WT, BY genotype) were individually grown to log-phase and assayed with RT-qPCR. Data are represented as mean ± SEM. Expression changes of the edited strain relative to WT for each gene were detected using an unpaired t-test between RM-allele and BY-allele strains with Welch’s correction (*: p < 0.05; **: p < 0.01).
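The comparison used in Figure S5(e)-(g), an unpaired t-test with Welch's correction plus a percent change in expression, can be sketched as below. This is a generic textbook implementation and not the authors' analysis script; the function names are hypothetical, and the inputs are assumed to be per-replicate relative expression values from RT-qPCR. Only the test statistic and the Welch-Satterthwaite degrees of freedom are computed; turning them into a p-value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(edited, control):
    """Welch's unequal-variance t statistic and its degrees of freedom.

    edited, control: sequences of per-replicate expression values
    (at least two values each). Returns (t, df), where df follows the
    Welch-Satterthwaite approximation.
    """
    n1, n2 = len(edited), len(control)
    se1 = variance(edited) / n1   # squared standard error of each mean
    se2 = variance(control) / n2
    t = (mean(edited) - mean(control)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def percent_change(edited, control):
    """Percent change of mean expression in edited strains relative to control."""
    return (mean(edited) / mean(control) - 1.0) * 100.0
```

Welch's correction is the natural choice when the two groups may differ in variance, as it does not pool the variances of edited and control strains.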
Table S2. Spot assay data for ADE2 and 14 validation strains, related to Figure 1 and Figure 2. This table contains (1) pink/white colony counts for estimating the ADE2 knockout editing efficiency shown in Figure 1d (the ADE2 knockout phenotype is pink); and (2) sample data and colony counts for the pairwise experiments of the 14 validation strains shown in Figure 2e.
Table S3. Sanger sequencing of edited targets and library plasmids, related to Figure 1 and Figure 2. This table contains Sanger sequencing information for (1) genotyping of ADE2 knockout edits and EGFP insertion into ADE1 (long insert) for editing efficiency estimation, shown in Figure 1d and Figure 1e; (2) quality control of 45 pooled library transformants, to estimate whether each transformant carried only a single plasmid; and (3) genotyping of target loci in validation strain clones.
Table S4. Oligonucleotides, PCR and sequencing primer description, related to Figure 1 and Figure 2. This table contains the sequences and description of all primers/oligonucleotides that were used in this study.
Table S5. Pooled growth competition sample information, library guide/donor oligonucleotide annotations, mapped read counts and growth rate estimates, related to Figure 2 and Figure 3. This table contains (1) annotations for all guide/donor oligonucleotides synthesized in the CRISPEY oligonucleotide library; (2) sample information for pooled editing and growth competition; (3) counts of sequencing reads that map to each guide/donor oligonucleotide; (4) the estimated growth coefficient for each guide/donor oligonucleotide; and (5) the off-target (plasmid) cutting level during editing, related to Figure 3b.
Data Availability Statement
All raw sequencing data have been deposited in the NCBI Sequence Read Archive (SRP126558). All software and code used for library design and for obtaining oligo counts are available at https://github.com/eilon-s/CRISPEY. Code used to create specific figures is available upon request. Sequences of the oligonucleotides used in this study can be found in Table S4. Raw Sanger sequencing reads can be found at: http://dx.doi.org/10.17632/jtbnrrmstt.1.





