Abstract
Synonymous mutations in protein-coding genes do not alter protein sequences and are therefore generally presumed to be neutral or nearly so1–5. To test this presumption experimentally, we constructed 8,341 yeast mutants, each carrying a synonymous, nonsynonymous, or nonsense mutation in one of 21 endogenous genes with diverse functions and expression levels, and measured their fitness relative to the wild-type in a rich medium. Surprisingly, three-quarters of synonymous mutations significantly reduce fitness, and the distribution of fitness effects is overall similar, albeit not identical, between synonymous and nonsynonymous mutations. We find that both synonymous and nonsynonymous mutations frequently disturb the mRNA level of the mutated gene and that the extent of the disturbance partially predicts the fitness effect. Investigations in additional environments reveal greater across-environment fitness variation for nonsynonymous than synonymous mutants, despite their similar fitness distributions within each environment. This suggests that a smaller proportion of nonsynonymous than synonymous mutants are non-deleterious in every environment and can therefore be fixed in a changing environment, potentially explaining the substantially lower nonsynonymous than synonymous substitution rates commonly observed. The strong non-neutrality of most synonymous mutations, if also true for other genes and in other organisms, would require reexamining numerous biological conclusions about mutation, selection, effective population size, divergence time, and disease mechanism that rely on the assumption that synonymous mutations are neutral.
The cracking of the genetic code in the 1960s revealed that between a quarter and a third of single-nucleotide mutations in protein-coding genes do not alter protein sequences1,2. Although these synonymous mutations need not be strictly neutral, because they could influence many processes6–8 such as transcription factor (TF) binding9, transcription10, pre-mRNA splicing7, mRNA folding11 and stability12,13, translational initiation14, efficiency15,16, and accuracy17,18, and co-translational protein folding19,20, the vast majority of them are presumed to be at least nearly neutral1–5. This contrasts with nonsynonymous mutations, which alter protein sequences and frequently affect fitness3–5. The (near) neutrality of synonymous mutations is widely assumed in inferring mutation rate, pattern, and mechanism; in testing for natural selection; in estimating effective population sizes (Ne) and neutral genetic diversity, which are considered in conservation policymaking as well as in population and evolutionary biology; and in dating evolutionary events such as population or species divergences and gene or genome duplications3–5. This assumption has also diverted mechanistic studies of disease away from synonymous mutations21.
Nevertheless, synonymous mutations that affect fitness by >1% are known16,20,22–24. Some studies even reported comparable fitness effects of synonymous and nonsynonymous mutations25–27. These reports, however, were based either on relatively few genes and mutations25 or on many natural polymorphisms26,27 that may not represent random mutations. Here we test the (near) neutrality of synonymous mutations by measuring the fitness effects of thousands of coding mutations in 21 genes in the budding yeast Saccharomyces cerevisiae.
Quantifying mutational fitness effects
The 21 chosen genes participate in diverse biological processes such as metabolism, chromatin remodeling, transcription, translation, and cell wall synthesis (Data S1), and their expression levels span about 1000-fold (Fig. 1a). These genes are nonessential, but their deletions lower fitness by discernible amounts28, so the fitness effects of mutations in them are quantifiable. In each gene, we picked an approximately 150-nucleotide coding sequence and chemically synthesized all 450 possible variants (150 positions × 3 alternative nucleotides) that deviate from the wild-type by a single point mutation (Fig. 1b). The wild-type sequence at its native genomic location was replaced by the variant sequences using CRISPR/Cas9 genome editing of a haploid strain, followed by confirmation of the respiratory function of the mutant library (Extended Data Fig. 1a, b). All mutants of a gene, together with a wild-type control that went through the same CRISPR/Cas9 editing (Extended Data Fig. 1c), were competed en masse in a rich medium (YPD) at 30°C, with no diploidization observed (Extended Data Fig. 1d). Four separate competitions were performed from a common starting population (T0). The focal gene was amplified from T0 and from each of the four replicate populations at 12 (T12) and 48 (T48) hrs, followed by 250-nucleotide paired-end Illumina sequencing (Fig. 1b). The sequences identified the genotypes and allowed us to tabulate genotype frequencies (Data S2) in each population29.
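As an illustration of how a ~150-nucleotide region yields 450 single-point variants that fall into synonymous, nonsynonymous, and nonsense classes, the sketch below enumerates and classifies every variant using the standard genetic code (NCBI translation table 1). It is a minimal sketch, not the authors' pipeline; it assumes the region starts at the first base of a codon and contains no internal stop codon, and the function name and demo sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutants(region):
    """Enumerate every single-nucleotide variant of an in-frame coding region.

    Returns (position, ref_base, alt_base, kind) tuples, where kind is
    'synonymous', 'nonsynonymous', or 'nonsense'. Assumes (hypothetically)
    that the region is codon-aligned and has no internal stop codon.
    """
    region = region.upper()
    if len(region) % 3:
        raise ValueError("region length must be a multiple of 3")
    variants = []
    for pos, ref in enumerate(region):
        offset = pos % 3
        wt_codon = region[pos - offset:pos - offset + 3]
        for alt in BASES:
            if alt == ref:
                continue
            mut_codon = wt_codon[:offset] + alt + wt_codon[offset + 1:]
            wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
            if mut_aa == wt_aa:
                kind = "synonymous"
            elif mut_aa == "*":
                kind = "nonsense"
            else:
                kind = "nonsynonymous"
            variants.append((pos, ref, alt, kind))
    return variants


if __name__ == "__main__":
    demo = "ATGTTTTGG" * 2  # hypothetical in-frame sequence, not from the study
    counts = Counter(kind for *_, kind in classify_point_mutants(demo))
    print(sorted(counts.items()))
    # [('nonsense', 4), ('nonsynonymous', 48), ('synonymous', 2)]
```

Each nucleotide has three alternatives, so a 150-nucleotide region always gives 3 × 150 = 450 variants; only their split among the three classes depends on the sequence.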
Fig. 1. Estimating the fitness effects of coding mutations in 21 yeast genes.

a, The mRNA expression levels in YPD of the 21 genes (dots) measured by RPKM (Reads Per Kilobase of transcript per Million mapped reads) and their ranks among all yeast genes. b, Experimental procedure. WT, wild-type. T0, T12, and T48 respectively refer to 0, 12, and 48 hrs after competition. c, Mutant fitness estimated in the first two of four biological replicates. Each dot is a mutant (n = 8,341 mutants) and the dotted line indicates the diagonal. Pearson’s correlation (r) and its associated P-value are presented. d, Sequencing-based and growth rate-based fitness estimates are highly correlated. Each dot represents a synonymous (yellow) or nonsynonymous (blue) mutant. Mutants used in monoculture growth rate-based fitness estimation and those used in en masse competition followed by sequencing-based fitness estimation are independently constructed. Error bars show the standard error of the mean. Pearson’s correlation r and its associated P-value are presented (r = 0.89 and 0.90 for the 9 synonymous and 15 nonsynonymous mutants, respectively).
For the 21 genes, we identified a total of 8,341 variants with read counts ≥50 at T0: 1,866 synonymous, 6,306 nonsynonymous, and 169 nonsense mutants. The observed relative numbers of synonymous and nonsynonymous mutants reflect the designed ones (Extended Data Fig. 2a). Changes in genotype frequencies between T0 and T48 (or T12) were used to estimate the fitness of each mutant relative to the wild-type. The fitness estimates (Data S3) were highly correlated between replicates, with a mean Pearson’s r of 0.92 (Fig. 1c, Extended Data Fig. 2b–f). Fitness estimates from the en masse competitions agreed well with those measured from monoculture growth of 24 reconstructed synonymous and nonsynonymous mutants (Fig. 1d).
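The exact fitness formula is given in the Methods and is not reproduced in this section; as a hedged sketch, one common estimator converts the change in the mutant:wild-type read ratio into a per-generation selection coefficient and adds 1. The sketch assumes the number of wild-type generations between samples (`wt_generations`, a hypothetical input) is known, applies the ≥50-read T0 filter described above, and includes a plain Pearson correlation for comparing replicates.

```python
import math


def relative_fitness(mut_t0, wt_t0, mut_t, wt_t, wt_generations, min_t0_reads=50):
    """Fitness of a mutant relative to the wild-type control from read counts.

    fitness = 1 + ln[(mut_t / wt_t) / (mut_t0 / wt_t0)] / wt_generations

    Returns None if the mutant has fewer than `min_t0_reads` reads at T0.
    A zero count at time t makes math.log raise ValueError; a pseudocount
    could be added in that case.
    """
    if mut_t0 < min_t0_reads:
        return None
    log_ratio_change = math.log((mut_t / wt_t) / (mut_t0 / wt_t0))
    return 1.0 + log_ratio_change / wt_generations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical read counts and generation number, for illustration only
    print(relative_fitness(mut_t0=400, wt_t0=2000, mut_t=300, wt_t=2100, wt_generations=20))
```

A mutant whose share relative to the wild-type is unchanged gets fitness exactly 1; a drop in that share gives fitness below 1, scaled by how many generations elapsed.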
Comparing mutational fitness effects
The median fitness of the 169 nonsense mutants is 0.940 (Extended Data Fig. 3a). As expected, the corresponding value for the 6,306 nonsynonymous mutants is much higher, at 0.988 (Fig. 2a). Surprisingly, the median fitness of the 1,866 synonymous mutants is 0.989, much closer to that of nonsynonymous mutants than to the neutral expectation of 1; the same holds for mean fitness (Fig. 2a). Although the fitness distributions look similar for synonymous and nonsynonymous mutants (Fig. 2a), they are statistically distinct because of a higher density of nonsynonymous than synonymous mutants in the fitness range of 0.91–0.97 and the reverse in the range of 0.97–0.99 (Fig. 2b, Extended Data Fig. 3b). A significant fitness difference between synonymous and nonsynonymous mutants was observed in only five of the 21 genes, all five showing higher fitness for synonymous than nonsynonymous mutants (Fig. 2c, Extended Data Fig. 3c). Even in these five genes, however, the median fitness of synonymous mutants is much closer to that of nonsynonymous mutants than to 1 (Fig. 2c).
Fig. 2. Mutant fitness in YPD.

a, Distributions of the fitness of 6,306 nonsynonymous (blue) and 1,866 synonymous (yellow) mutants. The two distributions are significantly different (P = 6.1×10⁻⁵, two-tailed Wilcoxon rank-sum test; P = 1.3×10⁻⁶, Kolmogorov–Smirnov test). b, Cumulative frequency distributions of fitness of nonsynonymous and synonymous mutants. c, Fitness distributions of nonsynonymous and synonymous mutants of 21 individual genes shown by box plots. Nonsynonymous and synonymous distributions of each gene are compared by a two-tailed Wilcoxon rank-sum test followed by FDR correction (*, P < 0.05; ⁑, P < 0.01; ⁂, P < 0.001). Mutants with fitness <0.9 are not shown (see Extended Data Fig. 3c for the complete figure). d, Fractions of nonsynonymous and synonymous mutants with fitness significantly below 1 (nominal P <0.05), significantly above 1, and neither, respectively. Error bars show one standard error. Nonsynonymous and synonymous mutants are not significantly differentially distributed among the three bins (two-tailed Fisher’s exact test). Under FDR = 0.05, 72.7% and 1.5% of nonsynonymous mutations are significantly deleterious and beneficial, respectively. The corresponding values are 72.5% and 1.1% for synonymous mutations. e, Mutant fitness is lower when the mutation is not observed than when it is observed in the genomes of five related yeast species. There are 5839, 169, 1087, and 714 mutants in the four bins, respectively. P-values are from two-tailed Wilcoxon rank-sum tests. Mutants with fitness <0.95 or >1.025 are not shown (see Extended Data Fig. 3d for the complete figure). In c and e, each data point is a mutant. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3-qu1), and the dots show outliers.
Classifying each mutant into one of three bins based on whether its fitness is significantly below 1 (nominal P < 0.05, t-test), significantly above 1, or neither, we found similar distributions for synonymous and nonsynonymous mutants (Fig. 2d). Among synonymous mutations, 75.9% are significantly deleterious and 1.3% are significantly beneficial; the corresponding values for nonsynonymous mutations are 75.8% and 1.6%. Slightly lower values were obtained at a false discovery rate (FDR) of 0.05 (Fig. 2d legend). The smallest absolute fitness effect found significant in our study is 0.001, orders of magnitude greater than the sensitivity (10⁻⁷) of natural selection in yeast30 (see Methods). Hence, all mutations with significant fitness effects are strongly nonneutral. Mutant fitness is lower when the mutation is unobserved in the genomes of related yeast species than when it is observed (Fig. 2e, Extended Data Fig. 3d), indicating that our laboratory fitness estimates are evolutionarily relevant.
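The three-bin classification and the FDR adjustment can be sketched as follows. This is a hedged illustration: it runs a one-sample two-tailed t-test of replicate fitness against 1 by comparing |t| with tabulated critical values (the standard library has no t-distribution CDF), and it assumes the Benjamini–Hochberg procedure for FDR control, which the text does not name explicitly. Function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-tailed critical values of Student's t at alpha = 0.05, keyed by degrees of freedom
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def classify_fitness(replicates, null=1.0):
    """Label a mutant 'deleterious', 'beneficial', or 'neither' with a one-sample,
    two-tailed t-test of replicate fitness against `null` (nominal alpha = 0.05)."""
    n = len(replicates)
    m = mean(replicates)
    se = stdev(replicates) / math.sqrt(n)
    if se == 0:
        significant = m != null
    else:
        significant = abs(m - null) / se > T_CRIT_05[n - 1]
    if not significant:
        return "neither"
    return "deleterious" if m < null else "beneficial"


def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking which tests are rejected at FDR q (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose k-th smallest p-value is <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```

Because the step-up rule rejects every p-value up to the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger one qualifies, which is why FDR-based counts can differ from simply thresholding each test.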
Mechanisms of mutational fitness effects
Because synonymous codon usage bias is stronger in more highly expressed genes, probably owing to translational selection6, synonymous mutations from the wild-type are thought to be more deleterious in more highly expressed genes8. However, we did not detect a significant negative correlation between the expression level of a gene and the mean fitness of its synonymous mutants (Extended Data Fig. 4a). Because synonymous mutations in a gene can alter its mRNA level10,12,13, which could affect fitness31, we measured the relative expression level (REL) of the mutated gene in each mutant, in four replicates, by dividing its mRNA level by that of the wild-type. Briefly, from a population of cells including the wild-type and all mutants of a gene, we amplified and sequenced the DNA of the focal gene as well as the cDNA made from the mRNA of the focal gene (Fig. 3a). REL is the number of cDNA-derived sequencing reads divided by the number of DNA-derived reads for a mutant, relative to that for the wild-type.
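The REL definition above reduces to a ratio of ratios; a minimal sketch is shown below. Averaging REL arithmetically across replicates is an assumption made here for illustration, and the function names are hypothetical.

```python
from statistics import mean


def rel(cdna_mut, dna_mut, cdna_wt, dna_wt):
    """Relative expression level of one mutant in one replicate:
    (mutant cDNA reads / mutant DNA reads) / (WT cDNA reads / WT DNA reads)."""
    return (cdna_mut / dna_mut) / (cdna_wt / dna_wt)


def mean_rel(replicates):
    """Arithmetic mean of REL over replicates, each given as
    (cdna_mut, dna_mut, cdna_wt, dna_wt)."""
    return mean(rel(*counts) for counts in replicates)
```

Normalizing cDNA reads by DNA reads corrects for how abundant each mutant is in the pooled population, and dividing by the wild-type ratio puts the wild-type at REL = 1.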
Fig. 3. Coding mutations alter the mRNA level of the mutated gene.

a, High-throughput quantification of the mRNA levels of a focal gene in all mutants of the gene. WT, wild-type. REL, the mRNA level in a mutant relative to that in the WT, is estimated from the number of cDNA-derived sequencing reads divided by the number of DNA-derived reads for the mutant, relative to that for the WT. b, Frequency distributions of REL for 5927 nonsynonymous (blue) and 1783 synonymous (yellow) mutants, respectively. The two distributions are not significantly different (P = 0.11, two-tailed Wilcoxon rank-sum test). c, Correlation between REL and rescaled fitness among mutants. The correlation is significantly different between mutants with REL <1 and >1 (P <0.0001 for both nonsynonymous and synonymous mutants based on z-test after Fisher’s r-to-z transformation). d, Positive correlation between rCAI, the CAI of a mutant relative to that of the wild-type, and REL among mutants. For visualization, in c and d, we group all mutants into 10 equal-size bins by their X-values and present the mean X- and Y-values of each bin (red dot) and the standard error of the mean Y-value (error bar).
We obtained mutant RELs for 20 of the 21 genes (Data S4). Mutant RELs are highly correlated between replicates (Extended Data Fig. 4b–g), confirming the quality of the expression estimates. REL deviates significantly from 1 (nominal P < 0.05, t-test) in 53.8% of synonymous and 55.0% of nonsynonymous mutants (39.7% and 39.6%, respectively, at FDR <0.05), indicating that both synonymous and nonsynonymous mutations frequently alter the mRNA level. The REL distribution is not significantly different between synonymous and nonsynonymous mutants (Extended Data Fig. 4h; see Extended Data Fig. 4i for individual genes) and is roughly symmetrical around 1 (Fig. 3b). By contrast, the mean REL is only 0.301 for nonsense mutants (Extended Data Fig. 4j), likely owing to nonsense-mediated mRNA decay32.
Because reducing REL from 1 to 0, which is equivalent to gene deletion, has different fitness effects for different genes15, we rescaled mutant fitness F to f = (F − F0)/(1 − F0), where F0 is the fitness of the strain lacking the focal gene. Consequently, 1 − f measures the fitness effect of a mutation relative to that of deleting the focal gene, permitting an analysis of the relationship between REL and fitness across mutants of different genes. REL and rescaled fitness are significantly positively correlated for both synonymous and nonsynonymous mutants when REL <1, but the correlation is much weaker when REL >1 (Fig. 3c). These observations suggest that altering the mRNA level is likely a general mechanism underlying the fitness effects of coding mutations and that reducing expression below the wild-type level typically imposes a stronger fitness effect than raising it (see Methods).
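A minimal sketch of the rescaling and of correlating REL with rescaled fitness separately below and above REL = 1 follows. The helper names are hypothetical, and the data in any call are illustrative rather than measured values.

```python
import math


def rescaled_fitness(F, F0):
    """f = (F - F0) / (1 - F0), where F0 is the fitness of the focal-gene deletion
    strain; f = 1 for a wild-type-like mutant and f = 0 for a deletion-like mutant."""
    if F0 >= 1:
        raise ValueError("rescaling needs F0 < 1")
    return (F - F0) / (1 - F0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlations_by_rel(rels, fitnesses):
    """Pearson r between REL and rescaled fitness, computed separately for
    mutants with REL < 1 and with REL > 1 (each subset needs >= 2 mutants)."""
    below = [(r, f) for r, f in zip(rels, fitnesses) if r < 1]
    above = [(r, f) for r, f in zip(rels, fitnesses) if r > 1]
    return pearson_r(*zip(*below)), pearson_r(*zip(*above))
```

For example, with F0 = 0.9 a mutant of fitness 0.99 has f ≈ 0.9, so 1 − f ≈ 0.1: its effect is about 10% of that of deleting the gene.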
To understand how coding mutations affect the mRNA level, we identified TF-binding sites in the mutated region of each gene33. However, mutations within and outside TF-binding sites do not differ significantly in the magnitude of their expression effects (Extended Data Fig. 5a, b).
Previous manipulative experiments showed that increasing the codon adaptation index (CAI)34 of a gene through synonymous mutations can boost its mRNA level by slowing mRNA degradation12,13,35 and perhaps enhancing transcription10. Because nonsynonymous mutations can also alter CAI, we computed the relative CAI (rCAI) of each mutant gene by dividing its CAI by that of the wild-type. Indeed, a significant positive correlation exists between rCAI and REL among synonymous mutants as well as among nonsynonymous mutants (Fig. 3d). The same is true between rCAI and rescaled fitness, especially under rCAI <1 (Extended Data Fig. 5c, d).
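CAI is conventionally the geometric mean, over a gene's codons, of each codon's relative adaptiveness w (its frequency in a reference gene set divided by that of the most frequent synonymous codon), with single-codon amino acids and stop codons excluded. The sketch below computes w, CAI, and rCAI on that basis. The reference codon counts are hypothetical, and substituting a pseudocount of 0.5 for codons absent from the reference set is an assumption made here; the study's own reference set and implementation are not described in this section.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def relative_adaptiveness(codon_counts, pseudocount=0.5):
    """w(codon) = count / max count among its synonymous codons in a reference set.
    Single-codon amino acids (Met, Trp) and stop codons get no weight."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    weights = {}
    for codons in families.values():
        if len(codons) == 1:
            continue
        counts = {c: codon_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for c in codons:
            weights[c] = counts[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of w over the scorable codons of an in-frame sequence."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def rcai(mut_seq, wt_seq, weights):
    """CAI of the mutant gene divided by CAI of the wild-type gene."""
    return cai(mut_seq, weights) / cai(wt_seq, weights)
```

Because rCAI is a ratio of whole-gene values, it applies equally to synonymous and nonsynonymous mutants: any mutation that swaps a codon for one with a different w shifts it away from 1.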
Due to the increased prevalence of preferred codons in more highly expressed genes6, synonymous mutations decreasing CAI (Extended Data Fig. 5e) and lowering the mRNA level (Extended Data Fig. 5f) are both more abundant in more highly expressed genes. Similar trends are seen for nonsynonymous mutations (Extended Data Fig. 5g, h), because a random nonsynonymous mutation from a preferred codon of an amino acid will likely arrive at a less preferred codon of another amino acid. Consequently, synonymous (Extended Data Fig. 5i) and nonsynonymous (Extended Data Fig. 5j) mutants of more highly expressed genes have lower mean rescaled fitness.
Because adequate mRNA folding strength (MFS) is required11, at least in part for translational accuracy36 and co-translational protein folding37, a change in MFS caused by a coding mutation may affect fitness25. Indeed, we found a significant positive correlation between the relative MFS of a mutant and its rescaled fitness among mutants with reduced MFS (Extended Data Fig. 5k, l), although this correlation is substantially weaker than that between REL and rescaled fitness (Fig. 3c). This suggests that coding mutations confer their fitness effects more through their influence on the mRNA level than through their influence on mRNA folding strength.
Fitness effects across environments
Interspecific comparisons have shown that the nonsynonymous to synonymous substitution rate ratio (dN/dS) is substantially below 1 for most genes in almost all organisms3–5 including yeast38, indicating that the probability of fixation of nonsynonymous mutations is generally much lower than that of synonymous mutations in long-term evolution. This seems at odds with their similar distributions of fitness effects (DFEs) observed here. One possible explanation is that the two DFEs differ greatly in the range of absolute fitness effects that our method cannot detect, generally below 5×10⁻³. For example, when beneficial mutations are ignored, as in the neutral theory39, if the fraction of nonsynonymous mutations with deleterious fitness effects smaller than the sensitivity of natural selection in yeast (10⁻⁷) is 10% of the corresponding fraction of synonymous mutations, a dN/dS of ~0.1 will result. This hypothesis is, however, difficult to test because experiments are far less sensitive than natural selection.
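The ~0.1 figure follows from a standard neutral-theory relation, sketched here under simplifying assumptions that are ours rather than stated above: nonsynonymous and synonymous sites share the same per-site mutation rate μ, and beneficial mutations are ignored. Writing p_N and p_S (notation introduced here) for the fractions of nonsynonymous and synonymous mutations whose deleterious effects are below the 10⁻⁷ sensitivity of selection, the per-site substitution rate of each class is μ times that fraction:

```latex
\[
d_N \propto \mu\,p_N, \qquad d_S \propto \mu\,p_S
\quad\Longrightarrow\quad
\frac{d_N}{d_S} = \frac{p_N}{p_S} = \frac{0.1\,p_S}{p_S} = 0.1 .
\]
```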
We wondered whether the low dN/dS could also arise from a difference between synonymous and nonsynonymous mutants in their fitness variation among environments40,41. This variation is relevant because the fixation of a neutral mutation takes on average 4Ne generations42, during which the environment is highly likely to have changed many times. In addition to influencing the mRNA level and/or mRNA folding strength, which can exert fitness effects, nonsynonymous mutations also alter the protein sequence and potentially protein function, which synonymous mutations do not. Because each of these molecular phenotypic effects could be environment-dependent, nonsynonymous mutants may naturally have a larger across-environment fitness variance than synonymous mutants, especially given recent reports that amino acid substitutions often show environment-specific fitness effects43–45. In the most extreme scenario, the fraction of deleterious mutations is identical between synonymous and nonsynonymous mutations in each environment, but the specific deleterious mutations vary across environments for nonsynonymous but not synonymous mutations. Consequently, when the environment of a population fluctuates within the typical fixation time, some synonymous mutations are never deleterious and thus may be fixed, whereas virtually every nonsynonymous mutation is deleterious in some environment and thus cannot be fixed, resulting in dN/dS <<1.
We quantitatively investigated this model using computer simulation. Assuming the YPD-based DFEs in each environment, we varied the fitness of a mutant among environments, with a greater coefficient of variation (CV) for nonsynonymous than synonymous mutants. A mutant is selectively purged if its fitness is lower than a preset cutoff (e.g., 0.99, given the fitness estimation error in our experiments) in any environment, and dN/dS is inferred as the fraction of nonsynonymous mutants left unpurged relative to the fraction of synonymous mutants left unpurged.
As predicted, dN/dS drops precipitously with the number of different environments experienced by the population (Fig. 4a, Extended Data Fig. 6).
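A toy version of this simulation is sketched below. It is not the authors' code: instead of the measured YPD DFEs it draws each mutant's baseline fitness from a hypothetical DFE shared by both classes (1 minus an exponentially distributed deleterious effect with mean `mean_effect`), and it draws fitness in each environment from a normal distribution with mean equal to the baseline and SD equal to CV × baseline, so the per-environment DFE only approximates the baseline DFE. The purging rule and the dN/dS ratio follow the description above.

```python
import random


def simulate_dn_ds(n_env, cv_syn, cv_nonsyn, cutoff=0.99, n_mut=20000,
                   mean_effect=0.015, seed=1):
    """Toy environment-fluctuation model of dN/dS.

    Baseline fitness: 1 - Exponential(mean = mean_effect), the same hypothetical
    DFE for both classes. Fitness in each of n_env environments:
    Normal(baseline, cv * baseline). A mutant is purged if its fitness is below
    `cutoff` in any environment. Returns (fraction of nonsynonymous mutants
    kept) / (fraction of synonymous mutants kept).
    """
    rng = random.Random(seed)

    def fraction_kept(cv):
        kept = 0
        for _ in range(n_mut):
            baseline = 1.0 - rng.expovariate(1.0 / mean_effect)
            if all(rng.gauss(baseline, cv * baseline) >= cutoff for _ in range(n_env)):
                kept += 1
        return kept / n_mut

    frac_syn = fraction_kept(cv_syn)
    frac_nonsyn = fraction_kept(cv_nonsyn)
    if frac_syn == 0:
        raise ValueError("no synonymous mutant survives; lower the cutoff")
    return frac_nonsyn / frac_syn


if __name__ == "__main__":
    # CVs set to the mean across-environment CVs reported in Fig. 4e
    for k in (1, 2, 4, 8):
        print(k, round(simulate_dn_ds(k, cv_syn=0.0124, cv_nonsyn=0.0163), 3))
```

With equal CVs the ratio stays near 1 regardless of the number of environments; it is the larger CV of nonsynonymous mutants, compounded over environments, that drives dN/dS down.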
Fig. 4. A higher fitness CV across environments for nonsynonymous than synonymous mutants can create dN/dS <<1 despite similar DFEs of synonymous and nonsynonymous mutations in each environment.

a, Expected dN/dS from 1000 simulations of a population that experiences multiple different environments. A mutant is purged if its fitness is below a preset cutoff such as 0.98 or 0.99 in any environment. Shaded areas represent 95% confidence intervals. b–d, Distributions of nonsynonymous and synonymous mutant fitness are significantly different in SC + 37°C (P = 1.8×10⁻¹², two-tailed Wilcoxon rank-sum test; P = 1.5×10⁻⁹, Kolmogorov–Smirnov test; b), YPD + 0.375 mM H2O2 (P = 1.9×10⁻⁷ and 7.0×10⁻⁸, respectively; c), and YPE (P = 9.9×10⁻⁵ and 2.9×10⁻⁹, respectively; d). e, Box plots showing distributions of fitness CV across the four environments for 5,671 nonsynonymous and 1,696 synonymous mutants. Box plot symbols follow those in Fig. 2e. The mean CV is 0.0163 for nonsynonymous and 0.0124 for synonymous mutants. The two distributions are significantly different (two-tailed Wilcoxon rank-sum test). f, Expected dN/dS when the population stays in a constant environment or a changing environment. Actual DFEs in the four individual environments are used and various fitness cutoffs as in panel a are considered. Fitness measurement error is considered through 1000 random samples of error per mutant. The mean expected dN/dS and the 95% confidence interval of the expected dN/dS are presented. Dots and error bars are slightly shifted horizontally to help visualization. * indicates that dN/dS is significantly lower in the fifth population, whose environment fluctuates among the four conditions, than in each of the four constant-environment populations (P <0.05). For the cutoffs where no * is shown, dN/dS is not significantly different between the fifth population and the constant-environment population with the lowest dN/dS.
To verify the key assumption about CV in the above model, we measured the DFEs of the same yeast synonymous and nonsynonymous mutations in three additional environments that differ in nutrients and stress, with three biological replicates per environment (Extended Data Fig. 7, Extended Data Fig. 8a–i, Data S3). As in YPD, in each of these three environments the median fitness of synonymous mutants is much closer to that of nonsynonymous mutants than to 1 (Fig. 4b–d), and 52.9–62.2% of synonymous mutants are significantly nonneutral (Extended Data Fig. 8j–l). These fractions are lower than that in YPD, likely because the use of fewer replicates reduced the sensitivity of our fitness measurement (see Methods). For each mutant, we computed the CV of its fitness across the four environments. Indeed, CV is significantly greater for nonsynonymous than synonymous mutants, with (P <10⁻⁵) or without (Fig. 4e) controlling for the mean fitness in the four environments (see Methods). Additionally, the fraction of mutations that are neutral in one environment but deleterious in any of the other three is greater for nonsynonymous than synonymous mutations (Extended Data Fig. 9). We then used the empirical DFEs and fitness estimation errors in the four environments to estimate the expected dN/dS after purging mutants whose fitness is lower than a cutoff in any of the environments. Comparing four populations that each stay in one of the four constant environments with a fifth population whose environment fluctuates among the four conditions (see Methods), we found that the dN/dS of the fifth population is either significantly lower than, or statistically indistinguishable from, the lowest dN/dS among the first four (Fig. 4f). Based on the simulation result (Fig. 4a), dN/dS in the fifth population is expected to decline further as the number of different environments experienced rises.
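The per-mutant CV and the constant-versus-fluctuating comparison can be sketched on a small hypothetical fitness table (four synonymous and four nonsynonymous mutants in four environments; these are not measured values). Unlike the analysis described above, the sketch ignores fitness estimation error and uses the sample SD for CV.

```python
from statistics import mean, stdev


def fitness_cv(values):
    """Coefficient of variation of one mutant's fitness across environments
    (sample SD divided by mean)."""
    return stdev(values) / mean(values)


def fraction_kept(mutants, envs, cutoff):
    """Fraction of mutants whose fitness is >= cutoff in every environment in `envs`."""
    envs = list(envs)
    return sum(all(m[e] >= cutoff for e in envs) for m in mutants) / len(mutants)


def expected_dn_ds(nonsyn, syn, envs, cutoff=0.99):
    """Unpurged fraction of nonsynonymous mutants over that of synonymous mutants."""
    return fraction_kept(nonsyn, envs, cutoff) / fraction_kept(syn, envs, cutoff)


# Hypothetical fitness values in four environments (rows = mutants), not measured data
SYN = [[1.00, 1.00, 1.00, 1.00],
       [0.995, 0.996, 0.995, 0.997],
       [0.98, 0.98, 0.98, 0.98],
       [1.00, 0.99, 1.00, 1.00]]
NONSYN = [[1.00, 0.98, 1.00, 1.00],
          [0.98, 1.00, 1.00, 1.00],
          [1.00, 1.00, 0.97, 1.00],
          [1.00, 1.00, 1.00, 1.00]]

if __name__ == "__main__":
    for e in range(4):
        print(f"constant env {e}: dN/dS = {expected_dn_ds(NONSYN, SYN, [e]):.3f}")
    print(f"fluctuating: dN/dS = {expected_dn_ds(NONSYN, SYN, range(4)):.3f}")
    # constant: 1.000, 1.000, 1.000, 1.333; fluctuating: 0.333
```

In this toy table each environment purges about a quarter of the mutants in both classes, yet the nonsynonymous mutants that are purged differ among environments, so the fluctuating population keeps far fewer of them.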
Discussion
Our characterization of the DFE of thousands of coding mutations in diverse yeast genes under four environments showed that, under any environment, most synonymous mutations are strongly nonneutral and that the DFEs of synonymous and nonsynonymous mutations are overall similar. There is no particular reason why our results would be restricted to yeast, but confirmations in diverse organisms are required to verify the generality of our findings. Because our experiments were performed in haploids, future studies should assess whether synonymous and nonsynonymous mutations also have similar DFEs in the heterozygous state.
Our results suggest a general mechanism through which coding mutations affect fitness, namely disturbing the mRNA level of the mutated gene, but they do not preclude other mechanisms such as effects on mRNA folding and translation. It is currently difficult to demonstrate and quantify the causal contributions of a coding mutation’s various molecular phenotypic effects to its fitness effect, because doing so would require mimicking each molecular phenotypic effect of a coding mutation without disturbing the cell in any other way that might influence fitness. For instance, to mimic coding mutations’ influences on the mRNA level of a gene, we could drive gene expression with an inducible promoter and adjust promoter activity by altering the inducer concentration in the medium46, but this alteration changes the medium composition, which could affect fitness more than the intended change in promoter activity does. Additionally, induction of the promoter may influence the expression of neighboring genes. Tunable degrons, short amino acid sequences that regulate protein degradation47, offer another approach, but degrons may themselves affect fitness by altering protein function or mRNA folding, and tuning them could likewise disturb the medium.
The mRNA level of a gene has a strong influence on the evolutionary rate of its protein sequence, and several mechanisms of this influence have been demonstrated48,49. Our finding that the fraction of nonsynonymous mutations reducing the mRNA level rises with the mRNA level of the gene (Extended Data Fig. 5h) and the fitness ramification of this trend (Extended Data Fig. 5j) suggest an additional mechanism (Extended Data Fig. 10).
Because many biological conclusions rely on the presumption that synonymous mutations are (nearly) neutral3–5, its invalidation has broad implications. For example, many tests infer selection on a gene by comparing its synonymous and nonsynonymous polymorphisms and/or substitutions. Given that most synonymous mutations are deleterious, making the same inference would require assuming that synonymous and nonsynonymous mutations are subject to equal selection unrelated to protein sequence and function. While seemingly reasonable, this assumption may not always hold11, so further empirical verification is needed. That most synonymous mutations are strongly nonneutral means that the mutation rate, pattern, and mechanism inferred from synonymous polymorphisms or substitutions may have been distorted. For the same reason, Ne inferred from synonymous polymorphisms in natural populations is likely substantially underestimated, impacting evolutionary studies and certain conservation-related decisions. Similarly, synonymous substitution-based dating of evolutionary divergences may be unjustifiable in some cases. Our results also imply that synonymous mutations are nearly as important as nonsynonymous mutations in causing disease and call for strengthened efforts to predict and identify pathogenic synonymous mutations50. Given that gene expression anomalies can cause disease51, our results further suggest that disturbance of the mRNA level is a potentially common disease mechanism of coding mutations.
METHODS
Data source
The mRNA expression levels of yeast genes in YPD (Fig. 1a) were from Chou et al.52. The fitness values of yeast gene deletion strains in YPD (Data S1) were from Qian et al.28. Yeast gene functions (Data S1) were based on the Saccharomyces Genome Database (https://www.yeastgenome.org/).
Media
Standard media of YPD (1% yeast extract, 2% peptone, and 2% glucose), YPD + 0.375 mM H2O2, YPE (1% yeast extract, 2% peptone, and 2% ethanol), and YPG (1% yeast extract, 2% peptone, and 2% glycerol) were used. Synthetic complete (SC) media contained 0.017% yeast nitrogen base without amino acids, 0.5% ammonium sulfate, and 2% glucose, with the addition of the appropriate SC mix or SC drop-out mix. 5-FOA (5-fluoroorotic acid) plates contained 0.017% yeast nitrogen base without amino acids, 0.5% ammonium sulfate, 2% glucose, SC mix, and 0.15% 5-FOA.
Construction of yeast gene deletion strains
We had three primary considerations in choosing the genes for study. First, because a previous study of DFEs of synonymous and nonsynonymous mutations analyzed only two ribosomal protein genes25, we wanted to include genes with a larger array of functions to complement that study. Second, knowing that synonymous mutations’ fitness effects may depend on the gene expression level8, we wanted to choose genes with a wide range of expression levels to gain a broad picture. Third, because our experiment involved deleting the gene of choice, we had to study nonessential genes. Furthermore, the deletions had to alter fitness by detectable amounts so that the mutational fitness effects would be quantifiable. The decision to use a 150-nucleotide region per gene was based on the read length of paired-end Illumina sequencing. The starting site of the 150-nucleotide region was randomly chosen in the first half of the coding sequence of a gene, provided that the chosen 150 nucleotides lay entirely within the coding region. The two exceptions were RPL39 and RPS7A, for which 147 and 141 nucleotides, respectively, were studied because of their short coding sequences.
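The window-placement rule can be sketched as below. This is a hypothetical helper that interprets "first half" as start positions below half the coding-sequence length (0-based). The text does not say whether the start was required to be codon-aligned, so the sketch does not enforce it. For the two short genes, `window` would be set to 147 or 141.

```python
import random


def choose_window_start(cds_length, window=150, rng=None):
    """Random 0-based start of a `window`-nt region that begins in the first
    half of a coding sequence of `cds_length` nt and lies entirely inside it."""
    rng = rng if rng is not None else random.Random()
    last_start = min(cds_length // 2 - 1, cds_length - window)
    if last_start < 0:
        raise ValueError("coding sequence too short for this window")
    return rng.randint(0, last_start)
```

Passing a seeded `random.Random` makes the choice reproducible, for example `choose_window_start(1200, rng=random.Random(42))`.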
For each chosen gene, we used CRISPR/Cas9 to delete, from the genome of wild-type (BY4742) cells, the 150-nucleotide target sequence and its 25-nucleotide downstream sequence, which would be used as a primer binding site to amplify the gene (see Data S5 for all primer sequences). In the deletion step, the wild-type sequence was replaced by a 23-nucleotide designed sequence (a 20-nucleotide Cas9 target sequence plus a 3-nucleotide PAM site) that would serve as the CRISPR/Cas9 recognition site in the mutant sequence insertion step. The deletion was then verified by Sanger sequencing.
Chemical synthesis of gene variants
For each gene, we had GENEWIZ (https://www.genewiz.com/en) synthesize in an oligo-mix format all 450 variants that each deviate from the wild-type by a single point mutation (except for RPL39 that had 441 variants and RPS7A that had 423 variants due to their shorter sequences). With the exception of oligos for RPL39 and RPS7A, each oligo has 200 nucleotides, including the 150-nucleotide target sequence and its 25-nucleotide upstream and 25-nucleotide downstream flanking sequences. The flanking sequences would serve as primer binding sites for the amplification of the variant sequences. The guaranteed amount of each oligo was 3 nmol, more than enough as the DNA template for polymerase chain reaction (PCR) amplification.
Construction of mutant libraries
The pool of synthesized single-stranded variant oligos of each gene was amplified from the oligo-mix by PCR. High-fidelity Q5 polymerase (NEB) was used in all PCR reactions. The PCR-amplified double-stranded variant sequences were co-transformed with a CRISPR/Cas9 plasmid (pML104-URA3)53 into the corresponding deletion strain. The Cas9 protein recognizes the aforementioned 23-nucleotide sequence and causes double-strand breaks, and the variant sequences were inserted at the native genomic location of the focal gene via homologous recombination repair. For each gene, over 10,000 colonies were collected from SC minus uracil plates by washing with sterile water; this large number of colonies ensured the inclusion of most mutational variants of each gene. The cells were then counter-selected on 5-FOA plates to remove the CRISPR/Cas9 plasmid and stored in 30% glycerol at −80°C.
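Why ~10,000 colonies should capture most of ~450 variants can be checked with a simple sampling model. The sketch assumes, idealistically, that every colony carries exactly one variant drawn uniformly at random; real libraries are unevenly represented because of synthesis and editing biases, so this gives only a lower bound on how many variants may be missing in practice.

```python
import math


def expected_missing_variants(n_variants, n_colonies):
    """Expected number of designed variants absent from n_colonies transformants,
    assuming each colony carries one variant drawn uniformly at random."""
    return n_variants * (1 - 1 / n_variants) ** n_colonies


if __name__ == "__main__":
    m = expected_missing_variants(450, 10_000)
    print(f"{m:.2e} of 450 variants expected to be missing")
    # Poisson approximation: mean coverage of 10,000 / 450 (about 22) colonies per variant
    print(f"{math.exp(-10_000 / 450):.2e} probability that a given variant is missed")
```

At equal coverage (450 colonies for 450 variants) about e⁻¹ ≈ 37% of variants would be missing, which is why many-fold oversampling is used.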
Construction of the wild-type control
We amplified the wild-type ASC1 gene from the genome of the haploid strain BY4742 by PCR and inserted it into the ΔASC1 strain using CRISPR/Cas9. Three colonies were picked and the insertion was confirmed by Sanger sequencing. The cells were then counter-selected on 5-FOA plates to remove the CRISPR/Cas9 plasmid. These three independently reconstituted wild-type strains (WT1, WT2, and WT3) were then stored in 30% glycerol at −80°C.
We measured the maximum growth rate of BY4742 and of each of the three reconstituted wild-type strains using a Biotek Gen5™ Microplate Reader. The cells were first grown overnight. About 5,000 cells were then added to 0.1 mL YPD in a well of a Costar™ 96-well plate, which was shaken continuously at 30°C. Sixteen replicate growth curves were collected per strain, except that one BY4742 replicate was contaminated and therefore discarded. The maximum growth rate was calculated following a previous protocol54. The maximum growth rate did not differ significantly among the four strains (Extended Data Fig. 1c); for instance, the maximum growth rate of WT1 was not significantly different from that of WT2, WT3, or BY4742. WT1 was used as the wild-type control in en masse competitions and mutant fitness estimation. Our results would remain virtually the same if the growth rate of WT2 or WT3 were used in mutant fitness calculation.
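The protocol of ref. 54 is not reproduced here. As an illustration only, a common way to obtain a maximum growth rate is to fit ln(OD) against time in a sliding window and keep the steepest slope. Assume that this is a stand-in for the cited protocol, and treat the window size, the blank correction, and the function name as hypothetical choices.

```python
import math


def max_growth_rate(times_h, od, window=5, blank=0.0):
    """Steepest least-squares slope of ln(OD - blank) vs. time (per hour)
    over all runs of `window` consecutive readings.

    Sliding-window regression is an assumed stand-in for the cited protocol.
    """
    if len(times_h) != len(od) or len(od) < window:
        raise ValueError("need equal-length inputs with at least `window` points")
    best = None
    for start in range(len(od) - window + 1):
        t = times_h[start:start + window]
        corrected = [v - blank for v in od[start:start + window]]
        if min(corrected) <= 0:
            continue  # ln() undefined; skip windows at or below the blank
        y = [math.log(v) for v in corrected]
        t_mean, y_mean = sum(t) / window, sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        slope = sxy / sxx
        if best is None or slope > best:
            best = slope
    if best is None:
        raise ValueError("no window with OD above the blank")
    return best


# Synthetic curve: exponential phase at 0.4 per hour, then a plateau at OD 1.0.
times = [0.5 * i for i in range(30)]
od = [min(0.01 * math.exp(0.4 * t), 1.0) for t in times]
print(round(max_growth_rate(times, od), 3))  # 0.4
```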
En masse competitions in YPD
A frozen sample of cells carrying the variants of a gene and a frozen sample of the wild-type control cells were revived at 30°C in YPD (with shaking at 250 RPM) for 3 hrs. These cells were then mixed at an approximately 1:50 ratio of wild-type control cells to all mutant cells combined (i.e., the population should contain about 2% wild-type control cells). Four replicate competitions were then started by diluting this common starting population into four 14 mL Falcon tubes, each containing 6 mL of YPD medium, to a starting cell density of 1×10⁵ cells/mL. The competition was performed in a shaking incubator (250 RPM) at 30°C. Every 12 hrs, the culture was diluted to 1×10⁵ cells/mL by transfer to 6 mL of fresh YPD. The competition lasted for 48 hrs. Population aliquots at 0 (T0), 12 (T12), and 48 (T48) hrs were stored in 30% glycerol at −80°C. We performed a total of 84 competitions (4 replicates × 21 genes).
Library preparation and Illumina sequencing
Genomic DNA was extracted from population aliquots (Masterpure™ Yeast DNA Purification Kit), followed by PCR amplification of the gene variants. One primer targeted the 25-nucleotide sequence immediately downstream of the mutated region, while the other annealed upstream of the mutated region, beyond the homologous recombination repair sequence. This design ensured that only variant sequences inserted at the native genomic location of the focal gene were amplified. The primers included Illumina sequencing adapter and i5/i7 index sequences. The amplicons were sequenced by 250-nucleotide paired-end Illumina sequencing (HiSeq2500). A read pair was counted only if its two reads reported an identical variant sequence. To ensure reasonable accuracy in fitness estimation, we considered only genotypes with at least 50 read pairs at T0.
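A minimal sketch of the read-pair filter follows. It assumes that the variant region has already been extracted from both reads and placed on the same strand. That preprocessing step, the function names, and the three-letter toy genotypes are all hypothetical.

```python
from collections import Counter


def concordant_counts(read_pairs):
    """Count read pairs per genotype, keeping only pairs whose two reads
    report the identical variant region (discordant pairs are dropped)."""
    return Counter(r1 for r1, r2 in read_pairs if r1 == r2)


def genotypes_passing_t0(t0_counts, min_pairs=50):
    """Genotypes with at least `min_pairs` concordant read pairs at T0."""
    return {g for g, n in t0_counts.items() if n >= min_pairs}


# Toy input: each tuple holds the variant region from read 1 and read 2,
# already oriented to the same strand (an assumed preprocessing step).
pairs = [("AAC", "AAC")] * 60 + [("AGC", "AGC")] * 10 + [("ATC", "ATG")] * 100
t0 = concordant_counts(pairs)
print(sorted(genotypes_passing_t0(t0)))  # ['AAC']
```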
Sequencing-based fitness estimation
We estimated the fitness of each mutant relative to the wild-type control by $\left(\frac{P'_{MT}\,P_{WT}}{P_{MT}\,P'_{WT}}\right)^{1/G}$, where $P_{MT}$ and $P_{WT}$ are the respective frequencies of the mutant and the wild-type control at the beginning of the competition, $P'_{MT}$ and $P'_{WT}$ are the corresponding frequencies at the end of the competition, and $G$ is the number of generations of the wild-type control in the competition, which equals 7.25 for 12 hrs and 29 for 48 hrs. In theory, this formula holds in an en masse competition under the assumption of no strain-strain interaction, as confirmed by our computer simulation. The strong correlation between mutant fitness estimated from en masse competition and that estimated from monoculture growth (Fig. 1d) supports the assumption of no strain-strain interaction. To estimate G, we first revived a frozen sample of wild-type control cells at 30°C in YPD at 250 RPM for 3 hrs. We then started a monoculture of the wild-type control at 1×10⁵ cells/mL in 6 mL of YPD, which grew for 12 hrs in a shaking incubator (250 RPM) at 30°C. We then estimated G in the 12 hrs from the change in the culture's optical density. Because the competition cultures were diluted back to the starting density every 12 hrs, G in 48 hrs is 4 times G in 12 hrs. Mutant fitness is estimated more accurately with longer competitions. However, if a mutant's fitness was so low that the strain had disappeared by T48, we calculated its fitness using T0 and T12; otherwise, we used T0 and T48. Only 36 mutants had their fitness estimated using T12 instead of T48. Based on four biological replicates, we used a t-test to determine whether the fitness of a mutant deviates from 1 at a nominal P-value of 5%. The average standard error of the estimated mutant fitness was 0.005, which we consider the mean detection limit of our fitness measurement. The absolute value of the smallest fitness effect with nominal P < 0.05 was 0.001. It has been estimated from the level of synonymous polymorphism that Ne is approximately 10⁷ in S. cerevisiae30, suggesting that natural selection can detect a fitness effect of 10⁻⁷ or greater in yeast. However, if most synonymous mutations are deleterious, as the present study shows, the actual Ne would be greater than 10⁷ and natural selection would be even more sensitive than considered in this study.
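The fitness formula, the estimate of G from optical density, and the replicate t-test can be sketched in Python as follows. Two points are assumptions of the sketch, not statements of the method:

- G is computed as log2 of the OD fold-change, treating OD as proportional to cell density.
- The t-test compares against a tabulated two-tailed 5% critical value of 3.182 for 3 degrees of freedom, not an exact P-value.

Read counts can replace frequencies because the total-read normalization cancels in the ratio. The function names are hypothetical.

```python
import math
import statistics

T_CRIT_DF3 = 3.182  # two-tailed 5% critical t for 3 d.f. (four replicates)


def relative_fitness(p_mt, p_wt, p_mt_end, p_wt_end, generations):
    """[(P'_MT * P_WT) / (P_MT * P'_WT)] ** (1 / G).

    Counts may replace frequencies: the total-read normalizer cancels.
    """
    return ((p_mt_end * p_wt) / (p_mt * p_wt_end)) ** (1.0 / generations)


def generations_from_od(od_start, od_end):
    """Wild-type doublings, assuming OD is proportional to cell density."""
    return math.log2(od_end / od_start)


def deviates_from_one(replicate_fitness, t_crit=T_CRIT_DF3):
    """One-sample t-test of mean fitness against 1; returns (t, significant)."""
    n = len(replicate_fitness)
    se = statistics.stdev(replicate_fitness) / math.sqrt(n)
    if se == 0:
        raise ValueError("replicates are identical; t is undefined")
    t = (statistics.mean(replicate_fitness) - 1.0) / se
    return t, abs(t) > t_crit


g12 = generations_from_od(0.01, 0.01 * 2 ** 7.25)  # ~7.25 generations in 12 hrs
g48 = 4 * g12                                      # ~29 generations in 48 hrs
# Mutant-to-WT count ratio halves over 48 hrs -> per-generation fitness 2 ** (-1/29).
w = relative_fitness(1000, 200, 500, 200, g48)
print(round(w, 3))  # ~0.976
```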
Verifying the respiratory function of mutants
Cells from each mutant library were first serially diluted. Equal numbers of cells were then spread on YPD and YPG plates, where respiratory functions were respectively unneeded and needed for cell growth. We allowed cell growth for two days on YPD and three days on YPG, because of faster cell growth with glucose as the carbon source. Colonies were then counted on each plate. This experiment was repeated three times for the mutant library of each gene. BY4742 was used as a positive control in the respiratory function test. As a negative control, we simultaneously deleted TOM6 and TOM7 from BY4742, because TOM6 and TOM7 are components of the TOM (translocase of outer membrane) complex that is responsible for import of mitochondrially directed proteins and is important for respiration55.
Quantifying ploidy after competition
One T48 population for each gene was randomly chosen and examined for ploidy. Approximately 10⁷ cells were collected, washed with 1.5 mL of water, and fixed by gentle addition of 3.5 mL of 95% ethanol followed by incubation for 2 hrs at room temperature. Fixed cells were collected by centrifugation for 15 s at 10,000g, resuspended in 1 mL water, and transferred to a 1.5-mL microcentrifuge tube. After a brief centrifugation, we resuspended the cells in 0.5 mL RNase solution (2 mg/mL RNase A in 50 mM Tris pH 8.0, 15 mM NaCl, boiled for 15 min and then cooled to room temperature) and incubated them for at least 2 hrs at 37°C. We then collected the cells from the RNase solution by centrifugation for 15 s at 10,000g. Cells were incubated in 0.2 mL protease solution (5 mg/mL pepsin and 4.5 μl/mL concentrated HCl in H₂O) for 20 min at 37°C and then collected by centrifugation. Cells were resuspended in 0.5 mL of 50 mM Tris pH 7.5 and either stored at 4°C for a few days or analyzed immediately. For analysis, 50 μl of cell suspension was transferred to 1 mL of 1 μM SYTOX Green staining solution. All samples were analyzed on an iQue Screener Plus flow cytometer. First, we used forward scatter area and side scatter area with a clustering package to remove non-cell particles. Second, we used forward scatter area and forward scatter height to remove doublets. Third, we plotted histograms of DNA content per cell. We used haploid (BY4742) and diploid (BY4743) yeast cells as controls to determine ploidy. Each of these two control profiles has two peaks, representing cells in the G1 and G2/M cell-cycle stages, respectively (1C and 2C DNA content for haploids and 2C and 4C for diploids).
Impact of PCR and sequencing errors
The following error analysis followed Li et al.29. The error rate of Illumina sequencing is 3×10⁻⁴ per site per read (http://www.illumina.com/documents/products/technotes/technote_Q-Scores.pdf). Thus, because of sequencing error, a genotype is expected to lose $U = [1-(1-3\times10^{-4})^{2\times150}]M_0$ read pairs. Here $M_0$ is the true number of read pairs of the genotype, 150 is the sequence length considered, and the factor of 2 reflects that both reads of a pair must be error-free for the pair to be counted. Because the fractional loss $U/M_0 = 0.086$ is the same for all genotypes, including the wild-type, in each sample, the loss of reads due to sequencing error does not affect fitness estimation. Sequencing error also causes a genotype to gain on average $V = (3\times10^{-4}/3)^2 M_1 = 10^{-8}M_1$ read pairs. Here $M_1$ is the total number of read pairs for all neighbors of the focal genotype (i.e., the genotypes that differ from the focal genotype by one nucleotide). The squared term reflects that both reads of a neighbor's pair must carry the same specific substitution (one of three possible) to be converted to the focal genotype. Thus, the fractional gain of read pairs for the genotype is expected to be $V/M_0 = 10^{-8}M_1/M_0$, which has virtually no impact on fitness estimation in our study. For instance, at T0, $M_1/M_0$ is expected to be 50 for the wild-type and 11 for any mutant. Hence, the fractional gain of read pairs is <10⁻⁶ for any genotype.
We similarly estimated the impact of PCR error. The Q5 DNA polymerase used in PCR has a very low error rate of 5.3×10⁻⁷ per nucleotide incorporated56. The PCR used in sequencing library preparation had 25 cycles. Thus, because of PCR error, a genotype is expected to lose $U = (5.3\times10^{-7}\times150\times25)M_0$ molecules, where $M_0$ is the true number of DNA molecules of the genotype, 150 is the sequence length in nucleotides, and 25 is the number of PCR cycles. Because the fractional loss $U/M_0 = 0.002$ is the same for all genotypes in each sample, the loss of molecules due to PCR error does not affect fitness estimation. PCR error also causes a genotype to gain on average $V = (5.3\times10^{-7}\times25/3)M_1 = 4.4\times10^{-6}M_1$ molecules, where $M_1$ is the total number of molecules for all neighbors of the focal genotype. Thus, the fractional gain of molecules for the genotype is expected to be $V/M_0 = 4.4\times10^{-6}M_1/M_0$, which has little impact on fitness estimation in our study. As mentioned, at T0, $M_1/M_0$ is expected to be 50 for the wild-type and 11 for any mutant. Hence, the fractional gain in the number of molecules is 2.2×10⁻⁴ for the wild-type and 4.9×10⁻⁵ for any mutant.
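The arithmetic of the sequencing- and PCR-error analyses can be checked with a few lines of Python. The sketch only evaluates the expressions given in the text. The function names are hypothetical, and the PCR loss uses the same first-order (linear) approximation as the text.

```python
def sequencing_error_budget(rate=3e-4, length=150, m1_over_m0=50.0):
    """Fractional loss U/M0 and gain V/M0 of read pairs from sequencing error."""
    # A pair is lost unless both reads are error-free across the region.
    loss = 1 - (1 - rate) ** (2 * length)
    # A neighbor's pair is converted only if both reads carry the one
    # specific substitution (1 of 3 possible) at the differing site.
    gain = (rate / 3) ** 2 * m1_over_m0
    return loss, gain


def pcr_error_budget(rate=5.3e-7, length=150, cycles=25, m1_over_m0=50.0):
    """Fractional loss and gain of molecules from PCR error (first order)."""
    loss = rate * length * cycles
    gain = rate * cycles / 3 * m1_over_m0
    return loss, gain


for label, ratio in (("wild-type", 50.0), ("mutant", 11.0)):
    seq_loss, seq_gain = sequencing_error_budget(m1_over_m0=ratio)
    pcr_loss, pcr_gain = pcr_error_budget(m1_over_m0=ratio)
    print(f"{label}: seq loss {seq_loss:.3f}, gain {seq_gain:.1e}; "
          f"PCR loss {pcr_loss:.3f}, gain {pcr_gain:.1e}")
```

The printed values reproduce the figures quoted above: a loss of 0.086 and a gain below 10⁻⁶ for sequencing; a loss of 0.002 and gains of 2.2×10⁻⁴ (wild-type) and 4.9×10⁻⁵ (mutant) for PCR.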
Growth curve-based fitness estimation of reconstructed mutants
We used maximum growth rates estimated from monoculture growth curves to verify the mutant fitness estimated by en masse competition followed by sequencing. We chose nine synonymous mutants of RPL29, RAD6, or RPS7A and 15 nonsynonymous mutants of TSR2, RAD6, RPS7A, or BUD23 that spanned a relatively wide range of sequencing-based fitness estimates. We resynthesized these gene variants and remade the corresponding mutant strains. We measured the growth curves of each of these mutants and of the wild-type control on the same 96-well plate, with eight replicates per strain. We used the method described earlier for measuring the growth of the reconstituted wild-type strains. The relative fitness of a mutant was calculated by $F = 2^{\text{relative growth rate}-1}$, where the relative growth rate is the maximum growth rate of the mutant divided by that of the wild-type control. The maximum growth rate was calculated following a previous protocol54. The above formula for F is derived as follows. Let r be the mutant growth rate, R be the wild-type growth rate, and T be the wild-type generation time. By definition, mutant fitness relative to the wild-type (per generation) is $F = e^{rT}/e^{RT}$. Hence, $\ln F = (r-R)T$. Because, by definition, $e^{RT} = 2$, $T = (\ln 2)/R$. Combining the above two equations yields $\ln F = (r-R)(\ln 2)/R = (r/R-1)\ln 2$. Therefore, $F = 2^{r/R-1} = 2^{\text{relative growth rate}-1}$. If mutant cells do not divide, so that the mutant population growth rate is 0, the mutant fitness relative to the wild-type is 0.5. If the mutation kills cells in addition to preventing mitosis, the mutant population growth rate is negative (i.e., the population shrinks), which leads to a mutant fitness lower than 0.5.
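The derivation can be checked numerically. The sketch below computes F from the two growth rates and compares it with the generation-time form $F = e^{(r-R)T}$ with $T = \ln 2 / R$. The function names are hypothetical.

```python
import math


def fitness_from_growth_rates(r_mutant, r_wt):
    """Per-generation fitness relative to the wild-type, F = 2 ** (r/R - 1)."""
    if r_wt <= 0:
        raise ValueError("wild-type growth rate R must be positive")
    return 2 ** (r_mutant / r_wt - 1)


def fitness_via_generation_time(r_mutant, r_wt):
    """Same quantity from F = exp((r - R) * T) with T = ln 2 / R."""
    generation_time = math.log(2) / r_wt
    return math.exp((r_mutant - r_wt) * generation_time)


print(fitness_from_growth_rates(0.40, 0.40))   # 1.0: same growth rate
print(fitness_from_growth_rates(0.0, 0.40))    # 0.5: no division
print(fitness_from_growth_rates(-0.05, 0.40))  # < 0.5: population shrinks
```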
CRISPR/Cas9 could generate off-target mutations. However, the high fitness correlation (Fig. 1d) between two independently constructed sets of 24 mutants suggests that this potential off-target effect did not influence our result.
Identifying orthologs of the 21 S. cerevisiae genes in five other yeast species
To examine whether a mutation examined in S. cerevisiae is present in the genomes of other yeast species, we attempted to identify orthologs of the 21 genes studied in S. paradoxus, S. mikatae, S. uvarum, S. castellii, and Candida glabrata, all of which diverged from S. cerevisiae after the whole-genome duplication in yeast. We retrieved genomic coding sequence (CDS) data from the NCBI genome assembly database (https://www.ncbi.nlm.nih.gov/assembly/) where available (S. paradoxus, C. glabrata, and S. castellii); otherwise, we retrieved genomic DNA data (S. mikatae and S. uvarum) from the same database. For species with CDS data, we built a local BLAST database and performed tblastn using the protein sequences of the 21 S. cerevisiae genes as queries. The E-value threshold was set at 10⁻¹⁰. If there was a full-length-query match, the matched subject was recorded as an ortholog. If the query was only partially matched to the subject, the subject was inspected manually to confirm the orthologous relationship. For each species and gene, only the hit with the lowest E-value was examined to prevent the inclusion of paralogs. For species with genomic DNA data, we similarly built a local BLAST database and performed tblastn under the same E-value threshold. If there was a full-length-query match, the matched subject sequence was recorded as an ortholog. If the query was only partially matched to the subject (likely because of introns), the matched subject sequence was extended by 100–2,000 bp upstream and downstream to ensure that it included all exons of the gene. The exact length of the extension was determined manually from the length of the unmatched part of the query and from the genomic structure. We then used AUGUSTUS57 to predict the coding region of the gene in the extended subject sequence and manually inspected the sequence to confirm the orthologous relationship.
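The best-hit filter can be sketched in Python under two assumptions. First, tblastn was run with tabular output (`-outfmt 6`). Its default 12 columns are the query and subject IDs, percent identity, alignment length, mismatches, gap opens, qstart, qend, sstart, send, evalue, and bitscore. Second, query protein lengths are supplied separately. Treating a hit as full-length when it covers residues 1 through the query length is our simplifying assumption. The manual inspection steps are not modeled, and the hits and lengths below are toy values.

```python
import csv
import io


def best_hits(outfmt6_text, query_lengths, max_evalue=1e-10):
    """Lowest-E-value hit per query from BLAST tabular (-outfmt 6) text.

    Returns {query: (subject, evalue, is_full_length)}. Hits above
    `max_evalue` are ignored; a hit counts as full-length if it spans the
    query from residue 1 to its last residue (an assumed criterion).
    """
    best = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        qstart, qend, evalue = int(row[6]), int(row[7]), float(row[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            full = qstart == 1 and qend == query_lengths[query]
            best[query] = (subject, evalue, full)
    return best


# Toy hits: one full-length, one partial (e.g., split by an intron),
# and one that fails the E-value threshold.
hits = (
    "RAD6\tscaffold_3\t92.4\t172\t13\t0\t1\t172\t1001\t1516\t1e-100\t320\n"
    "RAD6\tscaffold_9\t41.0\t80\t45\t2\t10\t89\t50\t289\t1e-20\t85\n"
    "BUD23\tscaffold_1\t48.0\t100\t52\t1\t20\t119\t7\t306\t1e-5\t40\n"
)
print(best_hits(hits, {"RAD6": 172, "BUD23": 275}))
```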
We identified orthologs of nearly all of the 21 genes in the five yeast species, except for EST1, for which we identified an ortholog only in S. paradoxus. We therefore excluded EST1 from downstream analyses. We also failed to identify the EOS1 ortholog in S. castellii and the IES6 ortholog in S. mikatae, but retained these two genes in downstream analyses, excluding only the species in which the ortholog was missing. The orthologous coding sequences of the six yeasts were then aligned using MACSE v2 58. A mutation examined in S. cerevisiae is considered observed in the other yeasts if it appears in the genome of any of the other five yeasts and no other nucleotide difference from S. cerevisiae exists in that genome in the codon harboring the mutation; otherwise, it is considered unobserved.
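The codon-level rule for calling a mutation observed reduces to an exact codon comparison on the alignment. A species qualifies only if its aligned codon equals the S. cerevisiae codon with the mutation applied. A minimal Python sketch follows; the function names and input layout are hypothetical.

```python
def mutant_codon(codon, pos_in_codon, alt_base):
    """S. cerevisiae codon with the examined substitution applied."""
    return codon[:pos_in_codon] + alt_base + codon[pos_in_codon + 1:]


def is_observed(cer_codon, pos_in_codon, alt_base, other_codons):
    """True if any other species' aligned codon equals the mutant codon,
    i.e., it carries the mutation and no other difference in that codon.

    `other_codons` holds one aligned codon per species; use None for a
    species lacking the ortholog. Gapped codons ('---') never match.
    """
    target = mutant_codon(cer_codon.upper(), pos_in_codon, alt_base.upper())
    return any(c is not None and c.upper() == target for c in other_codons)


# GCT (Ala) -> GCC at the third codon position:
print(is_observed("GCT", 2, "C", ["GCA", "GCC", None]))  # True
print(is_observed("GCT", 2, "C", ["ACC", "---"]))        # False: extra difference / gap
```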
Estimating the mRNA levels of mutated genes
A frozen sample of cells carrying the variants of a focal gene and a frozen sample of the wild-type control cells were revived at 30°C in YPD with shaking at 250 RPM for 3 hrs. These cells were then mixed at an approximately 1:50 ratio of wild-type control cells to all mutant cells combined. Four replicate cultures were then started by diluting this common starting population into four 14 mL Falcon tubes, each containing 6 mL of YPD medium, to a starting cell density of 1×10⁵ cells/mL. When the cells were in log phase after 12 hrs of growth at 30°C in a shaking incubator (250 RPM), we extracted DNA and RNA from the cultures (Masterpure™ Yeast DNA Purification Kit and RNeasy Mini Kit, respectively). The mRNA of the focal gene was reverse transcribed (SuperScript® III First-Strand Synthesis System for RT-PCR) using a gene-specific primer of about 20 nucleotides located within the 25-nucleotide sequence immediately downstream of the variant sequence.
We amplified the mutant gene segments from genomic DNA and cDNA, respectively, by 25 cycles of PCR. The cDNA libraries of EST1 could not be amplified, possibly because EST1 has the lowest expression level among the 21 genes studied (Fig. 1a). As described earlier, one primer targeted the 25-nucleotide sequence downstream of the variant sequence, while the other was located upstream of the variant sequence, beyond the homologous recombination repair sequence. The primers carried Illumina adapter and i5/i7 index sequences. The amplicons were subjected to 250-nucleotide paired-end Illumina sequencing (NovaSeq). A read pair was counted only if its two reads reported an identical variant sequence. To ensure accuracy in expression estimation, we excluded genotypes with fewer than 50 read pairs from the genomic DNA.
The relative mRNA expression level (REL) of a mutant is the number of cDNA-derived read pairs divided by the number of DNA-derived read pairs for the mutant, relative to the corresponding value of the wild-type control. We estimated the REL for 7,795 mutants with fitness estimates in YPD. With the four replicates in REL estimation, we used a t-test to determine if the REL of a mutant significantly deviates from 1 at a nominal P-value of 5%. Virtually identical results were obtained when REL was first log-transformed before the t-test.
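REL and its replicate-level t-test can be sketched as follows. The function names, count layout, and values are hypothetical. The tabulated critical value (3.182 for 3 degrees of freedom) stands in for an exact P-value. The optional log2 transform mirrors the robustness check mentioned above.

```python
import math
import statistics

T_CRIT_DF3 = 3.182  # two-tailed 5% critical t for 3 d.f. (four replicates)


def rel(cdna_mut, dna_mut, cdna_wt, dna_wt):
    """Relative mRNA level: mutant cDNA/DNA read-pair ratio over the wild-type's."""
    return (cdna_mut / dna_mut) / (cdna_wt / dna_wt)


def rel_deviates_from_one(rels, log_transform=False):
    """One-sample t-test of REL against 1 (or log2 REL against 0)."""
    values = [math.log2(x) for x in rels] if log_transform else list(rels)
    null = 0.0 if log_transform else 1.0
    se = statistics.stdev(values) / math.sqrt(len(values))
    if se == 0:
        raise ValueError("replicates are identical; t is undefined")
    t = (statistics.mean(values) - null) / se
    return t, abs(t) > T_CRIT_DF3


# Hypothetical read-pair counts for one mutant in four replicates:
# (mutant cDNA, mutant DNA, wild-type cDNA, wild-type DNA)
replicates = [(240, 400, 1200, 1000), (260, 410, 1150, 990),
              (230, 390, 1210, 1020), (250, 405, 1190, 1000)]
rels = [rel(*counts) for counts in replicates]
print([round(x, 2) for x in rels], rel_deviates_from_one(rels)[1])
```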
Following the sequencing and PCR error analyses presented earlier, we estimated the impact of reverse transcription errors on REL estimation. The reverse transcriptase used is a version of M-MLV RT, with an error rate of 4×10⁻⁵ per nucleotide incorporated (https://www.thermofisher.com/us/en/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/rt-education/reverse-transcriptase-attributes.html). Because of reverse transcription error, a genotype is expected to lose $U = (4\times10^{-5}\times150)M_0$ molecules, where $M_0$ is the expected number of cDNA molecules of the genotype and 150 is the sequence length. Because the fractional loss $U/M_0 = 0.006$ is the same for all genotypes in each sample, the loss of molecules due to reverse transcription error does not affect expression estimation. Reverse transcription error also causes a genotype to gain on average $V = (4\times10^{-5}/3)M_1 = 1.3\times10^{-5}M_1$ molecules, where $M_1$ is the expected total number of cDNA molecules for all neighbors of the focal genotype. Thus, the fractional gain of molecules for the genotype is expected to be $V/M_0 = 1.3\times10^{-5}M_1/M_0$, which has little impact on expression estimation in our study. $M_1/M_0$ is expected to be about 50 for the wild-type and 11 for mutants whose expression levels are comparable with that of the wild-type. The corresponding fractional gains of molecules are 6.5×10⁻⁴ and 1.4×10⁻⁴, respectively. Even if a mutant has a REL as low as 0.1, $M_1/M_0$ is 110 and the fractional gain in the number of molecules is 1.4×10⁻³. As described, PCR and sequencing errors had virtually no effect. Hence, the overall error from reverse transcription, PCR, and sequencing is negligible in expression estimation.
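The reverse-transcription budget can be evaluated in the same way. The sketch assumes that $M_1/M_0$ for a mutant scales as roughly 11/REL, which is consistent with the value of 110 quoted for REL = 0.1. The text rounds 4×10⁻⁵/3 to 1.3×10⁻⁵ before multiplying. The unrounded gains printed here therefore come out slightly higher: about 6.7×10⁻⁴, 1.5×10⁻⁴, and 1.5×10⁻³. The function name is hypothetical.

```python
def rt_error_budget(rate=4e-5, length=150, m1_over_m0=50.0):
    """Fractional loss U/M0 and gain V/M0 of cDNA molecules from RT error."""
    loss = rate * length             # molecules carrying >=1 RT error (first order)
    gain = rate / 3 * m1_over_m0     # neighbors converted to the focal genotype
    return loss, gain


# Assumed scaling: M1/M0 is about 11 / REL for a mutant.
for label, ratio in (("wild-type", 50.0), ("mutant, REL 1", 11.0),
                     ("mutant, REL 0.1", 11.0 / 0.1)):
    loss, gain = rt_error_budget(m1_over_m0=ratio)
    print(f"{label}: loss {loss:.3f}, gain {gain:.1e}")
```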
In addition to correlating mutant REL with rescaled fitness (Fig. 3c), we used a linear mixed model to assess the relative importance of REL and mutation type (synonymous vs. nonsynonymous) to rescaled fitness, with gene identity added as a random effect. We separately analyzed mutants with REL <1 and those with REL >1, because of their apparently different relationships with rescaled fitness (Fig. 3c). For mutants with REL <1, the fraction of variance of rescaled fitness explained by REL is 61.5% (P < 2.2×10⁻¹⁶), while that explained by mutation type is only 0.2% (P = 0.0002). For mutants with REL >1, the fraction of variance of rescaled fitness explained by REL is 7.4% (P < 2.2×10⁻¹⁶), while that explained by mutation type is only 0.4% (P = 1.5×10⁻⁷). These results demonstrate that REL explains a substantially larger fraction of variance of rescaled fitness than does mutation type.
Additionally, after accounting for gene-specific effects using a mixed-effect model, we found that the positive correlation between rescaled fitness and REL remained significant when REL <1 (P = 2.3×10⁻⁴⁷). There is a marginally significant negative correlation between rescaled fitness and REL when REL >1 (P = 0.048). We also fitted a quadratic model with log2(REL) as the independent variable and gene identity as a random effect. Consistent with the above results, the hypothesis that the fitness peak is at REL = 1 could not be rejected.
Codon adaptation index (CAI)
We computed CAI for the entire coding sequence of each wild-type or mutant gene, using previously reported yeast relative synonymous codon usage (RSCU) estimates34, which are highly correlated with those derived from the 200 most highly expressed genes (r = 0.995)15.
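CAI is conventionally the geometric mean, over codons, of each codon's relative adaptiveness w. Here w is the codon's RSCU divided by the largest RSCU among codons for the same amino acid. The Python sketch below follows that standard definition. The toy RSCU values are hypothetical, not the ref. 34 estimates. Skipping Met, Trp, and stop codons is the usual convention, not a detail stated here.

```python
import math

SKIP = frozenset({"ATG", "TGG", "TAA", "TAG", "TGA"})  # single-codon amino acids and stops


def relative_adaptiveness(rscu, codon_to_aa):
    """w = RSCU / max RSCU among synonymous codons of the same amino acid."""
    best = {}
    for codon, value in rscu.items():
        aa = codon_to_aa[codon]
        best[aa] = max(best.get(aa, 0.0), value)
    return {codon: value / best[codon_to_aa[codon]] for codon, value in rscu.items()}


def cai(cds, w):
    """Geometric mean of w over the informative codons of a coding sequence."""
    logs = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        if codon in SKIP or codon not in w:
            continue
        if w[codon] <= 0:
            raise ValueError(f"w({codon}) must be positive to take its log")
        logs.append(math.log(w[codon]))
    if not logs:
        raise ValueError("no informative codons")
    return math.exp(sum(logs) / len(logs))


# Toy RSCU table for two amino acids (hypothetical values).
rscu = {"GCT": 2.0, "GCC": 1.0, "GCA": 0.5, "GCG": 0.5, "AAA": 0.8, "AAG": 1.2}
aa = {"GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A", "AAA": "K", "AAG": "K"}
w = relative_adaptiveness(rscu, aa)
wild_type, synonymous_mutant = "ATGGCTAAG", "ATGGCCAAG"
print(cai(wild_type, w), cai(synonymous_mutant, w))
```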
mRNA folding strength (MFS)
The minimum free energy at 30°C was calculated for each wild-type or mutant mRNA sequence using RNAfold in ViennaRNA (2.4.17) with default parameters except for the temperature59. We define mRNA folding strength (MFS) as the absolute value of the minimum free energy.
TF-binding sites
We searched the wild-type 150-nucleotide target sequence, plus 20 nucleotides of flanking sequence on each side, for TF-binding sites using the Yeastract database33.
DFE estimation in SC + 37°C, YPD + 0.375 mM H₂O₂, and YPE
The experiments followed those for DFE estimation in YPD, except that the competitions lasted for 20 generations (cells were transferred 6.5 and 13 generations after the start of the competition) and had three replicates per environment. Sequencing library preparation was unsuccessful for mutants of EST1 and PAF1, likely because of primer degradation. Therefore, we obtained fitness data for mutants of 19 genes in these three additional environments. The fraction of mutants whose fitness is significantly different from 1 is lower here than in YPD, likely because of the reduced statistical power resulting from fewer replicates. Indeed, when we randomly sampled three of the four replicates from YPD, the fraction of mutants whose fitness is significantly different from 1 (nominal P < 0.05) decreased to an average of 0.63 and 0.64 for synonymous and nonsynonymous mutants, respectively, similar to those observed in the three additional environments (Extended Data Fig. 8j–l). We also asked whether the difference between synonymous and nonsynonymous mutants in the coefficient of variation (CV) of fitness across the four environments is entirely due to a potential difference in mean fitness. To test this, we controlled for the mean fitness in the four environments when comparing the across-environment fitness CV between synonymous and nonsynonymous mutants. Specifically, we assigned an identity index of 0 to each synonymous mutant and 1 to each nonsynonymous mutant. The partial Spearman’s correlation between the identity index and CV, controlling for the mean fitness in the four environments, is 0.052 (P = 7.7×10⁻⁶).
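The CV and the partial Spearman’s correlation can be sketched in pure Python in three steps:

1. Rank each variable, giving tied values their average rank.
2. Compute Pearson’s correlations on the ranks.
3. Apply the first-order partial-correlation formula.

Taking CV as the sample standard deviation over the mean is an assumption. The sketch returns only the coefficient, not its P-value. The function names and the six example mutants are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(z)
    rxy, rxz, ryz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fitness_cv(fitness_by_env):
    """Across-environment CV (sample SD / mean) of one mutant's fitness."""
    return statistics.stdev(fitness_by_env) / statistics.fmean(fitness_by_env)


# Hypothetical mutants: (identity index, fitness in the four environments),
# with 0 = synonymous and 1 = nonsynonymous.
mutants = [(0, [0.99, 1.00, 0.98, 0.99]), (1, [0.95, 1.01, 0.97, 0.99]),
           (0, [1.00, 1.00, 0.99, 1.01]), (1, [0.90, 0.99, 0.96, 1.00]),
           (0, [0.97, 0.98, 0.99, 0.98]), (1, [0.99, 0.94, 1.00, 0.97])]
identity = [m[0] for m in mutants]
cvs = [fitness_cv(m[1]) for m in mutants]
means = [statistics.fmean(m[1]) for m in mutants]
print(round(partial_spearman(identity, cvs, means), 3))
```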
Simulation of the impact of environmental changes on dN/dS
Our simulation assumed that the DFEs of synonymous and nonsynonymous mutations estimated in YPD hold in each environment, but that the fitness effect of a given mutation can vary across environments. We constructed cumulative fitness distribution functions (CFDFs) for synonymous and nonsynonymous mutants, respectively, from the corresponding fitness data collected in YPD. We started from all synonymous mutants with fitness measured in YPD and ranked these mutants from low to high by their YPD fitness. We then added a random noise drawn from the normal distribution N(0, σ²) to each fitness value and re-ranked the mutants by their new fitness values. Suppose that, after the addition of noise, the mutant originally ranked i now had rank j. We then randomly sampled M synonymous mutants from the CFDF and ranked them by fitness, where M is the number of synonymous mutants with fitness measured in YPD. We assigned the fitness of the jth-ranked mutant among these M sampled mutants to mutant i as its fitness in a new environment. This procedure was repeated for each environment considered. The fitness CV among environments was controlled by adjusting σ², with a larger σ² yielding a greater CV; we tried many σ² values until the observed CV was within 0.0001 of the target CV. The same was done for nonsynonymous mutants, with a higher target CV for nonsynonymous than synonymous mutants. We set a fitness cutoff (0.98 or 0.99) and assumed that any mutant with fitness below the cutoff in any environment was purged. We then computed dN/dS as the fraction of unpurged nonsynonymous mutants divided by the fraction of unpurged synonymous mutants. Under each parameter set, we repeated the simulation 1000 times and reported the mean dN/dS and its 95% confidence interval.
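This rank-preserving resampling scheme can be sketched in Python. Several details are our assumptions, not the authors' code:

- Sampling from the CFDF is implemented as resampling the YPD fitness values with replacement (inverse-transform sampling from an empirical distribution).
- σ is supplied directly instead of being tuned to a target CV.
- All m environments are generated by the procedure.
- The 95% interval is taken from the 2.5th and 97.5th percentiles of the replicate dN/dS values.

The function names and toy DFEs are hypothetical.

```python
import random
import statistics


def new_environment(fitness, sigma, rng):
    """Fitness of each mutant in one new environment.

    Noise N(0, sigma^2) perturbs each mutant's rank; the value assigned to
    rank j is the j-th smallest of M draws from the empirical distribution.
    """
    m = len(fitness)
    noisy = [f + rng.gauss(0.0, sigma) for f in fitness]
    rank = [0] * m
    for j, i in enumerate(sorted(range(m), key=lambda k: noisy[k])):
        rank[i] = j  # mutant i now holds (0-based) rank j
    draws = sorted(rng.choices(fitness, k=m))  # sample from the empirical CFDF
    return [draws[rank[i]] for i in range(m)]


def unpurged_fraction(fitness, sigma, n_env, cutoff, rng):
    """Fraction of mutants at or above `cutoff` in every simulated environment."""
    envs = [new_environment(fitness, sigma, rng) for _ in range(n_env)]
    kept = sum(all(env[i] >= cutoff for env in envs) for i in range(len(fitness)))
    return kept / len(fitness)


def simulate_dn_ds(syn, nonsyn, sigma_syn, sigma_nonsyn, n_env, cutoff,
                   reps=1000, seed=0):
    """Mean dN/dS and a percentile 95% interval over `reps` replicates."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        f_syn = unpurged_fraction(syn, sigma_syn, n_env, cutoff, rng)
        f_non = unpurged_fraction(nonsyn, sigma_nonsyn, n_env, cutoff, rng)
        ratios.append(f_non / f_syn)  # ZeroDivisionError if no synonymous survives
    cuts = statistics.quantiles(ratios, n=40)  # cut points in 2.5% steps
    return statistics.fmean(ratios), (cuts[0], cuts[-1])


# Hypothetical toy DFEs; real inputs would be the YPD fitness estimates.
syn = [0.97, 0.98, 0.99, 0.995, 1.0, 1.0, 1.005, 1.01]
nonsyn = [0.96, 0.98, 0.985, 0.99, 1.0, 1.0, 1.0, 1.02]
print(simulate_dn_ds(syn, nonsyn, 0.004, 0.01, n_env=4, cutoff=0.98, reps=200))
```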
Expected dN/dS in the four environments examined
To predict the expected dN/dS in long-term evolution in each of the four environments where DFEs were measured here, we considered all synonymous and nonsynonymous mutants with fitness measured in the environment. Because the fitness measures contain measurement errors, we added a random error term drawn from the normal distribution $N(0, \sigma_{se}^2)$ to the measured fitness, where $\sigma_{se}$ is the mutant-specific standard error of the measured fitness estimated from the experimental replicates in the environment. We set a fitness cutoff and assumed that any mutant with fitness in the environment below the cutoff was purged. We then computed dN/dS as the fraction of unpurged nonsynonymous mutants divided by the fraction of unpurged synonymous mutants. In an environment that varies among the four individual conditions, we assumed that any mutant with fitness below the cutoff in any condition was purged. Because of the random measurement errors considered, we repeated the prediction 1000 times and presented the 95% confidence interval of the predicted dN/dS.
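The prediction can be sketched in a similar way. In this hypothetical layout, each mutant is a list of (measured fitness, standard error) pairs, one per condition. A single pair describes one environment. Four pairs describe the varying environment, in which a mutant must clear the cutoff in every condition. The percentile interval is again our assumption about how the 95% interval is formed, and the function names and data are hypothetical.

```python
import random
import statistics


def unpurged(mutants, cutoff, rng):
    """Fraction of mutants whose noisy fitness clears `cutoff` in every condition."""
    kept = 0
    for conditions in mutants:
        # Perturb each measured fitness by N(0, se^2), se being its standard error.
        if all(f + rng.gauss(0.0, se) >= cutoff for f, se in conditions):
            kept += 1
    return kept / len(mutants)


def expected_dn_ds(syn, nonsyn, cutoff, reps=1000, seed=0):
    """Percentile 95% interval of predicted dN/dS under measurement error."""
    rng = random.Random(seed)
    ratios = [unpurged(nonsyn, cutoff, rng) / unpurged(syn, cutoff, rng)
              for _ in range(reps)]
    cuts = statistics.quantiles(ratios, n=40)
    return cuts[0], cuts[-1]


# Hypothetical single-environment data: [(fitness, standard error)] per mutant.
syn = [[(1.000, 0.004)], [(0.975, 0.004)], [(0.995, 0.003)], [(0.990, 0.005)]]
nonsyn = [[(1.000, 0.004)], [(0.930, 0.006)], [(0.960, 0.005)], [(0.992, 0.004)]]
print(expected_dn_ds(syn, nonsyn, cutoff=0.98))
```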
Data availability
Sequencing data generated in this study have been deposited into NCBI with the Bioproject ID of PRJNA750109. All other data are presented in the paper and associated supplementary materials. Source data are provided with this paper. Public data used include gene function annotations in the Saccharomyces Genome Database (https://www.yeastgenome.org/) and genomic coding sequences of S. paradoxus, C. glabrata, and S. castellii and genomic sequences of S. mikatae and S. uvarum from the NCBI genome assembly database (https://www.ncbi.nlm.nih.gov/assembly/).
Code availability
Custom code is available at https://github.com/song88180/Mutational-Fitness-Effects and https://doi.org/10.5281/zenodo.5908478.
Extended Data
Extended Data Fig. 1. Properties of wild-type and mutant strains analyzed.

a, Experimental procedure for testing cellular respiratory functions. Cells from each of the 21 mutant libraries were spread on YPD and YPG plates, followed by colony counting after growth. Respiration is needed for cell growth on YPG but not on YPD. b, Mean ratio of YPD colony number to YPG colony number for each mutant library, based on three replicates per library. Error bars show the standard error of the mean. The negative control is deficient in respiration due to gene deletions (see Methods). c, Maximum growth rates of three reconstituted wild-type strains and BY4742. WT1 was used as the wild-type control in en masse competitions with mutants. The red error bar indicates the standard error of the mean based on 16 replicates, each shown by a dot (15 for BY4742). P-values are from two-tailed t-tests. The growth rate is not significantly different among the four strains (P = 0.58, one-factor ANOVA). d, Ploidy of one T48 population per mutant library assessed by flow cytometry. SYTOX Green fluorescence was analyzed using the BL2 detector, which measures the output from the 488-nm (blue) laser. In control flow cytometry profiles, the two peaks respectively represent cells in the G1 and G2/M cell-cycle stages (1C and 2C DNA content for haploids and 2C and 4C for diploids).
Extended Data Fig. 2. Mutant fitness quantification.

a, Fractions of synonymous (yellow) and nonsynonymous (blue) mutants among designed but unobserved mutants and those among observed mutants. Nonsense mutants are not considered. Numbers in the bars are numbers of mutants. The distributions of synonymous and nonsynonymous mutants among the unobserved and observed mutant groups are not significantly different (P > 0.05, Fisher’s exact test). b-f, Correlation between every two of the four replicates in estimated mutant fitness under YPD at 30°C. The correlation between replicate 1 and replicate 2 is presented in Fig. 1c. Each dot is a mutant and the dotted line indicates the diagonal. Pearson’s correlation r and its associated P-value are presented. Among-genotype sum of squares explains 93.8% of the total sum of squares (one-factor ANOVA).
Extended Data Fig. 3. Mutant fitness distribution under YPD at 30°C.

a, Distribution of the fitness of 169 nonsense mutants. The peak around 0.94 is caused by 26 nonsense mutants of GET1 that all have fitness of about 0.94. b, Cumulative frequency distributions of log10(mutant fitness) of nonsynonymous (blue) and synonymous (yellow) mutants. c, The full figure of Fig. 2c, including low-fitness mutants that are not shown in Fig. 2c. d, The full figure of Fig. 2e, including low-fitness and high-fitness mutants that are not shown in Fig. 2e.
Extended Data Fig. 4. Coding mutations influence the mRNA level of the mutated gene.

a, Non-significant negative correlation between the mean fitness of synonymous mutants of a gene and the expression level of the gene. Each dot represents a gene. Spearman’s correlation ρ and associated P-value are presented. b-g, Correlation in mutant REL between replicates, which are indicated on the axes of each panel. Each dot is a mutant, and the dotted line indicates the diagonal. Pearson’s correlation r and its associated P-value are presented. Among-genotype sum of squares explains 89.7% of the total sum of squares (one-factor ANOVA). h, Cumulative frequency distributions of REL of nonsynonymous and synonymous mutants. i, Relative expression level (REL) distributions of nonsynonymous (blue) and synonymous (yellow) mutants of 20 individual genes shown by box plots. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3-qu1), and the dots show outliers. Nonsynonymous and synonymous distributions of each gene are compared by a two-tailed Wilcoxon rank-sum test, with FDR-adjusted P-values indicated as follows: *, P < 0.05; ⁑, P < 0.01; ⁂, P < 0.001. j, Distribution of REL of nonsense mutants.
Extended Data Fig. 5. Mechanisms underlying coding mutations’ fitness effects.

a-b, Box plots showing similar absolute fractional changes in the mRNA level induced by nonsynonymous (a) or synonymous (b) mutations within and outside TF-binding sites. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3-qu1), and the dots show outliers. P-values are from two-tailed Wilcoxon rank-sum tests (n = 1191, 4736, 367, and 1411, respectively, for the four bars from left to right). c-d, Positive correlation between rCAI and rescaled fitness among nonsynonymous (c) and synonymous (d) mutants, respectively. e, Fraction of synonymous mutations lowering CAI increases with the expression level of the gene. f, Fraction of synonymous mutations lowering the expression level increases with the expression level of the gene. g, Fraction of nonsynonymous mutations lowering CAI increases with the expression level of the gene. h, Fraction of nonsynonymous mutations lowering the expression level increases with the expression level of the gene. i, Mean rescaled fitness of synonymous mutants declines with the expression level of the gene. j, Mean rescaled fitness of nonsynonymous mutants declines with the expression level of the gene. Because deleting a more highly expressed gene tends to cause a greater fitness reduction60, the finding in panel j means that the mean fitness reduction caused by a nonsynonymous mutation should rise with the expression level of the gene. In e-j, each dot represents a gene. k-l, Positive correlation between the relative mRNA folding strength (rMFS) of a nonsynonymous (k) or synonymous (l) mutant and its rescaled fitness when rMFS is below 1. The rMFS of a mutant is its mRNA folding strength (i.e., the absolute value of its minimum free energy) divided by that of the wild-type.
In k and l, the correlation is computed separately for mutants with rMFS < 1 and for those with rMFS > 1. In c-l, rank correlations (ρ) and associated P-values are shown.
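The rMFS definition and the split correlation in k-l can be illustrated with a minimal Python sketch. It assumes that minimal folding energies (kcal/mol) have already been computed for the wild-type and each mutant mRNA, for example with the Vienna RNA package59. The input layout and the function names are hypothetical, and the tie-averaged Spearman implementation is a generic one, not the authors' code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def rmfs(mfe_mutant, mfe_wild_type):
    """Relative mRNA folding strength: |MFE of mutant| / |MFE of wild type|."""
    return abs(mfe_mutant) / abs(mfe_wild_type)


def split_correlations(mutants, mfe_wild_type, min_n=3):
    """mutants: list of (mfe, rescaled_fitness); returns rho for rMFS<1 and rMFS>1."""
    groups = {"rMFS<1": [], "rMFS>1": []}
    for mfe, fitness in mutants:
        r = rmfs(mfe, mfe_wild_type)
        if r < 1:
            groups["rMFS<1"].append((r, fitness))
        elif r > 1:
            groups["rMFS>1"].append((r, fitness))
        # Mutants with rMFS exactly 1 are assigned to neither group.
    result = {}
    for name, pairs in groups.items():
        if len(pairs) >= min_n:
            xs, ys = zip(*pairs)
            result[name] = spearman_rho(list(xs), list(ys))
    return result
```

A positive ρ in the rMFS < 1 group matches the pattern described for k-l: mutants whose mRNA folds less strongly than the wild-type tend to have lower fitness.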
Extended Data Fig. 6. A higher coefficient of variation (CV) of fitness across environments for nonsynonymous than synonymous mutants can create a nonsynonymous to synonymous substitution rate ratio (dN/dS) that is substantially below 1 despite similar fitness effects of synonymous and nonsynonymous mutations in each environment.

a-b, Mean expected dN/dS from 1000 simulations of a population that experiences multiple different environments. A mutant is purged if its fitness is lower than a preset cutoff, such as 0.98 or 0.99, in any environment. Shaded areas represent 95% confidence intervals. a, Results with CV = 0.004 for synonymous mutants. b, Results with CV = 0.005 for synonymous mutants. Note that, under the fitness cutoff of 0.99, dN/dS starts to increase with the number (m) of environments when m is large. Raising m reduces both the fraction of synonymous mutations that are always neutral (FANS) and the fraction of nonsynonymous mutations that are always neutral (FANN). Because the fitness CV is larger for nonsynonymous than synonymous mutants in the simulation, FANN decreases with m more quickly than does FANS when m is small. When m is large, FANN is already small, making it possible for FANS to decrease with m more quickly than FANN. As a result, dN/dS might increase with m when m is large.
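The core logic of this legend can be illustrated with a toy Python simulation. It is not the authors' simulation. It assumes the following:

- Each mutant's fitness in each environment is an independent normal draw with mean 1 and standard deviation equal to CV.
- Synonymous and nonsynonymous mutants share the same mean fitness.
- dN/dS is approximated by FANN/FANS, because only mutations never falling below the cutoff can fix.

The nonsynonymous CV of 0.008 is a hypothetical value. The default of 20 replicates, rather than 1000, is chosen only for speed.

```python
import random
from statistics import fmean


def fraction_always_neutral(n_mutants, m, cv, cutoff, rng, mean_fitness=1.0):
    """Fraction of simulated mutants with fitness >= cutoff in all m environments.

    Assumed toy model: fitness in each environment is an independent draw
    from Normal(mean_fitness, sd = cv * mean_fitness).
    """
    sd = cv * mean_fitness
    survivors = 0
    for _ in range(n_mutants):
        # Purged if fitness falls below the cutoff in ANY environment.
        if all(rng.gauss(mean_fitness, sd) >= cutoff for _ in range(m)):
            survivors += 1
    return survivors / n_mutants


def mean_dn_ds(m, cv_syn, cv_nonsyn, cutoff, n_mutants=2000, n_reps=20, seed=1):
    """Mean of FANN / FANS over replicate simulations (assumed proxy for dN/dS)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_reps):
        fans = fraction_always_neutral(n_mutants, m, cv_syn, cutoff, rng)
        fann = fraction_always_neutral(n_mutants, m, cv_nonsyn, cutoff, rng)
        if fans > 0:  # skip replicates in which no synonymous mutant survives
            ratios.append(fann / fans)
    return fmean(ratios)


if __name__ == "__main__":
    for m in (1, 2, 5):
        # cv_nonsyn = 0.008 is a hypothetical value chosen for illustration.
        print(m, round(mean_dn_ds(m, cv_syn=0.004, cv_nonsyn=0.008, cutoff=0.99), 3))
```

Under these assumptions, FANN falls faster than FANS as m grows. The estimated dN/dS therefore drops below 1 even though the two classes have identical mean fitness. Every mutant in this toy model has the same mean fitness, so the ratio declines monotonically with m. Reproducing the late rise under the 0.99 cutoff presumably requires heterogeneity in mean fitness among mutants.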
Extended Data Fig. 7. Pairwise correlation between replicates in estimated mutant fitness in each of the three additional environments used.

a-c, Correlation between every two of the three replicates in estimated mutant fitness under SC at 37°C. Each dot is a mutant and the dotted line indicates the diagonal. Pearson's correlation r and its associated P-value are presented. Among-genotype sum of squares explains 96.1% of the total sum of squares (one-factor ANOVA). d-f, Correlation between every two of the three replicates in estimated mutant fitness under YPD + 0.375 mM H2O2. Among-genotype sum of squares explains 94.4% of the total sum of squares. g-i, Correlation between every two of the three replicates in estimated mutant fitness under YPE. j, Correlation between replicates 1 and 3 in estimated mutant fitness under YPE after the exclusion of SNF6 mutants. k, Correlation between replicates 2 and 3 in estimated mutant fitness under YPE after the exclusion of SNF6 mutants. Panels g-k suggest that the fitness estimates of SNF6 mutants in replicate 3 under YPE are unreliable, so these estimates were not used for fitness estimation in YPE. When SNF6 is excluded, among-genotype sum of squares explains 91.0% of the total sum of squares in YPE.
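The "fraction of total sum of squares explained by genotype" is the one-factor ANOVA ratio SS_among / SS_total, sometimes called eta squared. The following is a minimal Python sketch. It assumes that replicate fitness estimates are stored per mutant in a dictionary. The "GENE:mutation" ID format used to drop SNF6 mutants, and the toy numbers, are hypothetical.

```python
def fraction_ss_among_genotypes(replicates):
    """SS_among / SS_total from a one-factor ANOVA with genotype as the factor.

    replicates: dict mapping mutant ID -> list of replicate fitness estimates.
    """
    values = [v for reps in replicates.values() for v in reps]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Each genotype contributes n_g * (genotype mean - grand mean)^2.
    ss_among = sum(
        len(reps) * (sum(reps) / len(reps) - grand_mean) ** 2
        for reps in replicates.values()
    )
    return ss_among / ss_total


def exclude_gene(replicates, gene):
    """Drop mutants of one gene; assumes hypothetical IDs of the form 'GENE:mutation'."""
    return {mid: reps for mid, reps in replicates.items() if mid.split(":")[0] != gene}


if __name__ == "__main__":
    data = {  # toy numbers, not data from this study
        "SNF6:A10G": [0.90, 0.95, 0.60],
        "TOM7:C20T": [0.98, 0.97, 0.99],
        "TOM7:G30A": [0.80, 0.82, 0.81],
    }
    print(round(fraction_ss_among_genotypes(data), 3))
    print(round(fraction_ss_among_genotypes(exclude_gene(data, "SNF6")), 3))
```

A value near 1 means that most of the variation in fitness estimates comes from differences among mutants rather than from replicate noise. This is why the ratio serves as a reproducibility summary.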
Extended Data Fig. 8. Mutant fitness in the three additional environments used.

a-c, Fractions of synonymous (yellow) and nonsynonymous (blue) mutants among designed but unobserved mutants and among observed mutants in each environment. Nonsense mutants are not considered. Numbers in the bars are numbers of mutants. The distributions of synonymous and nonsynonymous mutants among the unobserved and observed mutant groups are not significantly different in any environment (P > 0.05, Fisher's exact test). d-f, Cumulative frequency distributions of fitness of nonsynonymous and synonymous mutants in each environment. g-i, Fitness distributions of nonsynonymous and synonymous mutants of 19 individual genes shown by box plots in each environment. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3-qu1), and the dots show outliers. Nonsynonymous and synonymous distributions for each gene are compared by a two-tailed Wilcoxon rank-sum test, with the FDR-adjusted P-value indicated as follows: *, P < 0.05; ⁑, P < 0.01; ⁂, P < 0.001. j-l, Fractions of mutants with fitness significantly below 1 (P < 0.05), significantly above 1, and neither, respectively, in each environment. The error bar shows one standard error. The distributional difference between synonymous and nonsynonymous mutants among the three bins is tested by two-tailed Fisher's exact test, with the P-value indicated. At FDR = 0.05, 40.7% and 0.7% of nonsynonymous mutations and 34.8% and 0.5% of synonymous mutations are significantly deleterious and beneficial, respectively, in SC+37°C. These values become 35.5%, 1.7%, 31.9% and 1.6% in YPD+H2O2, and 47.6%, 1.4%, 45.6%, and 1.0% in YPE.
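The legend does not name the FDR procedure. The Benjamini-Hochberg step-up method is a common choice and is assumed in this Python sketch. The sketch adjusts per-mutant P-values, each testing whether fitness differs from 1. It then assigns each mutant to the deleterious, beneficial, or neither bin at FDR = 0.05. The input layout is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step down from the largest P-value so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_mutants(mutants, fdr=0.05):
    """mutants: list of (fitness, P-value for fitness != 1). Returns one label each."""
    qvalues = benjamini_hochberg([p for _, p in mutants])
    labels = []
    for (fitness, _), q in zip(mutants, qvalues):
        if q < fdr and fitness < 1:
            labels.append("deleterious")
        elif q < fdr and fitness > 1:
            labels.append("beneficial")
        else:
            labels.append("neither")
    return labels
```

Counting these labels separately for synonymous and nonsynonymous mutants in one environment yields percentages of the same form as those reported above.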
Extended Data Fig. 9. Fractions of nonsynonymous (blue) and synonymous (yellow) neutral mutations in one environment (indicated on the X-axis) that become deleterious in any of the other three environments.

The fractions are higher for nonsynonymous than synonymous mutations (P <0.05, paired t-test). A mutation is considered deleterious if its fitness is significantly lower than 1 (P <0.05) and neutral if its fitness is not significantly different from 1.
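The plotted quantity can be computed directly from per-environment significance calls. This is a Python sketch in which the nested-dictionary layout and the status labels are assumptions. The paired t statistic compares per-environment fractions for the two mutation classes. A P-value would additionally require the t distribution with n − 1 degrees of freedom, for example via scipy.stats.ttest_rel.

```python
import math
from statistics import mean, stdev


def fraction_neutral_to_deleterious(status, env):
    """Among mutants neutral in `env`, the fraction deleterious in any other environment.

    status: dict mutant ID -> dict environment -> "neutral", "deleterious" or
    "beneficial", where "deleterious" means fitness significantly below 1
    (P < 0.05) and "neutral" means not significantly different from 1.
    """
    neutral_here = [calls for calls in status.values() if calls.get(env) == "neutral"]
    if not neutral_here:
        return float("nan")
    switched = sum(
        1 for calls in neutral_here
        if any(s == "deleterious" for e, s in calls.items() if e != env)
    )
    return switched / len(neutral_here)


def paired_t_statistic(x, y):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

Here x would hold the nonsynonymous fractions and y the synonymous fractions, one pair per environment.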
Extended Data Fig. 10. A new model explaining the widespread negative correlation between the mRNA level of a gene and its evolutionary rate measured by the nonsynonymous or amino acid substitution rate.

Compared with nonsynonymous mutations in lowly expressed genes, those in highly expressed genes are more likely to reduce the gene's expression level and hence more likely to be deleterious. As a result, the evolutionary rate of a gene measured by the nonsynonymous or amino acid substitution rate is negatively correlated with the gene expression level. The height of a symbol represents the quantity considered.
Acknowledgements
We thank P. Chen, H. Liu, and H. Xu for technical assistance and W. Qian, X. Wei, J.-R. Yang, and members of the Zhang laboratory for valuable comments. This work was supported by the U.S. National Institutes of Health research grant R35GM139484 to J.Z.
Competing interests
The authors declare no competing interests.
References
- 1. Kimura M Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genet Res 11, 247–269 (1968).
- 2. King JL & Jukes TH Non-Darwinian evolution. Science 164, 788–798 (1969).
- 3. Nei M & Kumar S Molecular Evolution and Phylogenetics (Oxford University Press, 2000).
- 4. Li W-H Molecular Evolution (Sinauer, 1997).
- 5. Graur D, Sater AK & Cooper TF Molecular and Genome Evolution (Sinauer, 2016).
- 6. Hershberg R & Petrov DA Selection on codon bias. Annu Rev Genet 42, 287–299 (2008).
- 7. Chamary JV, Parmley JL & Hurst LD Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 7, 98–108 (2006).
- 8. Plotkin JB & Kudla G Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12, 32–42 (2011).
- 9. Stergachis AB et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
- 10. Zhou Z et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci U S A 113, E6117–E6125 (2016).
- 11. Park C, Chen X, Yang JR & Zhang J Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 110, E678–E686 (2013).
- 12. Presnyak V et al. Codon optimality is a major determinant of mRNA stability. Cell 160, 1111–1124 (2015).
- 13. Chen S et al. Codon-resolution analysis reveals a direct and context-dependent impact of individual synonymous mutations on mRNA level. Mol Biol Evol 34, 2944–2958 (2017).
- 14. Kudla G, Murray AW, Tollervey D & Plotkin JB Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009).
- 15. Qian W, Yang JR, Pearson NM, Maclean C & Zhang J Balanced codon usage optimizes eukaryotic translational efficiency. PLOS Genet 8, e1002603 (2012).
- 16. Frumkin I et al. Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci U S A 115, E4940–E4949 (2018).
- 17. Akashi H Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136, 927–935 (1994).
- 18. Sun M & Zhang J Preferred synonymous codons are translated more accurately: Proteomic evidence, among-species variation, and mechanistic basis. bioRxiv doi: 10.1101/2022.02.22.481448 (2022).
- 19. Buhr F et al. Synonymous codons direct cotranslational folding toward different protein conformations. Mol Cell 61, 341–351 (2016).
- 20. Walsh IM, Bowman MA, Soto Santarriaga IF, Rodriguez A & Clark PL Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc Natl Acad Sci U S A 117, 3528–3534 (2020).
- 21. Gilissen C, Hoischen A, Brunner HG & Veltman JA Disease gene identification strategies for exome sequencing. Eur J Hum Genet 20, 490–497 (2012).
- 22. Agashe D, Martinez-Gomez NC, Drummond DA & Marx CJ Good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme. Mol Biol Evol 30, 549–560 (2013).
- 23. Kristofich J et al. Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzyme. PLOS Genet 14, e1007615 (2018).
- 24. Lebeuf-Taylor E, McCloskey N, Bailey SF, Hinz A & Kassen R The distribution of fitness effects among synonymous mutations in a gene under directional selection. eLife 8, e45952 (2019).
- 25. Lind PA, Berg OG & Andersson DI Mutational robustness of ribosomal protein genes. Science 330, 825–827 (2010).
- 26. Sharon E et al. Functional genetic variants revealed by massively parallel precise genome editing. Cell 175, 544–557 (2018).
- 27. She R & Jarosz DF Mapping causal variants with single-nucleotide resolution reveals biochemical drivers of phenotypic change. Cell 172, 478–490 (2018).
- 28. Qian W, Ma D, Xiao C, Wang Z & Zhang J The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep 2, 1399–1410 (2012).
- 29. Li C, Qian W, Maclean M & Zhang J The fitness landscape of a tRNA gene. Science 352, 837–840 (2016).
- 30. Chen P & Zhang J Asexual experimental evolution of yeast does not curtail transposable elements. Mol Biol Evol 38, 2831–2842 (2021).
- 31. Keren L et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell 166, 1282–1294 (2016).
- 32. Chang YF, Imam JS & Wilkinson MF The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem 76, 51–74 (2007).
- 33. Monteiro PT et al. YEASTRACT+: a portal for cross-species comparative genomics of transcription regulation in yeasts. Nucleic Acids Res 48, D642–D649 (2020).
- 34. Sharp PM & Li WH The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295 (1987).
- 35. Radhakrishnan A et al. The DEAD-Box protein Dhh1p couples mRNA decay and translation by monitoring codon optimality. Cell 167, 122–132 (2016).
- 36. Yang JR, Chen X & Zhang J Codon-by-codon modulation of translational speed and accuracy via mRNA folding. PLOS Biol 12, e1001910 (2014).
- 37. Faure G, Ogurtsov AY, Shabalina SA & Koonin EV Role of mRNA structure in the control of protein folding. Nucleic Acids Res 44, 10898–10911 (2016).
- 38. Goncalves P, Valerio E, Correia C, de Almeida JM & Sampaio JP Evidence for divergent evolution of growth temperature preference in sympatric Saccharomyces species. PLOS One 6, e20739 (2011).
- 39. Kimura M The Neutral Theory of Molecular Evolution (Cambridge University Press, 1983).
- 40. Lewontin RC & Cohen D On population growth in a randomly varying environment. Proc Natl Acad Sci U S A 62, 1056–1060 (1969).
- 41. Gillespie JH Natural selection for within-generation variance in offspring number II. Discrete haploid models. Genetics 81, 403–413 (1975).
- 42. Kimura M & Ohta T The average number of generations until fixation of a mutant gene in a finite population. Genetics 61, 763–771 (1969).
- 43. Flynn JM et al. Comprehensive fitness maps of Hsp90 show widespread environmental dependence. eLife 9, e53810 (2020).
- 44. Dandage R et al. Differential strengths of molecular determinants guide environment specific mutational fates. PLOS Genet 14, e1007419 (2018).
- 45. Chen P & Zhang J Antagonistic pleiotropy conceals molecular adaptations in changing environments. Nat Ecol Evol 4, 461–469 (2020).
- 46. Azizoglu A, Brent R & Rudolf F A precisely adjustable, variation-suppressed eukaryotic transcriptional controller to enable genetic discovery. eLife 10, e69549 (2021).
- 47. Natsume T & Kanemaki MT Conditional degrons for controlling protein expression at the protein level. Annu Rev Genet 51, 83–102 (2017).
- 48. Zhang J & Yang JR Determinants of the rate of protein sequence evolution. Nat Rev Genet 16, 409–420 (2015).
- 49. Wu Z et al. Expression level is a major modifier of the fitness landscape of a protein coding gene. Nat Ecol Evol 6, 103–115 (2022).
- 50. Sauna ZE & Kimchi-Sarfaty C Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12, 683–691 (2011).
- 51. Lee TI & Young RA Transcriptional regulation and its misregulation in disease. Cell 152, 1237–1251 (2013).
- 52. Chou HJ, Donnard E, Gustafsson HT, Garber M & Rando OJ Transcriptome-wide analysis of roles for tRNA modifications in translational regulation. Mol Cell 68, 978–992 (2017).
- 53. Laughery MF et al. New vectors for simple and streamlined CRISPR-Cas9 genome editing in Saccharomyces cerevisiae. Yeast 32, 711–720 (2015).
- 54. Warringer J, Ericson E, Fernandez L, Nerman O & Blomberg A High-resolution yeast phenomics resolves different physiological features in the saline response. Proc Natl Acad Sci U S A 100, 15724–15729 (2003).
- 55. Honlinger A et al. Tom7 modulates the dynamics of the mitochondrial outer membrane translocase and plays a pathway-related role in protein import. EMBO J 15, 2125–2137 (1996).
- 56. Potapov V & Ong JL Examining sources of error in PCR by single-molecule sequencing. PLOS One 12, e0169774 (2017).
- 57. Stanke M & Morgenstern B AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33, W465–W467 (2005).
- 58. Ranwez V, Douzery EJP, Cambon C, Chantret N & Delsuc F MACSE v2: Toolkit for the alignment of coding sequences accounting for frameshifts and stop codons. Mol Biol Evol 35, 2582–2584 (2018).
- 59. Hofacker IL et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie 125, 167–188 (1994).
- 60. Zhang J & He X Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol 22, 1147–1155 (2005).
Data Availability Statement
Sequencing data generated in this study have been deposited in NCBI under BioProject accession PRJNA750109. All other data are presented in the paper and associated supplementary materials. Source data are provided with this paper. Public data used include gene function annotations in the Saccharomyces Genome Database (https://www.yeastgenome.org/), genomic coding sequences of S. paradoxus, C. glabrata, and S. castellii, and genomic sequences of S. mikatae and S. uvarum from the NCBI genome assembly database (https://www.ncbi.nlm.nih.gov/assembly/).
