Molecular Biology and Evolution. 2017 May 4;34(10):2486–2502. doi: 10.1093/molbev/msx151

Deciphering the Genic Basis of Yeast Fitness Variation by Simultaneous Forward and Reverse Genetics

Calum J Maclean, Brian PH Metzger, Jian-Rong Yang, Wei-Chin Ho, Bryan Moyers, Jianzhi Zhang
PMCID: PMC12104513  PMID: 28472365

Abstract

The budding yeast Saccharomyces cerevisiae is the best-studied eukaryote in molecular and cell biology, but its utility for understanding the genetic basis of phenotypic variation in natural populations is limited by inefficient association mapping due to strong and complex population structure. To overcome this challenge, we generated genome sequences for 85 strains and performed a comprehensive population genomic survey of a total of 190 diverse strains. We identified considerable variation in population structure among chromosomes and identified 181 genes that are absent from the reference genome. Many of these nonreference genes are expressed, and we functionally confirmed that two of these genes confer increased resistance to antifungals. Next, we simultaneously measured the growth rates of over 4,500 laboratory strains, each of which lacks a nonessential gene, and 81 natural strains across multiple environments using a unique DNA barcode present in each strain. By combining the genome-wide reverse genetic information gained from the gene deletion strains with a genome-wide association analysis from the natural strains, we identified genomic regions associated with fitness variation in natural populations. To experimentally validate a subset of these associations, we used reciprocal hemizygosity tests, finding that while the combined forward and reverse genetic approaches can identify a single causal gene, the phenotypic consequences of natural genetic variation often follow a complicated pattern. The resources and approach provided outline an efficient and reliable route to association mapping in yeast and significantly enhance its value as a model for understanding the genetic mechanisms underlying phenotypic variation and evolution in natural populations.

Keywords: Saccharomyces cerevisiae, GWAS, growth rate, genome sequencing, population structure, drug resistance

Introduction

Understanding the genetic basis of phenotypic variation is a major goal of modern biology. Model organisms play a prominent role in this endeavor because of the wealth of accumulated biological information and tools available for manipulating and examining these organisms. The budding yeast Saccharomyces cerevisiae has long been a eukaryotic model organism favored by molecular and cell biologists and was the first eukaryote to have its genome fully sequenced (Goffeau et al. 1996). Large-scale phenotyping of gene deletion (Giaever et al. 2002; Ohya et al. 2005; Hillenmeyer et al. 2008; Giaever and Nislow 2014) and overexpression (Sopko et al. 2006; Douglas et al. 2012) strains has provided extensive data on gene function, while the availability of genome sequences from closely related species (Kellis et al. 2003; Dujon et al. 2004; Scannell et al. 2011; Hittinger 2013; Liti et al. 2013) offers evolutionary insights into genotype–phenotype mapping.

Prior to 2009, the identification of genotype–phenotype relationships in yeast relied on only a few laboratory strains, which are now known to be phenotypic outliers (Liti et al. 2009; Warringer et al. 2011). Recent years have seen intensified research on the natural diversity and ecology of S. cerevisiae. For instance, S. cerevisiae has been isolated globally from diverse natural and man-made environments (Liti et al. 2009; Wang et al. 2012) and shown to harbor greater phenotypic diversity but much lower genetic diversity than its sister species S. paradoxus (Liti et al. 2009; Warringer et al. 2011). As such, S. cerevisiae has great potential for linking natural variation in phenotype to genetic variation in individual genes, a process that requires knowledge of both yeast genomic diversity and population structure. Such knowledge has accumulated primarily through low-coverage Sanger sequencing (Liti et al. 2009) and tiling array hybridization (Schacherer et al. 2009). These studies, as well as restriction-site associated DNA sequencing of a large strain set (Cromie et al. 2013), have revealed a complex population structure of S. cerevisiae. Higher-quality genomes produced by next-generation sequencing have further revealed the presence of both copy-number variants and genomic rearrangements (Bergström et al. 2014; Hose et al. 2015; Strope et al. 2015), as well as the origins of domestic S. cerevisiae strains (Gallone et al. 2016; Gonçalves et al. 2016). Unfortunately, while strains representing pure lineages are often phenotypically distinct, many S. cerevisiae strains are mosaics with complex ancestry from multiple lineages due to human activity (Warringer et al. 2011). This strong and complex population structure has made genome-wide association studies (GWAS), an important forward genetic method for detecting influential genetic variants in many species, difficult in yeast (Connelly and Akey 2012; Diao and Chen 2012). Consequently, the use of this otherwise powerful model species for systematic analysis of the genetic basis of natural phenotypic variation has been hindered.

To overcome this hurdle, we developed a resource for efficient GWAS in S. cerevisiae that simultaneously combines forward and reverse genetic analyses. While each of these approaches is commonly used on its own in the yeast community (Smith et al. 2011; Liti and Louis 2012; Swinnen et al. 2012; Fay 2013; Giaever and Nislow 2014; Long et al. 2015), our approach allows both forward and reverse genetic information to be gained from a single experiment. To do so, we first generated genome sequences of 85 diverse S. cerevisiae strains that are genetically and phenotypically variable. Combining these data with available genome sequences from the literature, we assembled a dataset of 190 S. cerevisiae genomes and conducted a comprehensive population genomic analysis, identifying single nucleotide polymorphisms (SNPs) at ∼3.5% of sites. From this information, we elucidated detailed phylogenetic relationships among strains and the broad population structure of the species. We detected genes from the newly sequenced genomes that are absent from the reference genome and demonstrated their expression and functions. We then barcoded the newly sequenced strains and simultaneously phenotyped them with over 4,500 single gene deletion strains by a high-throughput barcode-sequencing (bar-seq) method (Smith et al. 2009). With the assistance of this reverse genetic information, our GWAS identified potential causal genes responsible for growth rate variation in five of six environments examined. We experimentally verified a subset of these associations for high-temperature growth by a reciprocal hemizygosity test (Steinmetz et al. 2002), establishing the combination of simultaneous association mapping and reverse genetics as a powerful approach for unbiased identification of the genic basis underlying fitness variation among natural S. cerevisiae strains.

Results

Genomic Analysis of S. cerevisiae Reveals a Complex Population Structure

To identify genetic variation underlying phenotypic variation among S. cerevisiae strains, we generated genome sequences of 85 strains collected from six continents and from a variety of human-associated and wild environments (fig. 1a; supplementary data S1, Supplementary Material online). We obtained an average of 3.75 million 2 × 100-nucleotide paired-end reads per strain, approximately 97% of which were successfully mapped to the S288c reference genome. This resulted in an average coverage of 60× per genome (range 38–99×) (supplementary data S2, Supplementary Material online). On average, 6% of the reference genome was not covered by a read in each sequenced strain due to stochastic sampling of reads, strain differences in gene content, and/or repeat elements. In total, we identified 311,287 SNPs and 15,884 insertions/deletions (indels).
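The reported mean depth can be checked with simple arithmetic. The sketch below assumes that "3.75 million 2 × 100-nucleotide paired-end reads" means 3.75 million read pairs (two 100-nt mates each) and takes the S288c reference genome as roughly 12.1 Mb; both values are assumptions for illustration, and the calculation ignores PCR duplicates and uneven coverage.

```python
def mean_coverage(read_pairs: float, read_len: int, mapped_frac: float, genome_size: float) -> float:
    """Mean sequencing depth = mapped bases / genome size (ignores duplicates and uncovered regions)."""
    mapped_bases = read_pairs * 2 * read_len * mapped_frac  # two mates per read pair
    return mapped_bases / genome_size


# Assumed inputs: 3.75 million read pairs, 100-nt mates, 97% mapped,
# and an approximate S288c reference size of 12.1 Mb.
depth = mean_coverage(3.75e6, 100, 0.97, 12.1e6)
print(f"mean depth ~ {depth:.1f}x")  # prints "mean depth ~ 60.1x"
```

The result, about 60×, is consistent with the average coverage reported above, which supports reading the count as read pairs rather than single reads.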

Fig. 1.



Geographical, environmental, and phylogenetic relationships of the 190 S. cerevisiae strains analyzed. (a) World map indicating the geographic locations where the analyzed strains were sampled. Colors represent the environment of isolation if known. (b) Maximum composite likelihood neighbor-joining tree of the 190 strains based on genome-wide SNP data. The environment type from which each strain was isolated is indicated as a colored circle. Branches are colored to denote clade. The scale bar represents 0.1% genome sequence divergence. Strain names in black are those sequenced in this work, while those in gray were sequenced previously. The same tree with bootstrap values is shown in supplementary figure S1, Supplementary Material online. Trees based on individual chromosomes are provided in supplementary figure S2, Supplementary Material online. (c) Population structures of the 190 strains. Strains are arrayed based on clade membership in panel b. Different colors show different inferred populations, which are indicated at the top of the panel. The Y-axis shows the fraction of SNPs coming from each inferred population. (d) Linkage disequilibrium (LD) decays as the physical distance between two linked sites becomes larger. LD is measured by r² between two linked sites minus the mean r² between two sites located on different chromosomes.

To acquire a more comprehensive view of S. cerevisiae strain relationships, we identified 105 additional strains that had publicly available genome sequences (Hose et al. 2015; Strope et al. 2015) at the time of our analysis (August 2016) and applied our analysis pipeline to the Illumina sequencing reads of these strains. A neighbor-joining tree of all 190 strains was then constructed on the basis of a combined set of 421,773 SNPs (fig. 1b; supplementary fig. S1 and data S1 and S2, Supplementary Material online). The tree was rooted using outgroup sequences from recently identified Chinese isolates (Wang et al. 2012). We recovered phylogenetic clustering based roughly on the geographical and environmental origins of the strains, consistent with earlier observations made from fewer strains and SNPs (Liti et al. 2009; Schacherer et al. 2009). Strains cluster into the previously identified West African, Malaysian/North American, Sake, Laboratory, and European/Wine groups. Additionally, we identified a “Bakery” clade that was previously suggested to exist by microsatellite-based analysis (Legras et al. 2007) and a new Natural clade of three wild strains (one from soil in Illinois and two from the gum of wild cherry trees at unknown locations) (fig. 1b). The remaining strains, originating from a wide variety of environments, form a group named “Mosaics” (fig. 1b; see below). While this work was being prepared for submission, two genomic studies focusing on closely related wine and beer strains were published (Gallone et al. 2016; Gonçalves et al. 2016). Because these domestic strains largely represent a single clade in our analysis, these two studies do not offer a species-wide view of yeast's evolutionary history, and the exclusion of these strains does not affect the overall phylogenetic patterns observed in our analysis.

To more closely examine the population structure of S. cerevisiae, we employed a model-based clustering algorithm implemented in fastSTRUCTURE (Raj et al. 2014) and identified seven distinct subpopulations (top row in fig. 1c) that agree with the strain isolation sources and corroborate the clustering pattern seen in the phylogeny. In addition, many strains that do not fall within a specific clade are mosaics with ancestry from several lineages (top row in fig. 1c), supporting previous observations based on smaller data sets (Liti et al. 2009). We then conducted the fastSTRUCTURE analysis for each of the 16 chromosomes and observed prominent variation in population structure among chromosomes (fig. 1c). For example, the West African subpopulation is genetically distinct from other subpopulations for 10 of its 16 chromosomes, but is indistinct from the Sake subpopulation for three chromosomes and from the Malaysian/North American subpopulation for another three chromosomes. This variation in chromosomal population structure indicates that different chromosomes have had different evolutionary histories due to pervasive gene flow. Because S. cerevisiae reproduces largely asexually (Tsai et al. 2008), rare crosses between lineages can establish unique populations in which distinct chromosome combinations persist in the absence of outbreeding. The observed differences in chromosomal population structure are not due to stochasticity in the population structure assessment, because multiple runs on the same chromosome displayed only minor variations. Furthermore, phylogenies reconstructed using SNPs from individual chromosomes corroborate the fastSTRUCTURE results (supplementary fig. S2, Supplementary Material online).

The extent of linkage disequilibrium (LD) between SNPs is an important characteristic determining the highest possible resolution of association analysis; the lower the LD, the higher the resolution can be. We found that the mean LD measured by r² equals 0.0164 for SNPs within 100 nucleotides and that it halves as the physical distance increases to 1,200 nucleotides (fig. 1d). This fast breakdown of LD is similar to previous reports (Liti et al. 2009). Given that the average distance between the start of one gene and that of the next gene on the chromosome is ∼2 kb in yeast, this result indicates that fine-scale mapping to the gene level should be theoretically possible by GWAS in this species. Nevertheless, because recombination rate varies across the yeast genome, mapping resolution is expected to vary among genomic regions.
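The LD measure defined in the figure 1d legend (r² between two sites minus the mean r² between sites on different chromosomes) can be sketched as follows. This is a minimal illustration assuming genotypes coded 0/1 per strain (homozygous calls); the toy SNPs are hypothetical, and the binning of pairs by distance used to draw the decay curve is not shown.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two biallelic genotype vectors (0/1 per strain)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy) if vx and vy else 0.0


def adjusted_ld(snps):
    """snps: list of (chromosome, position, genotypes).

    Returns ([(distance, r2 - baseline), ...] for same-chromosome pairs, baseline),
    where baseline is the mean r2 over pairs of SNPs on different chromosomes.
    """
    same, between = [], []
    for (c1, p1, g1), (c2, p2, g2) in combinations(snps, 2):
        r2 = r_squared(g1, g2)
        if c1 == c2:
            same.append((abs(p1 - p2), r2))
        else:
            between.append(r2)
    baseline = mean(between) if between else 0.0
    return [(d, r2 - baseline) for d, r2 in same], baseline


# Hypothetical genotypes for six strains at three SNPs
snps = [
    ("I", 100, [0, 0, 1, 1, 0, 1]),
    ("I", 150, [0, 0, 1, 1, 0, 1]),  # perfectly linked with the SNP at position 100
    ("II", 500, [0, 1, 0, 1, 0, 1]),
]
pairs, baseline = adjusted_ld(snps)
print(pairs, round(baseline, 4))
```

Subtracting the inter-chromosome baseline removes the background r² that arises from population structure and finite sample size, so the remaining signal reflects physical linkage.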

To further characterize the genetic variation present in S. cerevisiae, we conducted a comprehensive population genomic analysis of all 190 strains. The basic population genetic parameters are summarized in table 1. Polymorphism (θW) and nucleotide diversity (π) at intronic and intergenic sites are significantly lower than at synonymous sites, suggesting pervasive purifying selection acting on noncoding regions. This is consistent with the compact yeast genome having intergenic regions dense with promoters and other important regulatory elements, and with the ability of yeast introns to regulate gene expression (Juneau et al. 2006; Parenteau et al. 2008). Consistent with previous analyses conducted using a smaller data set (Liti et al. 2009), we found no clear sign of positive selection using the majority of population genetic tests we applied, and the estimated fraction of adaptive amino acid substitutions was 0. However, removing the Wine/European strains from the analysis resulted in the detection of positive selection in one test, suggesting that sampling biases and population structure may confound current approaches for detecting positive selection (supplementary figs. S3–S5 and tables S1–S2, Supplementary Material online; see “Materials and Methods” for details).

Table 1.

Polymorphism (θW) and Nucleotide Diversity (π) Per kb in Different Regions of the Yeast Genome.

Region         θW (mean)  θW (SD)  π (mean)  π (SD)
Intergenic     5.63       0.018    2.67      0.015
Intronic       7.52       0.135    3.68      0.122
Synonymous     14.60      0.035    9.08      0.037
Nonsynonymous  3.63       0.010    1.37      0.007
Total          5.97       0.009    2.99      0.009
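Table 1 reports two standard diversity estimators per kb. As a minimal sketch, Watterson's θ divides the number of segregating sites S by a_n = Σ 1/i (for i = 1 to n − 1), and π averages the number of pairwise differences over all sequence pairs; dividing by the number of sites in a class (intergenic, intronic, synonymous, or nonsynonymous) and multiplying by 1,000 gives per-kb values. The toy alignment is hypothetical, and the code ignores missing data and the site-classification step (for example, counting synonymous sites), which a real analysis would need.

```python
from itertools import combinations


def watterson_theta(seqs):
    """Watterson's estimator: segregating sites S divided by a_n = sum(1/i for i in 1..n-1)."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1 / i for i in range(1, n))
    return s / a_n


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences over all n(n-1)/2 sequence pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def per_kb(value, n_sites):
    """Scale a per-region estimate to a per-kb value."""
    return value / n_sites * 1000


# Hypothetical 10-site alignment of four strains (two segregating sites)
seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACGTAC"]
theta_w_kb = per_kb(watterson_theta(seqs), len(seqs[0]))
pi_kb = per_kb(nucleotide_diversity(seqs), len(seqs[0]))
print(round(theta_w_kb, 2), round(pi_kb, 2))  # prints 109.09 100.0
```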

Finally, for the 85 newly sequenced genomes, we also characterized intron presence/absence polymorphisms (supplementary fig. S6, Supplementary Material online; see “Materials and Methods”) and identified aneuploidies and large segmental duplications (supplementary fig. S7, Supplementary Material online; see “Materials and Methods”). Similar to the results of a previous in-depth investigation into Saccharomyces mitochondrial DNA from a largely different set of strains (Wolters et al. 2015), we found several differences in intron content between related isolates. We also detected many small- and large-scale duplications, many of which show no clear pattern of phylogenetic distribution (Hose et al. 2015).

Some Non-Reference Genes Confer Drug Resistance

Our knowledge of S. cerevisiae genome content and function is derived largely from a few laboratory strains, which are now known to be phenotypically atypical (Warringer et al. 2011). Furthermore, S288c, the strain from which the reference set of genes is defined, was constructed in the 1980s largely from a strain that was isolated in the late 1930s (Mortimer and Johnston 1986). Because the reference strain has been maintained in relatively benign and unvarying laboratory environments and has undergone repeated population bottlenecks, often to single individuals, genes important for survival outside the laboratory may have been lost. As a consequence, identifying coding regions not present in the reference strain has the potential to explain phenotypic variation among strains. Indeed, recent reports have confirmed the presence of several nonreference genes in the genomes of natural and industrial S. cerevisiae strains (Novo et al. 2009; Borneman and Pretorius 2015; McIlwain et al. 2016). To further our understanding of the distribution and importance of nonreference genes, we additionally performed a de novo genome assembly of the Illumina sequencing reads obtained from our 85 strains (see supplementary data S3, Supplementary Material online for assembly statistics). We identified 181 distinct nonreference genes distributed across the phylogeny (fig. 2a; supplementary fig. S8 and data S4, Supplementary Material online). The majority had a BLAST hit in a previously sequenced non-S288c S. cerevisiae strain, while others had best hits in other fungi or more distantly related organisms (fig. 2a). The phylogenetic distribution of nonreference genes did not follow a straightforward pattern; distantly related strains often share nonreference genes, suggesting independent gains via introgression or horizontal gene transfer, independent losses, or segregating polymorphism (fig. 2a; supplementary fig. S8, Supplementary Material online).
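Figure 2a groups nonreference genes by the species of their best BLAST hit. The search databases and parameters are not given in this section, so the sketch below only illustrates the bookkeeping step: it assumes default BLAST tabular output (`-outfmt 6`, where the 12th column is the bitscore) and a hypothetical lookup table mapping subject sequence IDs to species. The query and subject names are invented.

```python
import csv
from collections import Counter


def best_hits(lines):
    """Best subject (highest bitscore) per query from BLAST tabular output (-outfmt 6).

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def count_by_species(hits, subject_to_species):
    """Tally best hits per species; subject_to_species is a hypothetical lookup table."""
    return Counter(subject_to_species.get(s, "unassigned") for s in hits.values())


# Hypothetical BLAST rows for two nonreference genes
rows = [
    "NR1\tsubjA\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-50\t200",
    "NR1\tsubjB\t70.0\t300\t90\t0\t1\t300\t1\t300\t1e-20\t120",
    "NR2\tsubjB\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-30\t150",
]
species = {"subjA": "S. cerevisiae (non-S288c)", "subjB": "L. thermotolerans"}
hits = best_hits(rows)
print(hits, count_by_species(hits, species))
```

Ranking by bitscore rather than e-value keeps the comparison independent of database size; either choice is an assumption here.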

Fig. 2.



Origins, expression, and functions of nonreference genes identified from the 85 newly sequenced S. cerevisiae genomes. (a) Evolutionary origins of nonreference genes. Color indicates the number of nonreference genes identified from each of the 85 genomes (strain name shown at the bottom of the panel) that received the best hit in a particular species listed at the right-hand side of the panel. The relationships of the 85 strains, as in figure 1b, are indicated by the phylogeny. (b) Expression levels of nonreference genes estimated using existing mRNA sequencing data for 23 of the 85 sequenced strains. Each circle represents a nonreference gene in the strain indicated at the bottom of the panel. The red horizontal bar represents the lower fifth percentile of gene expression levels of all reference genes in that strain. FPKM, Fragments Per Kilobase of transcript per Million mapped reads. (c) Fitness consequence of deleting Non-Ref-129 from UWOPS87-2421 in the presence of the antifungal fluconazole. Fitness is quantified by efficiency (maximum OD). Wild-type and deletion strain data are shown by a solid black line and a dashed gray line, respectively. P-values from t-tests of the null hypothesis of no fitness effect from the gene deletion are indicated as follows: *≤0.05; **≤0.01; ***≤0.001. Error bars represent the standard error of the mean from three replicates. Maximum growth rate and phylogenetic position of strains containing Non-Ref-129 are shown in supplementary figure S9a, Supplementary Material online. (d) Fitness consequence of deleting Non-Ref-67 from CLIB272 in the presence of the antifungal cantharidin. All notations are the same as in panel c. Maximum growth rate and phylogenetic position of strains containing Non-Ref-67 variants are shown in supplementary figure S9b, Supplementary Material online.

To begin characterizing these nonreference genes, we first estimated their expression levels in 23 S. cerevisiae strains with available transcriptome data generated by mRNA sequencing (RNA-seq) (Skelly et al. 2013). On average, 54.9% of the nonreference genes examined were expressed above the fifth percentile of reference-gene expression levels in the same strain (fig. 2b; supplementary data S5, Supplementary Material online). Because the RNA-seq data were collected in a single benign environment, it is likely that more of the nonreference genes identified are expressed at appreciable levels in the appropriate environments due to conditional expression.
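The 54.9% figure compares each nonreference gene with the fifth percentile of reference-gene expression in the same strain and then averages over strains. A minimal sketch of that comparison, assuming one list of expression values (for example FPKM) per strain; the strain names and values are hypothetical, and the percentile uses Python's default `statistics.quantiles` interpolation, which may differ from the one used in the original analysis.

```python
import statistics


def fraction_above_ref_p5(nonref_expr, ref_expr):
    """Fraction of nonreference genes expressed above the 5th percentile of reference genes."""
    # quantiles(n=20) returns 19 cut points; the first is the 5th percentile
    p5 = statistics.quantiles(ref_expr, n=20)[0]
    return sum(e > p5 for e in nonref_expr) / len(nonref_expr)


def mean_fraction(per_strain):
    """Average the per-strain fractions; per_strain maps strain -> (nonref_expr, ref_expr)."""
    return statistics.mean(fraction_above_ref_p5(nr, ref) for nr, ref in per_strain.values())


# Hypothetical FPKM values for two strains
per_strain = {
    "strain_A": ([0.5, 3.0, 6.0, 50.0], list(range(1, 101))),
    "strain_B": ([10.0, 20.0], list(range(1, 101))),
}
print(mean_fraction(per_strain))  # prints 0.75
```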

To examine the functional importance of nonreference genes, we focused on two of them for which our initial BLAST search revealed the closest hit to be within the well-annotated Lachancea thermotolerans genome. Following Liti et al. (2009) and Warringer et al. (2011), we used growth curves to determine the phenotypic consequences of deleting these nonreference genes for two aspects of strain growth, maximum growth rate and efficiency (see “Materials and Methods”). Non-Ref-129, identified from the Malaysian strain UWOPS87-2421, resembles the L. thermotolerans coding region KLTH0E00528g, which is annotated as a homolog of S. cerevisiae FLR1, a multidrug transporter responsible for the efflux of drugs such as the widely used antifungal fluconazole (Gbelska et al. 2006). Thus, Non-Ref-129, which is expressed even in a benign environment (5.25 RPKM; supplementary data S5, Supplementary Material online), may confer resistance to this important drug. We deleted Non-Ref-129 from haploid UWOPS87-2421 cells and exposed both the wild-type and deletion strains to various fluconazole concentrations to investigate the impact of gene deletion on strain growth (fig. 2c). Deleting Non-Ref-129 had a small but significant effect on the maximum growth rate (supplementary fig. S9a, Supplementary Material online) and a large effect on growth efficiency, especially when the fluconazole concentration was comparable to a typical high-dose fluconazole treatment in clinical settings (>25 µg/ml) (Menichetti et al. 1996; Martin 1999) (fig. 2c). If Non-Ref-129 is indeed a drug transporter similar to FLR1, it may also be involved in resistance to diazaborine, benomyl, methotrexate, and other drugs (Brôco et al. 1999; Jungwirth et al. 2000).
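Efficiency is defined in the figure 2 legend as the maximum OD. How maximum growth rate was extracted from each curve is described in "Materials and Methods" rather than here; one common approach, assumed in the sketch below, fits a straight line to ln(OD) over a sliding window of time points and takes the steepest slope. The window size and the simulated curve are hypothetical.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def growth_parameters(times_h, od, window=5):
    """Return (maximum growth rate per hour, efficiency = maximum OD).

    The growth rate is the steepest least-squares slope of ln(OD) over any
    `window` consecutive time points (an assumed fitting scheme).
    """
    log_od = [math.log(v) for v in od]
    rates = [
        slope(times_h[i:i + window], log_od[i:i + window])
        for i in range(len(od) - window + 1)
    ]
    return max(rates), max(od)


# Hypothetical curve: exponential growth at 0.4 per hour that saturates at OD 1.0
times = list(range(11))
od = [min(1.0, 0.05 * math.exp(0.4 * t)) for t in times]
rate, efficiency = growth_parameters(times, od)
print(round(rate, 3), efficiency)  # prints 0.4 1.0
```

Comparing wild-type and deletion replicates on these two summaries (for example with a t-test, as in the figure legend) separates effects on growth speed from effects on final yield.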

To better understand the history of Non-Ref-129, we performed additional BLAST searches in other published S. cerevisiae genomes that were built de novo (Strope et al. 2015). We found an intact Non-Ref-129 in YJM653 and an apparently pseudogenized Non-Ref-129 disrupted by an insertion in YJM1250; these two strains reside at the edge of and within the Wine/European clade, respectively, both being highly diverged from UWOPS87-2421 (supplementary fig. S9a, Supplementary Material online). UWOPS87-2421 and YJM653 differ at only one nonsynonymous site and no synonymous sites in this 1,644-nucleotide gene. The highly similar chromosomal locations of all three Non-Ref-129 genes in an unstable telomeric region (UWOPS87-2421, ChX:33185-34826; YJM653, ChX:30819-32462; YJM1250, ChX:22819-24457) suggest a single origin of Non-Ref-129 in S. cerevisiae.

The second nonreference gene experimentally studied, Non-Ref-67, was initially identified in strains CLIB272 and Y6 and found to be similar to the L. thermotolerans gene KLTH0H09460g, which is homologous to S. cerevisiae CRG1, a methyltransferase gene involved in lipid homeostasis that provides resistance to the phosphatase inhibitor cantharidin (Lissina et al. 2011). Unlike Non-Ref-129, Non-Ref-67 is not expressed in benign conditions (supplementary data S5, Supplementary Material online). However, because CRG1 expression increases 40- to 50-fold upon exposure to cantharidin and other stresses (Lissina et al. 2011), Non-Ref-67 expression may likewise be condition-specific. We deleted Non-Ref-67 from haploid CLIB272 cells and exposed wild-type and deletion strains to varying concentrations of cantharidin (fig. 2d; see “Materials and Methods”). Deleting Non-Ref-67 does not consistently alter the maximum growth rate (supplementary fig. S9b, Supplementary Material online) but has a significant effect on growth efficiency at intermediate drug levels, with the deletion strain reaching only ∼50% of the maximum OD of the wild-type strain at a cantharidin concentration of 12 µM (fig. 2d). Additional BLAST searches detected Non-Ref-67 in five previously published S. cerevisiae genomes (Strope et al. 2015). We identified two distinct versions of Non-Ref-67 distributed across the phylogeny (inset of supplementary fig. S9b, Supplementary Material online) that differ at 10 sites, including four nonsynonymous sites. However, the similar genomic location of these two variants, in a telomeric region of Ch. XV, suggests a single origin.

Two possible scenarios can explain the origin and phylogenetic distribution of each of the nonreference genes studied in depth here (Non-Ref-129 and Non-Ref-67). The first is that a gene arose via horizontal gene transfer after the separation of S. cerevisiae from S. paradoxus. The acquired gene may not have been fixed in S. cerevisiae if the transfer was recent. Alternatively, the gene may have been fixed, followed by multiple losses within S. cerevisiae. The two nonreference genes have relatively low sequence identity to the L. thermotolerans genes mentioned, suggesting that the donor species have yet to be identified. The second scenario is that these nonreference genes arose from gene duplication. The relatively large sequence divergence between these genes and their closest paralogs in S. cerevisiae suggests that the duplication events were ancient, implying multiple independent losses of these genes in several yeast species as well as within S. cerevisiae, which is possible given their subtelomeric locations. While the first scenario appears more parsimonious, both scenarios remain possible at this stage.

Bar-Seq Allows High-Throughput Simultaneous Phenotyping of Thousands of Strains

Accurate phenotyping is central to uncovering the genetic basis of phenotypic variation. Phenotyping of different natural yeast strains has primarily relied on the production and analysis of growth curves (Warringer et al. 2011) or digital photography-based colony sizes (Bloom et al. 2013), which have limited throughput and resolution and can be time-consuming. We decided to adopt bar-seq (Smith et al. 2009) for phenotyping, which allows the growth rates of all strains of interest to be measured simultaneously in the same test tube through Illumina sequencing of strain-specific DNA barcodes. Bar-seq was originally designed to quantify the relative growth rates of S288c-derived gene deletion strains, each carrying two unique 20-nucleotide DNA sequences (barcodes) inserted at the time of strain construction (Winzeler et al. 1999; Giaever et al. 2002). We similarly constructed a panel of barcoded strains from a subset of the 85 strains sequenced in this work. Each carries two unique barcodes that are not present in any deletion strain. This allows bar-seq not only of the natural strains alone but also of natural strains and deletion strains together in one test tube (fig. 3a). Although our methodology could easily be used to expand the strain set in the future, the additional 105 sequenced strains included in our phylogenetic analysis were unavailable at the time of our phenotypic analysis.
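The core bookkeeping step of bar-seq is tallying reads per strain-specific barcode. The sketch below is a simplified illustration under stated assumptions: reads are taken to begin with the 20-nucleotide barcode (primer sequence already trimmed), only exact matches are counted, and the barcodes and strain names are hypothetical. Real pipelines typically tolerate sequencing errors and track the two barcodes of each strain.

```python
from collections import Counter


def count_barcodes(reads, barcode_to_strain, barcode_len=20):
    """Count reads whose first `barcode_len` bases exactly match a known barcode.

    Assumes reads start at the barcode (primer sequence already trimmed) and
    allows no mismatches; both are simplifications.
    """
    counts = Counter()
    unmatched = 0
    for read in reads:
        strain = barcode_to_strain.get(read[:barcode_len])
        if strain is None:
            unmatched += 1
        else:
            counts[strain] += 1
    return counts, unmatched


# Hypothetical 20-nt barcodes for one natural isolate and one deletion strain
barcode_to_strain = {
    "ACGTACGTACGTACGTACGT": "natural_isolate_1",
    "TTTTGGGGCCCCAAAATTTT": "deletion_strain_1",
}
reads = [
    "ACGTACGTACGTACGTACGTGATC",
    "ACGTACGTACGTACGTACGTCCCC",
    "TTTTGGGGCCCCAAAATTTTAAAA",
    "NNNNNNNNNNNNNNNNNNNNNNNN",
]
counts, unmatched = count_barcodes(reads, barcode_to_strain)
print(dict(counts), unmatched)
```

Because the natural-strain barcodes do not occur in any deletion strain, natural isolates and deletion strains can be counted from the same sequencing library without ambiguity.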

Fig. 3.



Simultaneous high-throughput phenotyping of 81 natural isolates and 4,521 gene deletion strains. (a) Flow chart showing the procedure of barcoding the natural isolates and simultaneously phenotyping natural isolates and gene deletion strains. (b) Heat map showing the fitness of 81 natural isolates relative to the reference strain (S288c-derived HO deletion strain) in each stressful environment, relative to that in the benign environment of YPD at 30 °C. The colored arrows show the color scheme for relative fitness, with the most extreme colors depicting the most extreme fitness values in each environment. Colored circles indicate the environmental origins of the natural isolates as in figure 1a. (c) The fitness of 81 natural isolates and 4,521 S288c-derived gene deletion strains relative to the reference strain in the high-temperature environment, relative to that in the benign environment. The natural isolates are shown by colored circles, based on the color scheme in figure 1a. The gene deletion strains are shown by black or gray circles depending on whether they lie on odd-numbered or even-numbered chromosomes, respectively, and are arranged by chromosomal position. Dashed vertical lines indicate the regions identified as significant by GWAS, with those shown in red denoting regions further investigated by hemizygosity tests in figure 4.

We successfully inserted the unique barcodes, flanking a G418 sulfate resistance marker (KanMX4), at the HO (YDL227C) locus of 81 of the 85 diploid strains sequenced here (supplementary fig. S10 and data S1, Supplementary Material online; see “Materials and Methods”). From these 81 heterozygous HO/hoΔ::KanMX4 strains, we obtained stable a and α haploids for 76. We then created MATa hoΔ::HygMX4 (hygromycin B resistance) and MATα hoΔ::NatMX4 (nourseothricin resistance) haploids through marker switching (supplementary data S1, Supplementary Material online; see “Materials and Methods”).

To test the utility of these barcoded strains for studying the genetic basis of phenotypic variation, we combined the 81 barcoded diploid strains with the 4,653 diploid strains from the S. cerevisiae homozygous nonessential gene deletion collection to create a common starting pool for use across our experiments (fig. 3a). We grew this pool in a benign environment (YPD at 30 °C) as well as in six stressful environments: high temperature (YPD at 40 °C), high salt/osmotic stress (YPD at 30 °C + 1.25 M NaCl), high ethanol (YPD at 30 °C + 7% ethanol), superoxide anions (YPD at 30 °C + 4 mM paraquat), oxidizing agents (YPD at 30 °C + 3 mM hydrogen peroxide), and a hypoxia mimetic (YPD at 30 °C + 1 mM cobalt chloride). We extracted genomic DNA from the common starting pool and after each competition, produced bar-seq libraries, and quantified the sequencing read number of each barcode, a proxy for strain frequency, at each time point by Illumina sequencing (fig. 3a). For each strain, we estimated fitness relative to the BY4743-derived HO deletion strain (Giaever et al. 2002) in each stressful environment and expressed it relative to the corresponding value in the benign 30 °C YPD environment, to identify strains with particularly high or low relative fitness in the environment of interest (fig. 3b). While there is a clear phylogenetic component to the phenotypic similarity among some strains, particularly within the North American and Malaysian clades, other strains appear phenotypically diverged from the clades to which they are most closely related. The bar-seq data also allowed us to determine the effect of each gene deletion across the tested environments, revealing that relative fitness varies greatly among the gene deletion strains: at high temperature, 13.8% (623/4521) of gene deletion strains had significantly higher relative fitness and 18.3% (829/4521) had significantly lower relative fitness than the reference strain (fig. 3c; supplementary data S6, Supplementary Material online). Similar patterns are observed in the other five environments examined (supplementary fig. S11 and data S6, Supplementary Material online). Broadly, the fitnesses of both the gene deletion strains and the natural strains are positively correlated across the environments tested (supplementary fig. S12, Supplementary Material online).
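The paragraph above describes fitness as a double comparison: against the HO deletion reference strain, and against the benign YPD environment. The exact estimator is given in "Materials and Methods" rather than here, so the sketch below is only one plausible form under stated assumptions: per-generation fitness is the slope of ln(strain count / reference count) against generations, and environment-specific fitness is the difference between the stressful and benign values. The pseudocount, generation numbers, and counts are all hypothetical.

```python
import math


def fitness_vs_reference(counts_strain, counts_ref, generations, pseudocount=0.5):
    """Per-generation fitness of a strain relative to a reference strain.

    Slope of ln(count_strain / count_ref) against generations; using a ratio of
    read counts from the same library cancels differences in sequencing depth.
    """
    y = [math.log((a + pseudocount) / (b + pseudocount)) for a, b in zip(counts_strain, counts_ref)]
    mx, my = sum(generations) / len(generations), sum(y) / len(y)
    num = sum((x - mx) * (v - my) for x, v in zip(generations, y))
    den = sum((x - mx) ** 2 for x in generations)
    return num / den


def environment_relative_fitness(s_stress, s_benign):
    """Fitness in a stressful environment relative to the benign YPD 30 °C environment
    (a difference of per-generation fitness values is assumed here)."""
    return s_stress - s_benign


generations = [0, 5, 10]
reference = [1000, 1000, 1000]  # hypothetical read counts for the HO deletion reference strain
stress = fitness_vs_reference([1000, 1000 * math.exp(0.5), 1000 * math.exp(1.0)], reference, generations)
benign = fitness_vs_reference([1000, 1000 * math.exp(0.1), 1000 * math.exp(0.2)], reference, generations)
print(round(environment_relative_fitness(stress, benign), 3))  # prints 0.08
```

The pseudocount only guards against strains that drop to zero reads; with counts in the thousands its effect on the slope is negligible.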

Combining Forward and Reverse Genetics Reveals SNPs Underlying Phenotypic Diversity

Using a multi-stage GWAS approach (supplementary fig. S13, Supplementary Material online; see “Materials and Methods”), we attempted to identify SNPs associated with relative fitness variation among the 81 natural strains. After controlling for population structure, we discovered between 3 and 19 associated SNPs per environment in five of the six environments examined (supplementary data S7, Supplementary Material online). For example, we detected 13 SNPs associated with relative fitness at 40 °C. Five of these 13 SNPs map to a ∼16 kb region on Ch. XI that contains the ribosomal protein gene RPS21A, deletion of which is known to slow growth at high temperature (Sinha et al. 2008). Similarly, a second SNP on Ch. XI, located 51.5 kb from this cluster, lies within 7 kb of the genes DBP7, RPC37, and GCN3, whose deletion is known to reduce heat tolerance, as well as SET3 and YKR023C, whose deletion is known to reduce stress tolerance.

Although association studies rarely validate that the identified SNPs or linked regions are responsible for the observed phenotypic differences, such validation is feasible in yeast. Validation is an increasingly important step in interpreting GWAS signals, because many confounding factors, such as strong population structure and linkage (Connelly and Akey 2012), can produce false signals that are hard to untangle without direct genetic manipulation. To this end, we used reciprocal hemizygosity tests to identify fitness differences at 40 °C relative to 30 °C caused by deleting alternative alleles in hybrids of high- and low-fitness strains, for genes surrounding each of several associated SNPs (fig. 4a; see “Materials and Methods”). In each competition experiment, one strain expressed yellow fluorescent protein (YFP), allowing relative fitness to be quantified by flow cytometry (He et al. 2010). Reciprocal experiments with the YFP marker in the opposite hybrid background were performed to remove any fitness effect of YFP expression. We chose two strains with high fitness (YPS128 and YJM320) and two with low fitness (W303 and UWOPS05-227.2) at 40 °C relative to 30 °C, selected three significant SNPs for investigation, and reciprocally deleted four to six genes surrounding each SNP (supplementary data S7, Supplementary Material online).
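How the reciprocal YFP labelling removes the marker cost can be illustrated with a short sketch. It assumes that each competition yields the YFP-positive fraction at two time points and a known number of generations, and that YFP multiplies the labelled strain's per-generation fitness by a constant factor c. Under that assumption the two reciprocal competitions measure c·wA/wB and c·wB/wA, and the square root of their ratio recovers wA/wB. The function names are hypothetical.

```python
import math


def competition_ratio(yfp_t0, yfp_t1, generations):
    """Per-generation fitness of the YFP-labelled strain relative to the unlabelled one.

    yfp_t0, yfp_t1: fraction of YFP-positive cells (from flow cytometry) at the
    two time points; both must lie strictly between 0 and 1.
    """
    odds0 = yfp_t0 / (1.0 - yfp_t0)
    odds1 = yfp_t1 / (1.0 - yfp_t1)
    return math.exp(math.log(odds1 / odds0) / generations)


def marker_corrected_fitness(a_labelled, b_labelled):
    """Fitness of hemizygote A relative to hemizygote B with the YFP cost cancelled.

    a_labelled: (yfp_t0, yfp_t1, generations) with A carrying YFP, measuring c*wA/wB.
    b_labelled: the same with B carrying YFP, measuring c*wB/wA.
    The ratio m1/m2 equals (wA/wB)**2, so its square root is wA/wB.
    """
    m1 = competition_ratio(*a_labelled)
    m2 = competition_ratio(*b_labelled)
    return math.sqrt(m1 / m2)
```

Computing this ratio at 40 °C and at 30 °C and dividing the former by the latter would give a value on the scale of figure 4b, again assuming this estimator.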

Fig. 4.



Reciprocal hemizygosity test for causal effects of candidate genes surrounding SNPs identified by GWAS as associated with relative fitness at 40 °C. (a) Reciprocal hemizygosity test. Blue- and red-outlined cells depict strains carrying the predicted high- and low-fitness alleles, respectively. Black crosses indicate gene deletion. Yellow cells indicate YFP expression. The frequencies of YFP-expressing and non-YFP-expressing cells were determined at two time points. Competitions above and below the dashed line have alternative genotypes marked with YFP, allowing removal of potential fitness effects of YFP expression. (b) Fitness of the hemizygous strain lacking the low-fitness allele of a candidate gene relative to the hemizygous strain lacking the high-fitness allele in the high-temperature environment, relative to a benign environment. The low- and high-fitness alleles are from W303 and YPS128, respectively. Significant deviation of relative fitness from 1 was determined by a t-test using biological replicates and is indicated as follows: *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001. Error bars denote 95% confidence intervals determined by Fieller's theorem. The genes examined are shown at the bottom of the panel, with black arrows pointing to the significantly associated SNPs detected by GWAS. The number below each gene is the fitness of the gene deletion strain relative to that of the reference strain at 40 °C relative to 30 °C, as shown in figure 3c. Genes with significant positive effects on relative fitness when deleted are shown in green, and those with significant negative effects in red. Gray and white denote no significant effect upon deletion and no data available, respectively. (c) Same as panel b but for a different genomic region. (d) Same as panel b but for a different genomic region and different strains. The low- and high-fitness alleles are from UWOPS05-227.2 and YJM320, respectively.
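The Fieller intervals mentioned in the caption can be sketched as follows, assuming the quantity of interest is a ratio of two independent sample means (for example, replicate fitness at 40 °C over replicate fitness at 30 °C) with no covariance term. The critical t value is passed in rather than computed, to avoid a SciPy dependency; `fieller_ci` is a hypothetical name.

```python
import math
from statistics import mean, variance


def fieller_ci(x, y, t_crit):
    """Fieller confidence interval for mean(x) / mean(y) with independent samples.

    t_crit: two-sided t quantile (e.g. about 4.303 for 95% with 2 df).
    Returns (lower, upper), or None when mean(y) is not clearly different from
    zero, in which case the Fieller interval is unbounded.
    """
    mx, my = mean(x), mean(y)
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t2 = t_crit ** 2
    # Solve (mx - r*my)^2 = t2 * (vx + r^2 * vy) for r: a*r^2 + b*r + c = 0.
    a = my ** 2 - t2 * vy
    b = -2.0 * mx * my
    c = mx ** 2 - t2 * vx
    if a <= 0:
        return None
    # With a > 0 the discriminant equals 4*t2*(vy*mx^2 + vx*a) >= 0.
    root = math.sqrt(b * b - 4.0 * a * c)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))
```

Unlike a symmetric interval from the delta method, the Fieller interval is generally asymmetric around the point estimate, which suits ratios whose denominator is estimated with noise.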

The first SNP of interest, the most significant one identified, is located at position 490,822 of chromosome XI within the bidirectional promoter of GCN3 and BCH2, both of which are annotated as having temperature-related deletion phenotypes. Two neighboring genes, DBP7 and RPC37, carry similar annotations. Our bar-seq data showed that deleting DBP7 from S288c drastically reduced fitness at 40 °C relative to 30 °C. To investigate whether variation in these genes causes fitness variation in natural populations, we individually deleted the alternative alleles of each of these four genes in W303/YPS128 hybrids and compared the growth rates of the resulting strains at 40 °C relative to 30 °C. As expected, the strain retaining the predicted high-fitness DBP7 allele from YPS128 grew significantly faster at 40 °C relative to 30 °C than the strain retaining the predicted low-fitness allele from W303 (fig. 4b). No significant difference was observed for the other three genes tested. Thus, variation in the function or expression of DBP7, which encodes a putative ATP-dependent RNA helicase of the DEAD-box family, likely contributes to fitness variation at 40 °C among these strains. Interestingly, the causal gene lies two genes (3.5 kb) away from the significantly associated SNP identified by GWAS.

The second SNP of interest is located within the coding region of CNA1 (Ch. XII, position 1,004,315), a gene whose deletion is annotated as increasing stress susceptibility. We individually deleted the alternative alleles of CNA1 and four neighboring genes in W303/YPS128 hybrids and compared their growth rates. Unexpectedly, for four of the five genes examined, retention of the high-fitness allele resulted in higher fitness at 40 °C relative to 30 °C than retention of the low-fitness allele (fig. 4c). This suggests a complex genetic architecture and highlights that the associated SNPs are unlikely to be causal themselves; instead, they mark genomic regions harboring natural variation that affects the trait of interest. Interestingly, unlike DBP7, two of these four genes had no appreciable effect on relative fitness when deleted from S288c, suggesting that the genes underlying phenotypic variation in natural populations can differ from those affecting growth in the laboratory strain. These results therefore indicate that genetic background must be considered when determining gene–phenotype relationships. Furthermore, the type of mutation may also matter: the deletion collection contains only null mutations, whereas natural strains are expected to carry some gain-of-function mutations.

The third region investigated contains a cluster of five significantly associated SNPs on Ch. XI. Three of these are located in the coding region of DYN1, and one each in the coding regions of the upstream genes RHO4 and TRM2. We individually deleted alternative alleles of these three genes, as well as of three additional genes including RPS21A, which has a temperature-related deletion annotation, in the hybrid of UWOPS05-227.2 and YJM320. The results confirmed that alternative alleles of RHO4 and of RPS21A have different effects on relative fitness at 40 °C (fig. 4d). However, deleting RHO4 from S288c did not significantly alter relative fitness at 40 °C, again suggesting that the genetic background and/or the type of mutation underlying phenotypic variation differs between natural strains and the laboratory strain. Finally, for GLG1, the strain carrying the predicted high-fitness allele was outcompeted by the strain carrying the predicted low-fitness allele, suggesting not only that an associated SNP may tag multiple causal genetic variants but also that these variants may have opposite fitness effects.

Discussion

We have presented here a detailed overview of the genomic diversity within S. cerevisiae by combining newly sequenced genomes with previously published ones. The genome sequences, together with the genetically tractable haploids and diploids created here, provide valuable resources for the community to study the genetic basis of phenotypic variation in yeast. These resources will not only be informative, given the wealth of biological knowledge about yeast, but also be useful to society, given the wide use of diverse yeast strains in many industries.

We found strong population structure in S. cerevisiae and significant variation in population structure and evolutionary history among different parts of the yeast genome. The incongruences in phylogeny and population structure among genomic regions are likely due in large part to mating between divergent strains: meiotic products of such hybrids and their subsequent asexual competition can quickly generate such patterns. Even without meiosis (an apparently rare event in yeast; Tsai et al. 2008), beneficial aneuploidies can arise during clonal growth, removing some of the chromosomes contributed by the strains that formed the hybrid.

We identified SNPs associated with variation in growth rate under several environmental stresses relative to a benign condition and experimentally validated a subset of these associations. These observations reveal considerable complexity in association mapping in yeast, especially when the goal is to identify causal genetic variants. For instance, only a minority of the associated SNPs were located within the causal genes validated by the hemizygosity test, suggesting that a SNP identified by our association analysis is rarely itself the cause of the observed fitness variation among the natural strains. Instead, the association analysis identified regions likely to harbor allelic variation affecting growth rate, and the reverse genetic data then located the causal genes more precisely. In addition, we found several cases where deleting a gene from a laboratory strain had no appreciable phenotypic effect, yet alternative alleles segregating in natural populations had different effects on growth rate. Because these genes are annotated as functional in the laboratory strain, this finding suggests that gain-of-function mutations relative to the laboratory strain may contribute to natural phenotypic variation, and that forward genetics may sometimes reveal a genetic basis that is invisible to reverse genetics using gene deletions in laboratory strains. Alternatively, these results may indicate that the effects of segregating variation depend on genetic background, i.e., epistasis. For high-temperature growth, this view is consistent with previous work showing that the gene–phenotype relationship varies across genetic backgrounds (Sinha et al. 2006; Cubillos et al. 2013). Finally, we also observed instances where retention of the presumed fitter allele resulted in lower fitness than retention of the presumed less fit allele. Given that high-temperature growth is one of the most extensively studied traits in S. cerevisiae (Steinmetz et al. 2002; Sinha et al. 2006; Doniger et al. 2008; Parts et al. 2011; Edwards and Gifford 2012; Bloom et al. 2013; Cubillos et al. 2013), the extent to which this variability in the genic basis of complex traits extends to other phenotypes remains to be determined. That we observed all of the above phenomena while mapping only a single trait suggests that this key model system for understanding eukaryotic cell biology still has much to teach us about the genetic basis of phenotypic diversity. Combining forward and reverse genetic approaches in this model system offers one way to begin unraveling this complexity.

Materials and Methods

Strains and Strain Construction

The strains sequenced in this work were obtained from the authors of two previous studies (Liti et al. 2009; Schacherer et al. 2009) and are listed in supplementary data S1, Supplementary Material online, together with the geographic location and environment of origin of each strain when available. Most strains used are originally diploid and homothallic and carry no tractable genetic marker, which makes the strains difficult to track and makes it impossible to maintain the stable haploid strains that many studies require. To produce a set of strains useful to the community, we adopted the approach used to construct the S. cerevisiae gene deletion collection, introducing at the HO (YDL227C) locus of each strain a drug resistance marker flanked by two unique, strain-identifying, 20-nucleotide DNA barcodes (outlined in supplementary fig. S10, Supplementary Material online). This process simultaneously removed the strains’ ability to switch mating type and introduced a reliable means of strain tracking. Diploid strains were transformed using the lithium acetate method (Cubillos et al. 2009) with minor alterations. The ∼1 µg of transforming HO-targeting DNA contained a G418 sulfate resistance marker flanked by strain-specific barcodes and was produced by two successive polymerase chain reaction (PCR) amplifications. We first amplified the KanMX4 cassette from plasmid pFA6a-KanMX4 (Wach et al. 1994) using two 74-nucleotide primers, each containing a unique 20-nucleotide barcode, the sequences necessary for its amplification (U1 + U2 or D1 + D2), and priming sites for the second PCR (supplementary data S8, Supplementary Material online). The second PCR used a dilution of the first PCR product as the template and added sequences homologous to regions upstream and downstream of HO for targeting and replacement of the locus (supplementary fig. S10, Supplementary Material online).
The primers used in this PCR differed by strain to preserve lineage-specific SNPs in the region. A full list of the primers used can be found in supplementary data S8, Supplementary Material online. To ensure that the barcodes assigned to each strain are novel and compatible with those in the gene deletion strains (Giaever et al. 2002), the molecular barcoded yeast (MoBY) ORF library (Ho et al. 2009), and existing technologies used to estimate barcode frequency, we chose barcode sequences that are present on the widely used Tag4 array (Pierce et al. 2006) but not already in use. We confirmed successful insertion of the KanMX4 cassette by PCR and verified its sequence by Sanger sequencing. Although Cubillos et al. (2009) previously produced a set of genetically tractable strains for a subset of the genotypes studied here, these strains were unsuitable for several reasons. First, their shorter barcodes (6 bp vs. 20 bp) reduce the number of strains that can be confidently pooled, because sequencing errors can cause barcodes, and therefore read counts, to be misassigned. Second, their barcodes are incompatible with the bar-seq methodology used for the gene deletion lines because of differences in flanking sequences, which precludes measuring the fitness of wild strains and deletion lines simultaneously in a single tube. Third, all strains were converted to uracil auxotrophs by deletion of URA3, which may affect strain growth. Together, these considerations led us to construct this new resource for the community.
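The barcode-length argument can be made concrete with Hamming distances. There are only 4^6 = 4,096 possible 6-nt barcodes but 4^20 (about 1.1 × 10^12) possible 20-nt barcodes, so a pool of 20-nt barcodes can maintain a much larger minimum pairwise distance, which is what allows reads carrying sequencing errors to be assigned unambiguously. The sketch below, with hypothetical helper names, checks a pool's minimum distance and assigns a read only when exactly one barcode lies within a mismatch tolerance; it illustrates the principle and is not the demultiplexing procedure used in this study.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in a pool."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_read(read_tag, barcodes, max_mismatches):
    """Return the single barcode within max_mismatches of read_tag, else None.

    Assignment is guaranteed to be unambiguous when the pool's minimum
    distance is at least 2 * max_mismatches + 1.
    """
    hits = [bc for bc in barcodes if hamming(read_tag, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

With 6-nt barcodes, pooling thousands of strains forces many barcode pairs to differ at only one or two positions, so even a single sequencing error can make a read unassignable or, worse, assign it to the wrong strain.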

From these heterozygous HO-marked diploids (MATa/MATα HO/hoΔ::Uptag-KanMX4-Downtag), stable haploid strains were obtained by sporulation on potassium acetate media followed by ascus digestion and tetrad dissection. G418-resistant colonies were identified by replica plating onto YPD media containing 300 µg/ml G418 sulfate (Gold Biotechnology, US). Colony PCR was used to determine the mating type of individual colonies, and single MATa and MATα colonies were streaked to obtain a pure strain of each mating type. Samples were grown overnight and frozen at −80 °C in 20% glycerol for long-term storage. To allow easy formation of diploids between any two strains, we switched the drug resistance cassette carried by MATa and MATα strains to hygromycin B and nourseothricin resistance, respectively (Gold Biotechnology). This was achieved by the standard LiAc method using a PCR product amplified with primers specific to the TEF promoter and terminator common to all three drug resistance cassettes (Wach et al. 1994; Goldstein and McCusker 1999).

Two sets of strains were treated slightly differently due to their genotypes. First, RM11 was previously made into a stable haploid strain by insertion of a KanMX4 cassette at the HO locus, resulting in the deletion of the targeting region we used in all other strains (Brem et al. 2002). To insert the appropriate barcodes into this background, unique homologous primers were used to target and replace the existing KanMX4 cassette with a HphMX4 marker amplified from plasmid pAG32 (Goldstein and McCusker 1999). Unique barcodes were then added and the cassette was switched back to KanMX4. Second, three strains (S288c, W303, and RM11) were already heterothallic haploids. After insertion of the barcoded cassette at the HO locus, these strains were transformed with plasmid pCM66, which contains a galactose-inducible copy of HO and a nourseothricin drug resistance marker, to obtain strains of both mating types. After transformation, nourseothricin-resistant cells were grown with galactose as the sole carbon source at 30 °C without shaking for 8 h to induce expression of HO. This allowed for mating-type switching and subsequent mother–daughter cell mating to produce diploids. Cells were then streaked for single colonies on YPD (1% yeast extract, 2% peptone, 2% glucose, and 2% agar) plates, and the ploidy of single colonies was checked by colony PCR using mating-type-specific primers. Diploid colonies were streaked for single colonies on fresh, nonselective, YPD plates and assayed for nourseothricin resistance. A single colony unable to grow in the presence of the drug, and therefore having lost the plasmid, was selected for each strain.

We attempted to produce genetically tractable strains for each of the 85 strains whose genomes we sequenced, but found some to be unamenable to our approach, either due to natural resistance to the drugs used or an inability to successfully sporulate and produce viable offspring of both mating types. The full details of all tractable strains created and the reason for missing strains are outlined in supplementary data S1, Supplementary Material online.

Genome Sequencing

Each of the 85 strains was streaked from frozen stocks onto YPD plates. Following 2 days of growth, a single colony was picked into 5 ml of liquid YPD media and grown to saturation (36 h at 30 °C with shaking). Cultures were centrifuged to collect cells, and DNA was extracted using standard methods. Dried DNA pellets were resuspended in 70 µl of Tris-EDTA (pH 8.0), and the DNA was quantified and its purity assessed before storage at −80 °C. Illumina libraries were constructed using a protocol modified from a previous study (Rohland and Reich 2012). Briefly, 5 µg of genomic DNA was sheared using a Covaris S220 (duty cycle 10%, intensity 4, cycles/burst 200, time 55 s), of which 2 µg was used in library construction. To select DNA fragments of the desired size range (∼400 nucleotides), we used DNA binding Magna beads to perform dual size selection. The fragments were blunt-end repaired, adapter ligated, and nick filled to repair the adapter overhangs. Finally, sequences necessary for multiplexing and cluster formation on an Illumina HiSeq2000 were added by PCR. Equal amounts of each library were combined and run across two paired-end 100-nucleotide lanes (43 strains in one lane and 42 in a second) of an Illumina HiSeq2000 at the University of Michigan DNA sequencing core.

Read Mapping and SNP/Indel Calling

Reads were first trimmed using Cutadapt (Martin 2011) to remove adapter sequences. Bowtie2 v2.1.0 (Langmead and Salzberg 2012) was used to map reads to the S288c reference (R64-1-1) genome under the sensitive local alignment mode, allowing up to 3 mismatches/indels per read. Pertinent statistics obtained during the mapping process are listed in supplementary data S2, Supplementary Material online. Paired reads were considered nonconcordant and discarded from further analysis if apparent mapping locations were more than 1,200 nucleotides apart or if the paired reads appeared to completely overlap one another. Paired reads were also removed from further analysis if either read was found to map ambiguously. Finally, we removed PCR duplicates by discarding all but one copy of any read pair found to map to exactly the same genomic position.
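The read-pair filters above can be sketched as a single pass over mapped pairs. This assumes each pair is already summarized by its chromosome, the 0-based half-open coordinates of both mates, and a flag for ambiguous mapping. It interprets “more than 1,200 nucleotides apart” as the outer span of the pair and treats pairs with identical coordinates for both mates as PCR duplicates. The `ReadPair` record and the function names are hypothetical and are not part of Bowtie2 or SAMtools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom: str
    start1: int      # leftmost mapped coordinate of mate 1 (0-based)
    end1: int        # exclusive end of mate 1
    start2: int
    end2: int
    ambiguous: bool  # True if either mate maps to more than one location


def is_concordant(p, max_span=1200):
    """Reject pairs spanning more than max_span nt or in which one mate contains the other."""
    span = max(p.end1, p.end2) - min(p.start1, p.start2)
    if span > max_span:
        return False
    contained = (p.start1 <= p.start2 and p.end2 <= p.end1) or \
                (p.start2 <= p.start1 and p.end1 <= p.end2)
    return not contained


def filter_pairs(pairs, max_span=1200):
    """Drop ambiguous and nonconcordant pairs, then keep one copy per mapping position."""
    seen = set()
    kept = []
    for p in pairs:
        if p.ambiguous or not is_concordant(p, max_span):
            continue
        key = (p.chrom, p.start1, p.end1, p.start2, p.end2)
        if key in seen:  # identical mapping of both mates: treated as a PCR duplicate
            continue
        seen.add(key)
        kept.append(p)
    return kept
```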

SAMtools v0.1.18 (Li et al. 2009) and VarScan v2.3.6 (Koboldt et al. 2012) were used to identify SNPs and indels within each genome, and only variants identified by both programs were used in downstream analyses. To further reduce false calls due to misalignment of reads to the reference genome, we removed variants that showed significant strand bias (binomial P < 0.001) or a significant variant distance bias, i.e., an invariant distance to the ends of supporting reads (VDB < 0.0015) (Daneck et al. 2012). Only the most likely variant is listed in the indel and homozygous SNP lists; for the heterozygous SNP list, the maximum likelihood genotype inferred by SAMtools is reported. To reduce errors in estimating allele frequencies, we used only segregating sites covered by reads in every one of the 85 strains, except when identifying pseudogenizing variants.
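The variant filters can be sketched as below, assuming per-variant counts of supporting reads on the forward and reverse strands, a VDB value already computed by SAMtools, and a null expectation of equal strand representation (p = 0.5) for the binomial test. The two-sided P-value sums the probabilities of all outcomes no more likely than the observed one. The function names and the (chrom, pos, ref, alt) variant key are hypothetical.

```python
from math import comb


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial P-value: sum of outcomes no more likely than k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def passes_filters(forward_reads, reverse_reads, vdb, strand_alpha=1e-3, vdb_min=0.0015):
    """Keep a variant unless it shows strand bias or a low VDB score."""
    strand_p = binom_two_sided(forward_reads, forward_reads + reverse_reads)
    return strand_p >= strand_alpha and vdb >= vdb_min


def consensus_calls(samtools_calls, varscan_calls):
    """Variants reported by both callers, keyed by (chrom, pos, ref, alt)."""
    return set(samtools_calls) & set(varscan_calls)
```

A variant supported by 20 forward-strand reads and no reverse-strand reads, for instance, has a two-sided P of about 1.9 × 10^-6 and would be removed.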

Phylogenetic Reconstruction

We reconstructed a maximum composite likelihood neighbor-joining tree using MEGA 5.2 with all homozygous SNPs and all substitution types (Tamura et al. 2011). We allowed heterogeneous rates among lineages and among sites. Clades were identified in line with previous studies (Liti et al. 2009; Schacherer et al. 2009). To assess the strength of support for the phylogeny, we performed 1,000 bootstrap replicates. Phylogenies of individual chromosomes were reconstructed using the same method.

Population Structure

To assess the population structure of the 190 strains, we used the model-based variational Bayesian algorithm implemented in fastSTRUCTURE (Raj et al. 2014). For the genome-wide analysis, we randomly selected 10% of nonsingleton homozygous SNPs. One hundred runs of fastSTRUCTURE were performed for each of K = 2–9, with all other parameters set to their defaults. K = 7 was found to be the best. The population structure with the maximum mean likelihood was plotted using R. Finally, the population structure of each chromosome was determined using all homozygous SNPs on that chromosome at K = 7.
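Selecting the random 10% of nonsingleton homozygous SNPs might look like the following sketch. It assumes genotypes are coded 0/1 per strain and defines a nonsingleton site as one whose minor allele is carried by more than one strain. The fixed random seed and the function names are illustrative assumptions.

```python
import random


def is_nonsingleton(genotypes):
    """True if the minor allele is carried by more than one strain.

    genotypes: list of 0/1 homozygous calls (reference/alternative), one per strain.
    """
    alt = sum(genotypes)
    return min(alt, len(genotypes) - alt) > 1


def subsample_snps(snp_table, fraction=0.10, seed=1):
    """Randomly keep `fraction` of the nonsingleton SNPs, seeded for reproducibility.

    snp_table: dict mapping a SNP identifier to its genotype list.
    """
    candidates = [snp_id for snp_id, g in snp_table.items() if is_nonsingleton(g)]
    k = round(fraction * len(candidates))
    return random.Random(seed).sample(candidates, k)
```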

Linkage Disequilibrium

LD, measured as r² between every pair of SNPs, was calculated using custom code. We then computed the average r² for all SNP pairs whose distance fell in the range x − 99 to x nucleotides, for x = 100, 200, 300, …, 200,000. We computed the expected LD between unlinked SNPs as the mean r² of 10,000 random pairs of SNPs located on different chromosomes. Following Schacherer et al. (2009), for each distance range, we presented the difference between the observed r² and the expected r² of unlinked SNPs in figure 1d.
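The LD-decay calculation can be sketched as follows for a single chromosome, assuming 0/1 genotype vectors (homozygous calls) and sorted SNP positions. Each pair is assigned to the bin whose upper edge x satisfies x − 99 ≤ d ≤ x. Subtracting the mean r² of random interchromosomal pairs from each bin would then give the quantity plotted in figure 1d. This is a quadratic-time illustration rather than the authors' custom code, and the function names are hypothetical.

```python
from collections import defaultdict


def r_squared(g1, g2):
    """Squared Pearson correlation between two 0/1 genotype vectors (None if monomorphic)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2)) / n
    v1 = sum((a - m1) ** 2 for a in g1) / n
    v2 = sum((b - m2) ** 2 for b in g2) / n
    if v1 == 0 or v2 == 0:
        return None
    return cov * cov / (v1 * v2)


def ld_decay(positions, genotypes, bin_size=100, max_dist=200_000):
    """Mean r^2 per distance bin (x - 99 .. x) for SNPs on one chromosome.

    positions: sorted SNP coordinates; genotypes: parallel list of 0/1 vectors.
    Returns {x: mean r^2} for x = 100, 200, ... that contain at least one pair.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = positions[j] - positions[i]
            if d > max_dist:
                break  # positions are sorted, so later j are even farther away
            if d < 1:
                continue
            r2 = r_squared(genotypes[i], genotypes[j])
            if r2 is None:
                continue
            x = ((d - 1) // bin_size + 1) * bin_size  # upper edge of the bin
            sums[x] += r2
            counts[x] += 1
    return {x: sums[x] / counts[x] for x in sorted(sums)}
```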

Population Genomic Analysis of Natural Selection

In population genomic analyses, we used only SNP sites that are dimorphic and whose ancestral state could be unambiguously assigned. To infer SNP ancestral states, we took advantage of published orthology information and multi-species genome sequences (Scannell et al. 2011). We used T-Coffee (Notredame et al. 2000) with the default settings in BioPerl to align the coding sequences of S. paradoxus, S. mikatae, and S. bayanus with the orthologous coding sequences of the S. cerevisiae reference sequence R64-1-1. Using these multiple-sequence alignments, we considered the states of S. paradoxus, S. mikatae, and S. bayanus at each SNP site and unambiguously assigned its ancestral state if at least two of these outgroup species were in agreement.
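The two-of-three outgroup rule can be written as a small function. It assumes the aligned outgroup bases at a site are already available and, as an interpretive assumption, counts only agreement on one of the two alleles segregating in S. cerevisiae, so a site where two outgroups agree on a third base is left unassigned. The function name is hypothetical.

```python
from collections import Counter


def ancestral_state(outgroup_bases, ref_allele, alt_allele):
    """Infer the ancestral allele at a dimorphic SNP from aligned outgroup bases.

    outgroup_bases: bases of S. paradoxus, S. mikatae and S. bayanus at the site
    ('-' or 'N' for gaps or missing data). Returns the allele supported by at
    least two outgroups, or None if no segregating allele has such support.
    """
    counts = Counter(b for b in outgroup_bases if b in (ref_allele, alt_allele))
    for allele, n in counts.items():
        if n >= 2:
            return allele
    return None
```

The derived allele is then simply the other segregating allele, and its frequency among the strains gives the DAF used in the analyses below.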

We found that polymorphisms resulting in nonsense mutations have much lower derived allele frequencies (DAFs) than nonsynonymous polymorphisms, which in turn have lower DAFs than synonymous polymorphisms (supplementary fig. S3, Supplementary Material online). This pattern suggests that purifying selection is generally stronger against nonsense mutations than against nonsynonymous mutations, and stronger against nonsynonymous than against synonymous mutations. To examine whether purifying selection acts on synonymous mutations, especially in genes with strong codon usage bias (CUB), we measured CUB using previously published codon adaptation index (CAI) values for yeast genes (Qian et al. 2012b). We divided genes into two bins: those with CAI > 0.6 and those with CAI ≤ 0.6. Synonymous polymorphisms in high-CAI genes tend to have lower DAFs than those in low-CAI genes (supplementary fig. S3, Supplementary Material online), supporting the hypothesis of purifying selection against synonymous mutations in genes with strong CUB. For each SNP category, we also calculated the population genetic statistics Tajima’s D (1989), Fu and Li’s F (1993), and Fay and Wu’s H (2000). Compared with synonymous polymorphisms, the more negative D and F values for nonsynonymous and nonsense polymorphisms are consistent with an excess of rare alleles, and the less negative H values are consistent with a deficiency of common alleles (supplementary table S1, Supplementary Material online).
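For reference, Tajima's D compares the mean number of pairwise differences with the number of segregating sites scaled by a1 (Watterson's estimator). A minimal sketch using the standard constants from Tajima (1989) follows; it assumes every site in a SNP category was sampled in the same n strains and takes the count of one allele per site. Fu and Li's F and Fay and Wu's H are not shown, and the function name is hypothetical.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts in a sample of n sequences.

    allele_counts: copies of one allele at each segregating site (0 < k < n).
    Returns None when there are no segregating sites.
    """
    S = len(allele_counts)
    if S == 0:
        return None
    # theta_pi: mean pairwise differences, summed over segregating sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare alleles lowers θπ relative to S/a1 and drives D negative, which is the pattern described above for nonsynonymous and nonsense polymorphisms.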

To assess potential positive selection at the protein level, we counted in each gene the numbers of synonymous polymorphisms ($P_S$), nonsynonymous polymorphisms ($P_N$), synonymous substitutions between S. cerevisiae and S. paradoxus ($D_S$), and nonsynonymous substitutions between S. cerevisiae and S. paradoxus ($D_N$). The McDonald–Kreitman (1991) test was performed using a two-tailed Fisher’s exact test in R, followed by Bonferroni correction for multiple testing. To calculate the proportion of amino acid substitutions driven by positive selection (α) for each gene, we used S. paradoxus as the outgroup. We first determined whether $D_N/P_N > D_S/P_S$. If so, we calculated $\alpha = 1 - D_S P_N/(D_N P_S)$; otherwise, we calculated $\alpha' = 1 - D_N P_S/(D_S P_N)$, which represents the fraction of nonsynonymous mutations under purifying selection. The distribution of α is largely consistent with widespread purifying selection and relatively few instances of positive selection (supplementary fig. S4, Supplementary Material online). In addition, McDonald–Kreitman tests of individual genes failed to detect a significant signal of positive selection for any gene after Bonferroni correction, consistent with a previous analysis of a smaller yeast dataset (Liti et al. 2009). McDonald–Kreitman tests suggested that 6.9% of genes are under significant purifying selection (Bonferroni-corrected P < 0.05).
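The per-gene McDonald–Kreitman test and the α/α' rule can be sketched without external dependencies. The Fisher's exact test below follows the two-sided convention documented for R's `fisher.test` (summing the probabilities of all tables with the same margins that are no more probable than the observed one), and α is computed from cross-multiplied counts to avoid dividing by zero. The function names are hypothetical; this illustrates the calculation and is not the authors' R code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (c + d)), min(row1, col1)
    denom = comb(n, col1)
    # Hypergeometric probability of each table with the observed margins.
    probs = {k: comb(row1, k) * comb(n - row1, col1 - k) / denom for k in range(lo, hi + 1)}
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def mk_test(dn, ds, pn, ps, n_tests=1):
    """McDonald-Kreitman P-value for one gene, Bonferroni-corrected over n_tests genes."""
    p = fisher_exact_two_sided(dn, ds, pn, ps)
    return min(1.0, p * n_tests)


def alpha_or_alpha_prime(dn, ds, pn, ps):
    """('alpha', value) if Dn/Pn > Ds/Ps, else ('alpha_prime', value); None if undefined."""
    num_a, num_b = dn * ps, ds * pn
    if num_a == 0 and num_b == 0:
        return None
    if num_a > num_b:  # equivalent to Dn/Pn > Ds/Ps
        return "alpha", 1 - num_b / num_a
    return "alpha_prime", 1 - num_a / num_b
```

For example, a gene with $D_N = 10$, $D_S = 20$, $P_N = 5$, and $P_S = 20$ has $D_N/P_N = 2 > D_S/P_S = 1$, giving α = 1 − (20 × 5)/(10 × 20) = 0.5.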

Because slightly deleterious alleles can bias the estimation of α, we further estimated α using the approach proposed by Eyre-Walker and Keightley (2009). Briefly, their approach uses polymorphism data to estimate the distribution of fitness effects of new deleterious mutations (DFE) and then uses DFE to predict the number of neutral and adaptive substitutions. When using this approach with a two-epoch model, we found the estimated α to be −0.11 (supplementary table S2, Supplementary Material online), which is again consistent with the lack of signal of positive selection.

Recently, Messer and Petrov (2013) proposed a heuristic method to estimate α by considering α values for polymorphic sites at different DAFs. Following Messer and Petrov (2013), we applied the extended version of the McDonald–Kreitman test to all genes. Starting with all SNPs, we sequentially raised the DAF threshold and recalculated α using only SNPs that passed the threshold. Using MATLAB, we applied a nonlinear least-squares method to fit the data to the function $\alpha_x = a + b e^{-cx}$, where $x$ is the DAF threshold and $\alpha_x$ is the corresponding α, with the constraints $b < 0$ and $c > 0$. Although theoretical support for Messer and Petrov's approach is lacking, this approach surprisingly converges on an estimated α of 0.55, suggesting that ∼55% of between-species sequence divergence at nonsynonymous sites is due to positive selection (supplementary fig. S5a, Supplementary Material online). Interestingly, the plot of α versus DAF showed a clear valley around intermediate DAFs that has not been observed previously. Suspecting that this might reflect heterogeneous selection across strains, we partitioned strains based on phylogenetic clustering. When we separated strains in the Wine/European cluster from all other strains and performed the same analysis on the two groups separately, the plots of α versus DAF were dramatically different from each other. The extrapolated α values were −0.36 for the Wine/European cluster and 0.56 for all other strains. This difference in the estimate of α was caused not by sampling in general but by the specific partitions used (supplementary fig. S5b, c, Supplementary Material online). In addition, the α versus DAF plot for strains outside the Wine/European cluster fit an exponential curve better than the combined analysis did (adjusted R² = 0.85 without Wine/European strains vs. adjusted R² = 0.69 for all strains).
Overall, these results suggest that the historical action of natural selection within these two groups has been different and that positive selection in yeast may be more common than initially expected, especially outside of domesticated wine strains. These results, however, are inconsistent with the results obtained using the method of Eyre-Walker and Keightley (2009), where no positive values are apparent when using either only strains in the Wine/European cluster or only strains outside the Wine/European cluster (supplementary table S2, Supplementary Material online).
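The original fit used nonlinear least squares in MATLAB. As a dependency-free stand-in (a different fitting technique from the authors'), the sketch below profiles over c: for a fixed c, the model α_x = a + b·e^(−cx) is linear in a and b, so both have a closed-form least-squares solution, and the c on a grid with the smallest squared error is kept, subject to b < 0 and c > 0. Following our understanding of Messer and Petrov's convention, the asymptotic α is reported as the fitted curve evaluated at x = 1. The grid and the function names are assumptions.

```python
import math


def fit_linear(xs, ys, c):
    """Least-squares a, b (and squared error) for y = a + b*exp(-c*x) with c fixed."""
    zs = [math.exp(-c * x) for x in xs]
    n = len(xs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    b = szy / szz
    a = my - b * mz
    sse = sum((y - a - b * z) ** 2 for z, y in zip(zs, ys))
    return a, b, sse


def fit_asymptotic_alpha(daf_thresholds, alphas, c_grid=None):
    """Fit alpha_x = a + b*exp(-c*x) with b < 0 and c > 0 by profiling over c.

    Returns (a, b, c, alpha_at_1), or None if no grid value yields b < 0.
    """
    if c_grid is None:
        c_grid = [0.1 * k for k in range(1, 501)]  # c in (0, 50]
    best = None
    for c in c_grid:
        a, b, sse = fit_linear(daf_thresholds, alphas, c)
        if b >= 0:
            continue  # enforce the constraint b < 0
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    if best is None:
        return None
    a, b, c, _ = best
    return a, b, c, a + b * math.exp(-c)
```

Profiling trades the generality of an iterative solver for a fit that cannot get stuck at a poor local optimum in a and b; the price is that c is only as precise as the grid.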

De Novo Assembly and Identification of Non-Reference Genes

De novo genome assembly was performed using SOAPdenovo2 v2.04 (Luo et al. 2012) with K = 51 for all adapter-trimmed reads from each of the 85 strains we sequenced. Basic statistics of the genome assemblies are listed in supplementary data S3, Supplementary Material online. To examine the quality of the assemblies, we used BLASTn to search for the KanMX4 vector sequence present at the HO locus of 81 genomes and then used it as an anchor to extract the strain-specific barcodes (UPTAG and DSTAG). We successfully recovered the strain-specific barcodes for all 81 genomes. To rule out false-positive de novo gene calls, Exonerate v2.2.0 (Slater and Birney 2005) was used to align known genes in the S288c reference genome to the assembled contigs. Known genes were localized onto contigs by prioritizing the Exonerate hits as follows: (i) best hit with syntenic neighbor genes on either side, (ii) best hit with 100% query sequence coverage, and (iii) hits longer than 200 nucleotides that do not overlap with better hits by more than 30 nucleotides (in case a gene is split among multiple contigs). Having identified the locations of known genes on de novo contigs, we used GeneMarkS v4.17 (Besemer et al. 2001) to perform gene predictions and compared these with the locations of known genes.
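Rule (iii), which handles genes split across contigs, can be read as a greedy selection. The sketch assumes each Exonerate hit is summarized by its aligned query interval (half-open) and a score, visits hits from best to worst, and keeps a hit if it spans more than 200 nt of the query and overlaps each previously kept hit by at most 30 nt. Both this interpretation and the function name are assumptions.

```python
def select_split_hits(hits, min_len=200, max_overlap=30):
    """Greedy selection of Exonerate hits for a gene split across contigs.

    hits: list of dicts with 'start', 'end' (half-open query coordinates of the
    aligned segment) and 'score'. Hits are visited from best to worst score.
    """
    kept = []
    for h in sorted(hits, key=lambda h: h["score"], reverse=True):
        if h["end"] - h["start"] <= min_len:
            continue  # too short to be used as a partial placement
        # Overlap on the query with each kept hit (negative means disjoint).
        overlaps = (min(h["end"], k["end"]) - max(h["start"], k["start"]) for k in kept)
        if all(ov <= max_overlap for ov in overlaps):
            kept.append(h)
    return kept
```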

Predicted genes that showed no overlap with known gene locations were considered candidate nonreference genes. To avoid false positives caused by un-localized known genes, we used BLASTn to align the predicted genes with cDNA sequences of known genes. All predicted genes with hits covering 80% of the query, or with a >200-nucleotide region that is >90% identical with any reference gene, were removed. In order to classify the origin of remaining nonreference genes, we retrieved the best hit in the NCBI “nr” database reported by BLASTn and tBLASTx for each nonreference gene. If the best hit was in the reference genome, it was also removed unless there was a premature stop codon for the hit region in the reference genome. Finally, we removed nonreference genes with best hits to sequences derived from vectors, synthetic constructs, phages, bacterial genomes, or tandem elements (supplementary data S4, Supplementary Material online).
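The BLASTn-based removal of reference-like predictions can be sketched as follows. It assumes HSPs have been parsed from tabular output into dictionaries with query coordinates, alignment length, and percent identity, and it interprets “hits covering 80% of the query” as the union of all HSPs reaching at least 80% of the query length. Both interpretations and the function name are assumptions.

```python
def is_reference_like(hsps, query_len, min_cov=0.80, min_region=200, min_ident=90.0):
    """True if a predicted gene should be discarded as a known reference gene.

    hsps: BLASTn HSPs of the predicted gene against reference cDNAs, each a dict
    with 'qstart', 'qend' (1-based, inclusive), 'length' and 'pident'.
    """
    covered = set()
    for h in hsps:
        lo, hi = sorted((h["qstart"], h["qend"]))
        covered.update(range(lo, hi + 1))
        # A single long, highly identical region is enough to discard the gene.
        if h["length"] > min_region and h["pident"] > min_ident:
            return True
    return len(covered) / query_len >= min_cov
```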

To assess the expression levels of the nonreference genes, we downloaded SOLiD RNA-seq data for 23 S. cerevisiae strains (Skelly et al. 2013). The color-space RNA-seq reads from each strain were mapped to known and predicted nonreference genes using Bowtie (Langmead et al. 2009), allowing up to two mismatches in color space. The best hits for each read were then used to calculate the Reads Per Kilobase of transcript per Million mapped reads (RPKM) for each gene.
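RPKM scales a gene's read count by its length in kilobases and by the library size in millions of mapped reads. The function below is a minimal sketch; it assumes the per-gene counts and the total number of mapped reads have already been tallied from the Bowtie best hits, and its name is hypothetical.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped_reads: int) -> Dict[str, float]:
    """RPKM = reads / (gene length in kb) / (total mapped reads in millions)."""
    millions = total_mapped_reads / 1e6
    return {gene: n / (lengths_bp[gene] / 1e3) / millions for gene, n in counts.items()}


print(rpkm({"A": 500, "B": 100}, {"A": 1000, "B": 500}, 1_000_000))
# {'A': 500.0, 'B': 200.0}
```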

Phenotypic Consequences of Deleting Non-Reference Genes

Each of two newly identified nonreference genes was deleted from the genetic background in which it was discovered, using the G418 sulfate resistance cassette KanMX4 PCR-amplified from plasmid pFA6a–KanMX4 (Wach et al. 1994); deletions were confirmed by PCR (see supplementary data S8, Supplementary Material online for primer sequences). Non-Ref-129 was deleted from strain UWOPS87-2421, and Non-Ref-67 was deleted from strain CLIB272. Both backgrounds had the genotype MATa hoΔ::NatMX4.

To test the phenotypic consequences of deleting these two genes, growth curves were obtained using a Bioscreen C (Growth Curves, USA). The Non-Ref-129 deletion strain and the wild-type strain were grown in Complete Supplemented Medium (CSM) containing fluconazole (Sigma Aldrich) at 0, 3, 6, 12.5, 25, 37.5, 50, 75, or 100 µg/ml. The Non-Ref-67 deletion strain and the wild-type strain were also grown in CSM, but with cantharidin at 0, 1.5, 3, 6, 12, 25, 50, or 100 µM. To initiate growth, we grew 5 ml CSM cultures from frozen stocks for 24 h. Each strain was then diluted 200-fold into 350 µl of the appropriate drug-containing medium in triplicate. Growth curves were recorded at 30 °C for 60 h, with readings taken every 20 min using the Wide-Band 420–580 nm filter. Maximum growth rate (OD/h) was estimated from the growth curve data as in a previous study (Warringer et al. 2011), and efficiency (Max OD) was calculated as the average of the 3rd–7th highest OD values recorded. Two-tailed t-tests were used to assess the significance of differences between genotypes. Three replicates were performed for each growth curve.
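The efficiency definition is explicit and easy to express in code. For the maximum growth rate, the sketch below substitutes a simple stand-in, the steepest least-squares slope of OD against time over a sliding window of consecutive readings. This is an assumption and not necessarily the procedure of Warringer et al. (2011); the window of five readings is hypothetical.

```python
from statistics import mean
from typing import Sequence


def efficiency(od: Sequence[float]) -> float:
    """Max OD as defined in the text: mean of the 3rd-7th highest OD readings."""
    ranked = sorted(od, reverse=True)
    return mean(ranked[2:7])


def max_growth_rate(times_h: Sequence[float], od: Sequence[float], window: int = 5) -> float:
    """Assumed stand-in: steepest least-squares slope of OD vs. time (OD/h)
    over any `window` consecutive readings."""
    best = float("-inf")
    for i in range(len(od) - window + 1):
        t = times_h[i:i + window]
        y = od[i:i + window]
        t_bar, y_bar = mean(t), mean(y)
        sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
        sxx = sum((a - t_bar) ** 2 for a in t)
        best = max(best, sxy / sxx)
    return best
```

Averaging ranks 3 to 7 rather than taking the single maximum makes the efficiency estimate robust to an occasional spurious high reading.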

Identification of Intron Losses

Based on the S288c reference genome annotation, we built a sequence database containing all known exon–exon junctions, with up to 101 nucleotides on each side of the junction. To search for reads supporting an intron loss event, all reads were mapped to this database with Bowtie2. We required that at least 95% of the read be covered by the alignment and that the read map to at least 20 nucleotides in each of the two adjacent exons. We further filtered out ambiguous mappings to the reference genome using BLASTn with an E-value cutoff of 0.01. To search for reads supporting the presence of introns, the same procedure was applied to all sequences annotated as exon–intron borders in the S288c reference genome. Finally, intron loss was declared in strains in which at least two reads spanned the exon–exon junction but no read spanned either of the two corresponding exon–intron borders.
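The read filter and the final calling rule can be written as two small predicates. This sketch assumes that alignment lengths and the number of bases on each side of the join have already been extracted from the Bowtie2 output, and that the same 20-nt anchor applies to exon–intron borders; the function names are hypothetical and the BLASTn filtering step is not modeled.

```python
def read_supports_join(read_len: int, aligned_len: int, left_anchor: int, right_anchor: int,
                       min_coverage: float = 0.95, min_anchor: int = 20) -> bool:
    """A read counts if >=95% of it aligns and it covers >=20 nt on each side of the join
    (an exon-exon junction, or an exon-intron border)."""
    return (aligned_len >= min_coverage * read_len
            and left_anchor >= min_anchor
            and right_anchor >= min_anchor)


def call_intron_loss(junction_reads: int, border_5p_reads: int, border_3p_reads: int) -> bool:
    """Intron loss: >=2 reads span the exon-exon junction and none spans either exon-intron border."""
    return junction_reads >= 2 and border_5p_reads == 0 and border_3p_reads == 0
```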

Identifying Potential Copy Number Variants (CNVs) and Aneuploidies

To assess gene duplication and deletion events, read pairs that mapped concordantly to the reference genome, after filtering of potential PCR duplicates, were analyzed using Cufflinks v2.1.1 (Trapnell et al. 2010) to generate Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values for each coding sequence (CDS). Potential CNVs of individual genes, as well as aneuploidies and large-scale duplications, were then identified by dividing each FPKM value by that of the same CDS in the reference strain S288c.
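Because each CDS's FPKM is divided by the S288c value for the same CDS, a ratio near 1 indicates the same relative copy number as the reference, values near 2 suggest a duplication, and values near 0 a deletion. The sketch below computes these ratios and, as an assumed way to flag aneuploidies or large duplications, summarizes them by the per-chromosome median; the median summary and all names are assumptions rather than the stated procedure.

```python
from collections import defaultdict
from statistics import median
from typing import Dict


def fpkm_ratios(strain_fpkm: Dict[str, float], s288c_fpkm: Dict[str, float]) -> Dict[str, float]:
    """Strain FPKM / S288c FPKM per CDS; CDSs with zero or missing reference FPKM are skipped."""
    return {cds: v / s288c_fpkm[cds]
            for cds, v in strain_fpkm.items()
            if s288c_fpkm.get(cds, 0.0) > 0}


def chromosome_medians(ratios: Dict[str, float], cds_chrom: Dict[str, str]) -> Dict[str, float]:
    """Median ratio per chromosome (assumed summary for spotting aneuploidy)."""
    by_chrom = defaultdict(list)
    for cds, r in ratios.items():
        by_chrom[cds_chrom[cds]].append(r)
    return {chrom: median(vals) for chrom, vals in by_chrom.items()}
```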

Simultaneous Phenotyping of Barcoded Yeast Strains

To phenotype all strains carrying unique barcodes, we used bar-seq (Smith et al. 2009) to estimate their relative growth rates in each of seven environments. The barcoded strains were mixed in approximately equal proportions and combined with the diploid homozygous gene deletion collection (Invitrogen 95401.H1Pool). Each nondeletion strain was present at approximately twice the initial abundance of each gene deletion strain. The initial pool was grown for approximately two generations in 25 ml of YPD medium at 30 °C, and the resulting culture, termed generation 0, was used to initiate competitions in each experiment. To reduce the effect of genetic drift, large populations were maintained throughout the competitions, with transfers to fresh medium every 4–5 generations to keep populations in exponential growth. Populations competed for approximately 30 generations (6 transfers), and samples were stored at −80 °C after each transfer. Following preliminary investigations, we chose to carry out in-depth analysis of the populations after the second transfer (∼10 generations) in YPD at 30 °C, YPD at 40 °C, YPD + 1.25 M NaCl at 30 °C, YPD + 8% EtOH at 30 °C, YPD + 4 mM paraquat (a superoxide generator) at 30 °C, YPD + 3 mM hydrogen peroxide at 30 °C, and YPD + 1 mM cobalt chloride at 30 °C.

To determine the frequency of each strain in the pooled population at a given time point, we extracted genomic DNA from samples using a Puregene Yeast/Bacteria DNA extraction kit (Qiagen). DNA barcodes were amplified by PCR using AccuPrime Pfx (Invitrogen). The primers used for barcode amplification also added the sequences necessary for cluster formation and sequencing-primer annealing on the Illumina platform. Because the downstream barcode is known to be missing in some deletion strains (Deutschbauer et al. 2005), only the upstream barcodes were used. Fifty-base-pair single-end reads were obtained from one lane of an Illumina Genome Analyzer IIx at the University of Michigan DNA Sequencing Core. Illumina Pipeline software version 1.6 was used for base calling from the image data. Because all sequences started with the same 18 base pairs of the PCR primer region, and this uniformity adversely affected base calling, we removed the first 18 sequencing cycles before base calling. We used the previously published “gene-barcode map” (Qian et al. 2012a), supplemented with the barcode identities of our own strains, to assign each read to a particular strain, allowing at most one mismatch. We followed a previously published outline of bar-seq analysis (Robinson et al. 2014). We required that barcodes be represented by at least 40 total counts in the populations analyzed before (generation 0) and after competition. Counts were then normalized using the TMM method implemented in the R package edgeR (Robinson et al. 2010). Because we were interested in phenotypes specific to a given condition, as opposed to general fitness across environments, we computed fitness in each specific condition relative to that in YPD (the benign condition). Overall, 79 natural S. cerevisiae strains and 4,498 deletion strains yielded usable data.
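Read assignment against the gene-barcode map, allowing at most one mismatch, and the 40-count filter might look like the sketch below. It assumes that each read begins with the UPTAG once the 18 primer cycles are removed, that reads matching no strain or more than one strain are discarded, and that the 40-count threshold applies separately to the generation-0 and post-competition samples; all three are assumptions, as are the function names. The linear scan over barcodes is for clarity only.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, barcode_to_strain: Dict[str, str],
                max_mismatch: int = 1) -> Optional[str]:
    """Strain whose UPTAG matches the start of the read with <= max_mismatch mismatches.
    Returns None when no strain, or more than one strain, matches."""
    hits = {strain for bc, strain in barcode_to_strain.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatch}
    return hits.pop() if len(hits) == 1 else None


def count_strains(reads: Iterable[str], barcode_to_strain: Dict[str, str]) -> Counter:
    counts: Counter = Counter()
    for read in reads:
        strain = assign_read(read, barcode_to_strain)
        if strain is not None:
            counts[strain] += 1
    return counts


def passes_count_filter(count_gen0: int, count_after: int, min_count: int = 40) -> bool:
    """Assumed reading: at least 40 counts in both the generation-0 and post-competition samples."""
    return count_gen0 >= min_count and count_after >= min_count
```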

Let the number of reads for a genotype of interest at the beginning of the competitions be $N_0$ and the corresponding number for the reference genotype (i.e., the strain lacking HO) be $M_0$. Let the numbers of reads for these two genotypes after competition in the benign environment of YPD at 30 °C be $N_1$ and $M_1$, respectively, and the corresponding numbers in a stressful environment of interest be $N_2$ and $M_2$, respectively. The competition lasted for $g \approx 10$ generations in each environment. Any inaccuracy in $g$ does not affect comparisons of relative fitness among strains, because the same $g$ applies to all strains. The fitness of the genotype of interest relative to that of the reference genotype in the stressful environment, relative to the same quantity in the benign environment, is

$$R = \frac{\left[\dfrac{N_2/N_0}{M_2/M_0}\right]^{1/g}}{\left[\dfrac{N_1/N_0}{M_1/M_0}\right]^{1/g}} = \left(\frac{N_2 M_1}{N_1 M_2}\right)^{1/g}.$$
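A direct transcription of the expression for $R$, using the same symbols as the text, is given below as a sketch; the counts are assumed to be the TMM-normalized values, and the function names are hypothetical. The second function is the simplified right-hand form, in which the generation-0 counts cancel.

```python
def relative_fitness(n0: float, m0: float, n1: float, m1: float,
                     n2: float, m2: float, g: float = 10.0) -> float:
    """Fitness of a genotype vs. the reference (HO-deletion) strain under stress,
    relative to the same quantity in YPD at 30 °C."""
    stress = ((n2 / n0) / (m2 / m0)) ** (1 / g)
    benign = ((n1 / n0) / (m1 / m0)) ** (1 / g)
    return stress / benign


def relative_fitness_simplified(n1: float, m1: float, n2: float, m2: float,
                                g: float = 10.0) -> float:
    """Equivalent closed form (N2*M1 / (N1*M2))**(1/g)."""
    return (n2 * m1 / (n1 * m2)) ** (1 / g)
```

A genotype whose share of reads relative to the reference drops fourfold under stress but is unchanged in YPD gives $R = 0.25^{1/10} \approx 0.87$.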

Two biological replicates of the competition and bar-seq were conducted in YPD at 30 °C. Based on the two estimates of $N_1/M_1$, we tested whether $N_2/M_2$ is significantly different from $N_1/M_1$ (i.e., whether $R$ is significantly different from 1), followed by FDR correction for multiple testing.
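The text does not name the FDR procedure. Assuming the widely used Benjamini–Hochberg method, adjusted p-values can be computed as follows; this is a minimal sketch and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Strains whose adjusted value falls below the chosen FDR level (for example 0.05) would then be called significantly different from 1.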

GWAS

We followed a multistep GWAS approach (Listgarten et al. 2012, 2013) outlined in supplementary fig. S13, Supplementary Material online. For each environment, we first scaled and centered the relative growth rates. We removed all sites with a minor allele frequency below 5% across the set of strains for which we had phenotype data. We then removed all mitochondrial sites and all sites with missing data in >5% of strains. The remaining 123,121 sites were coded as 0, 0.5, and 1 to represent the homozygous nonreference, heterozygous, and homozygous reference states in each strain, respectively.
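The site filters and the 0/0.5/1 coding can be sketched as below. Because a site is kept only if it passes every filter, the order in which they are applied does not change the result. The genotype labels, the mitochondrial chromosome names, and computing the minor allele frequency over non-missing strains are assumptions for illustration.

```python
from typing import List, Optional, Sequence

# Coding from the text: 0 = homozygous nonreference, 0.5 = heterozygous, 1 = homozygous reference
CODE = {"alt": 0.0, "het": 0.5, "ref": 1.0}
MITO_NAMES = {"chrM", "mito"}  # hypothetical chromosome labels


def encode_site(calls: Sequence[Optional[str]]) -> List[Optional[float]]:
    return [None if c is None else CODE[c] for c in calls]


def minor_allele_frequency(coded: Sequence[Optional[float]]) -> float:
    observed = [x for x in coded if x is not None]
    ref_freq = sum(observed) / len(observed)  # each code is the strain's reference-allele fraction
    return min(ref_freq, 1.0 - ref_freq)


def keep_site(chrom: str, calls: Sequence[Optional[str]],
              min_maf: float = 0.05, max_missing: float = 0.05) -> bool:
    if chrom in MITO_NAMES:
        return False
    coded = encode_site(calls)
    missing = sum(x is None for x in coded) / len(coded)
    if missing > max_missing:
        return False
    return minor_allele_frequency(coded) >= min_maf
```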

Besides true positives, individual SNPs can be significantly associated with a phenotype by chance or through the confounding effect of population structure. Because over- and under-correction for population structure can lead to a loss of statistical power and to false positives, respectively, we performed the GWAS in several steps to find an appropriate balance. First, we tested the association between the normalized phenotype values and each SNP without controlling for population structure, using simple linear regression without covariates. Second, we used this unstructured association to rank all SNPs by the statistical significance of their association. From this list, we performed a series of maximum-likelihood associations using EMMA (Kang et al. 2008). To control for population structure, we estimated a kinship matrix from a specific set of SNPs. To define this set, we started with the 1,000 SNPs most significantly associated with the phenotype in the unstructured analysis and then successively added the next 1,000 most significant SNPs until 120,000 SNPs were included. For each association, the genomic control factor, lambda, was calculated using gcontrol2 in R (Devlin and Roeder 1999). We identified the minimal kinship set that controlled for population structure as the point where lambda first reached 1 or, if it never did, where lambda was minimized (supplementary fig. S12, Supplementary Material online). Third, we ran an additional series of associations centered on the 3,000-SNP region identified by lambda, using kinship sets in 50-SNP windows. Again, we found the smallest kinship set at which lambda reached 1 or was at its minimum. Finally, we performed an association for the 500 most significant SNPs from the unstructured association. In each case, we used the kinship set that was optimal for lambda, minus any SNPs within 10 kb of the focal SNP, to estimate population structure.
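The genomic control factor compares the median association statistic with its expectation under the null. The sketch below uses the standard median-based definition (Devlin and Roeder 1999) rather than reproducing gcontrol2 itself, and reads "lambda first reached 1" as the first kinship-set size with lambda at or below 1; both are assumptions, and the function names are hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

_Z = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.4549)
_CHI2_1DF_MEDIAN = _Z.inv_cdf(0.75) ** 2


def genomic_control_lambda(pvalues: Sequence[float]) -> float:
    """lambda = median observed 1-df chi-square statistic / its expected median."""
    # A two-sided p-value maps to chi2 = z**2 with z the normal quantile at p/2;
    # p is clipped away from 0 so that inv_cdf stays defined.
    chi2 = [_Z.inv_cdf(max(p, 1e-300) / 2) ** 2 for p in pvalues]
    return median(chi2) / _CHI2_1DF_MEDIAN


def choose_kinship_size(sizes: Sequence[int], lambdas: Sequence[float],
                        target: float = 1.0) -> int:
    """First kinship-set size at which lambda reaches the target; otherwise the size minimizing lambda."""
    for size, lam in zip(sizes, lambdas):
        if lam <= target:
            return size
    return min(zip(sizes, lambdas), key=lambda pair: pair[1])[0]
```

Uniformly distributed p-values give lambda close to 1, whereas systematically inflated associations, as expected from uncorrected population structure, push it above 1.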
Any variant in this final association with a P-value below 0.0001 (i.e., 5%/500) was classified as significant. For these SNPs, we identified the coding region nearest to the variant as well as its immediate neighbors. The candidate regions defined in this way generally span 4–10 kb, approximately 4–8 times the distance over which LD decays.
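The Bonferroni threshold and the nearest-gene-plus-neighbors rule can be sketched as follows. The gene coordinates are assumed to come from the reference annotation as (name, start, end) tuples sorted by position on one chromosome, with distance measured to the nearer gene end; these choices and all names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

ALPHA = 0.05
N_TESTS = 500
THRESHOLD = ALPHA / N_TESTS  # Bonferroni: 0.05 / 500 = 1e-4


def significant(results: Sequence[Tuple[str, float]]) -> List[str]:
    """SNP ids whose P-value in the final association is below the threshold."""
    return [snp for snp, p in results if p < THRESHOLD]


def candidate_genes(snp_pos: int, genes: Sequence[Tuple[str, int, int]]) -> List[str]:
    """Nearest coding region to the SNP plus its immediate neighbors.

    `genes` holds (name, start, end) for one chromosome, sorted by start.
    """
    def distance(gene: Tuple[str, int, int]) -> int:
        _, start, end = gene
        if start <= snp_pos <= end:
            return 0
        return min(abs(snp_pos - start), abs(snp_pos - end))

    idx = min(range(len(genes)), key=lambda i: distance(genes[i]))
    return [name for name, _, _ in genes[max(0, idx - 1): idx + 2]]
```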

Experimental Validation of GWAS Results

To confirm the validity of our GWAS approach and to narrow down the genes responsible for the observed phenotypic variation within the regions surrounding the significant SNPs, we performed reciprocal hemizygosity tests (Steinmetz et al. 2002). We concentrated on a subset of SNPs significantly associated with relative fitness at 40 °C. Each candidate gene was deleted, using a KanMX4 marker, from two haploid MATa hoΔ::HygMX4 backgrounds, one showing high relative fitness and one showing low relative fitness. Hybrids were then formed by mating these deletion strains to MATα cells. Two types of hybrid were formed so that the relative fitness of reciprocal hemizygotes could be ascertained: the first by mating the deletion-carrying haploids to MATα hoΔ::NatMX4 cells of the other genetic background, and the second by mating them to MATα hoΔ::TDH3p-YFP-NatMX4 cells. This yielded four hybrid strains for each gene tested: fluorescent and nonfluorescent hybrids lacking the high-fitness allele (ΔH-F and ΔH-NF) and fluorescent and nonfluorescent hybrids lacking the low-fitness allele (ΔL-F and ΔL-NF). All strains were frozen at −80 °C until needed. To perform competitive fitness assays, freezer samples were inoculated into 1 ml of YPD and grown with shaking for 24 h at 30 °C. Diploid hybrids were paired so that each competition included two hybrids carrying reciprocal deletions of the alternative alleles and different identifying markers. That is, ΔH-F was paired with ΔL-NF in the first competition, and ΔL-F was paired with ΔH-NF in the second. Competitions were initiated by mixing equal volumes of the two strains to form the t0 population, which was diluted 1,000-fold into 1.5 ml of fresh YPD medium prewarmed to 40 °C. Strains competed with shaking at 40 °C for 36 h, or ∼10 generations, until saturation (t1). All competitions were performed in 96-deep-well plates with four replicates of each competition.
The frequencies of fluorescent and nonfluorescent cells in the t0 and t1 populations were determined by flow cytometry using a BD Accuri C6 attached to a HyperCyt sampling robot. After data cleaning to remove artifacts, the relative fitness of the competing strains was calculated as

$$F = \left(\frac{n_{\Delta H}(t_1)/n_{\Delta L}(t_1)}{n_{\Delta H}(t_0)/n_{\Delta L}(t_0)}\right)^{1/G},$$

where $n_{\Delta H}(t)$ and $n_{\Delta L}(t)$ are the numbers of ΔH and ΔL cells at time $t$, and $G = 10$ denotes the number of generations of competition. The same experiment was repeated in YPD at 30 °C. Relative fitness was computed by dividing fitness at 40 °C by fitness at 30 °C. Five replicate competitions were performed for each set of strains, and relative fitness was calculated by averaging over the reciprocal YFP-marked strain pairs for each gene. Statistical significance was assessed by t-tests.
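Because each competition pairs a YFP-marked hybrid with an unmarked one, the fluorescent fraction measured by flow cytometry determines the ΔH:ΔL ratio directly. The sketch below applies the formula for F to both marker configurations, averages over the two reciprocal pairs at each temperature, and then divides the 40 °C value by the 30 °C value. The order of averaging and dividing, and all function names, are assumptions.

```python
from statistics import mean


def competition_fitness(dh_t0: float, dl_t0: float, dh_t1: float, dl_t1: float,
                        g: float = 10.0) -> float:
    """F = ((dH/dL at t1) / (dH/dL at t0)) ** (1/G)."""
    return ((dh_t1 / dl_t1) / (dh_t0 / dl_t0)) ** (1 / g)


def fitness_from_yfp(frac_t0: float, frac_t1: float, yfp_is_dh: bool, g: float = 10.0) -> float:
    """Convert YFP-positive fractions to dH/dL frequencies.

    Competition 1 (dH-F vs dL-NF): YFP marks dH.
    Competition 2 (dL-F vs dH-NF): YFP marks dL.
    """
    def split(frac: float):
        return (frac, 1 - frac) if yfp_is_dh else (1 - frac, frac)

    dh0, dl0 = split(frac_t0)
    dh1, dl1 = split(frac_t1)
    return competition_fitness(dh0, dl0, dh1, dl1, g)


def hemizygote_relative_fitness(comp1_40, comp2_40, comp1_30, comp2_30, g=10.0):
    """Each argument is a (frac_t0, frac_t1) pair of YFP-positive fractions."""
    f40 = mean([fitness_from_yfp(*comp1_40, yfp_is_dh=True, g=g),
                fitness_from_yfp(*comp2_40, yfp_is_dh=False, g=g)])
    f30 = mean([fitness_from_yfp(*comp1_30, yfp_is_dh=True, g=g),
                fitness_from_yfp(*comp2_30, yfp_is_dh=False, g=g)])
    return f40 / f30
```

Averaging over the two marker configurations cancels, to first order, any fitness cost of the YFP construct itself.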

Data Access

The yeast genome sequences determined in this work have been deposited to NCBI with the BioProject ID of PRJNA320792.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.


Acknowledgments

We thank Gianni Liti and Leonid Kruglyak for sharing the yeast strains that are sequenced in this study, Audrey Gasch for help accessing read data from previously sequenced strains, and two anonymous reviewers for constructive comments. This work was supported by a research grant from the U.S. National Science Foundation (MCB-1329578) to J.Z.

Contributor Information

Calum J. Maclean, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI.

Brian P.H. Metzger, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI.

Jian-Rong Yang, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI.

Wei-Chin Ho, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI.

Bryan Moyers, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI.

Jianzhi Zhang, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI.

References

  1. Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31:1–17.
  2. Besemer J, Lomsadze A, Borodovsky M. 2001. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29:2607–2618.
  3. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, Kruglyak L. 2013. Finding the sources of missing heritability in a yeast cross. Nature 494:234–237.
  4. Borneman AR, Pretorius IS. 2015. Genomic insights into the Saccharomyces sensu stricto complex. Genetics 199:281–291.
  5. Brem RB, Yvert G, Clinton R, Kruglyak L. 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296:752–755.
  6. Brôco N, Tenreiro S, Viegas CA, Sá-Correia I. 1999. FLR1 gene (ORF YBR008c) is required for benomyl and methotrexate resistance in Saccharomyces cerevisiae and its benomyl-induced expression is dependent on pdr3 transcriptional regulator. Yeast 15:1595–1608.
  7. Connelly CF, Akey JM. 2012. On the prospects of whole-genome association mapping in Saccharomyces cerevisiae. Genetics 191:1345–1353.
  8. Cromie GA, Hyma KE, Ludlow CL, Garmendia-Torres C, Gilbert TL, May P, Huang AA, Dudley AM, Fay JC. 2013. Genomic sequence diversity and population structure of Saccharomyces cerevisiae assessed by RAD-seq. G3 3:2163–2171.
  9. Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.
  10. Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.
  11. Danecek P, Nellaker C, McIntyre RE, Buendia-Buendia JE, Bumpstead S, Ponting CP, Flint J, Durbin R, Keane TM, Adams DJ. 2012. High levels of RNA-editing site conservation amongst 15 laboratory mouse strains. Genome Biol. 13:R26.
  12. Deutschbauer AM, Jaramillo DF, Proctor M, Kumm J, Hillenmeyer ME, Davis RW, Nislow C, Giaever G. 2005. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics 169:1915–1925.
  13. Devlin B, Roeder K. 1999. Genomic control for association studies. Biometrics 55:997–1004.
  14. Diao L, Chen KC. 2012. Local ancestry corrects for population structure in Saccharomyces cerevisiae genome-wide association studies. Genetics 192:1503–1511.
  15. Doniger SW, Kim HS, Swain D, Corcuera D, Williams M, Yang S-P, Fay JC. 2008. A catalog of neutral and deleterious polymorphism in yeast. PLOS Genet. 4:e1000183.
  16. Douglas AC, Smith AM, Sharifpoor S, Yan Z, Durbic T, Heisler LE, Lee AY, Ryan O, Göttert H, Surendra A, et al. 2012. Functional analysis with a barcoder yeast gene overexpression system. G3 2:1279–1289.
  17. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, Montigny J, De Blanchin S, Beckerich J, Beyne E, et al. 2004. Genome evolution in yeasts. Nature 35–44.
  18. Edwards MD, Gifford DK. 2012. High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 Suppl 6:S8.
  19. Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.
  20. Fay JC. 2013. The molecular basis of phenotypic variation in yeast. Curr Opin Genet Dev. 23:672–677.
  21. Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
  22. Fu YX, Li WH. 1993. Statistical tests of neutrality of mutations. Genetics 133:693–709.
  23. Gallone B, Steensels J, Baele G, Maere S, Verstrepen KJ, Prahl T, Soriaga L, Saels V, Herrera-Malaver B, Merlevede A, et al. 2016. Domestication and divergence of Saccharomyces cerevisiae beer yeasts. Cell 166:1397–1410.
  24. Gbelska Y, Krijger J-J, Breunig KD. 2006. Evolution of gene families: the multidrug resistance transporter genes in five related yeast species. FEMS Yeast Res. 6:345–355.
  25. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Véronneau S, Dow S, Lucau-Danila A, Anderson K, André B, et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387–391.
  26. Giaever G, Nislow C. 2014. The yeast deletion collection: a decade of functional genomics. Genetics 197:451–465.
  27. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, et al. 1996. Life with 6000 genes. Science 274:546–567.
  28. Goldstein AL, McCusker JH. 1999. Three new dominant drug resistance cassettes for gene disruption in Saccharomyces cerevisiae. Yeast 15:1541–1553.
  29. Gonçalves M, Pontes A, Almeida P, Barbosa R, Serra M, Libkind D, Hutzler M, Gonçalves P, Sampaio JP. 2016. Distinct domestication trajectories in top-fermenting beer yeasts and wine yeasts. Curr Biol. 26:2750–2761.
  30. He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42:272–276.
  31. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D, et al. 2008. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320:362–365.
  32. Hittinger CT. 2013. Saccharomyces diversity and evolution: a budding model genus. Trends Genet. 29:309–317.
  33. Ho CH, Magtanong L, Barker SL, Gresham D, Nishimura S, Natarajan P, Koh JLY, Porter J, Gray CA, Andersen RJ, et al. 2009. A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat Biotechnol. 27:369–377.
  34. Hose J, Yong CM, Sardi M, Wang Z, Newton MA, Gasch AP. 2015. Dosage compensation can buffer copy-number variation in wild yeast. Elife 4:e05462.
  35. Juneau K, Miranda M, Hillenmeyer ME, Nislow C, Davis RW. 2006. Introns regulate RNA and protein abundance in yeast. Genetics 174:511–518.
  36. Jungwirth H, Wendler F, Platzer B, Bergler H, Högenauer G. 2000. Diazaborine resistance in yeast involves the efflux pumps Ycf1p and Flr1p and is enhanced by a gain-of-function allele of gene YAP1. Eur J Biochem. 267:4809–4816.
  37. Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709–1723.
  38. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.
  39. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. 2012. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22:568–576.
  40. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359.
  41. Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10:R25.
  42. Legras JL, Merdinoglu D, Cornuet JM, Karst F. 2007. Bread, beer and wine: Saccharomyces cerevisiae diversity reflects human history. Mol Ecol. 16:2091–2102.
  43. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis GR, Durbin R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079.
  44. Lissina E, Young B, Urbanus ML, Guan XL, Lowenson J, Hoon S, Baryshnikova A, Riezman I, Michaut M, Riezman H, et al. 2011. A systems biology approach reveals the role of a novel methyltransferase in response to chemical stress and lipid homeostasis. PLOS Genet. 7:e1002332.
  45. Listgarten J, Lippert C, Heckerman D. 2013. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nat Genet. 45:470–471.
  46. Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D. 2012. Improved linear mixed models for genome-wide association studies. Nat Methods 9:525–526.
  47. Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.
  48. Liti G, Louis EJ. 2012. Advances in quantitative trait analysis in yeast. PLOS Genet. 8:e1002912.
  49. Liti G, Nguyen Ba AN, Blythe M, Müller CA, Bergström A, Cubillos FA, Dafhnis-Calas F, Khoshraftar S, Malla S, Mehta N, et al. 2013. High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome. BMC Genomics 14:69.
  50. Long A, Liti G, Luptak A, Tenaillon O. 2015. Elucidating the molecular architecture of adaptation via evolve and resequence experiments. Nat Rev Genet. 16:567–582.
  51. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, et al. 2012. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18.
  52. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17:10.
  53. Martin MV. 1999. The use of fluconazole and itraconazole in the treatment of Candida albicans infections: a review. J Antimicrob Chemother. 44:429–437.
  54. McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
  55. McIlwain SJ, Peris D, Sardi M, Moskvin OV, Zhan F, Myers KS, Riley NM, Buzzell A, Parreiras LS, Ong IM, et al. 2016. Genome sequence and analysis of a stress-tolerant, wild-derived strain of Saccharomyces cerevisiae used in biofuels research. G3 6:1757–1766.
  56. Menichetti F, Fiorio M, Tosti A, Gatti G, Bruna Pasticci M, Miletich F, Marroni M, Bassetti D, Pauluzzi S. 1996. High-dose fluconazole therapy for cryptococcal meningitis in patients with AIDS. Clin Infect Dis. 22:838–840.
  57. Messer PW, Petrov DA. 2013. Frequent adaptation and the McDonald-Kreitman test. Proc Natl Acad Sci USA. 110:8615–8620.
  58. Mortimer RK, Johnston JR. 1986. Genealogy of principal strains of the yeast genetic stock center. Genetics 113:35–43.
  59. Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 302:205–217.
  60. Novo M, Bigey F, Beyne E, Galeote V, Gavory F, Mallet S, Cambon B, Legras J-L, Wincker P, Casaregola S, et al. 2009. Eukaryote-to-eukaryote gene transfer events revealed by the genome sequence of the wine yeast Saccharomyces cerevisiae EC1118. Proc Natl Acad Sci USA. 106:16333–16338.
  61. Ohya Y, Sese J, Yukawa M, Sano F, Nakatani Y, Saito TL, Saka A, Fukuda T, Ishihara S, Oka S, et al. 2005. High-dimensional and large-scale phenotyping of yeast mutants. Proc Natl Acad Sci USA. 102:19015–19020.
  62. Parenteau J, Durand M, Véronneau S, Lacombe A-A, Morin G, Guérin V, Cecez B, Gervais-Bird J, Koh C-S, Brunelle D, et al. 2008. Deletion of many yeast introns reveals a minority of genes that require splicing for function. Mol Biol Cell 19:1932–1941.
  63. Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.
  64. Pierce SE, Fung EL, Jaramillo DF, Chu AM, Davis RW, Nislow C, Giaever G. 2006. A unique and universal molecular barcode array. Nat Methods 3:601–603.
  65. Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012a. The Genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2:1399–1410.
  66. Qian W, Yang J-R, Pearson NM, Maclean C, Zhang J. 2012b. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 8:e1002603.
  67. Raj A, Stephens M, Pritchard JK. 2014. fastSTRUCTURE: Variational inference of population structure in large SNP datasets. Genetics 573–589.
  68. Robinson DG, Chen W, Storey JD, Gresham D. 2014. Design and analysis of bar-seq experiments. G3 4:11–18.
  69. Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140.
  70. Rohland N, Reich D. 2012. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22:939–946.
  71. Scannell DR, Zill OA, Rokas A, Payen C, Dunham MJ, Eisen MB, Rine J, Johnston M, Hittinger CT. 2011. The awesome power of yeast evolutionary genetics: new genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3 1:11–25.
  72. Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458:342–345.
  73. Sinha H, David L, Pascon RC, Clauder-Münster S, Krishnakumar S, Nguyen M, Shi G, Dean J, Davis RW, Oefner PJ, et al. 2008. Sequential elimination of major-effect contributors identifies additional quantitative trait loci conditioning high-temperature growth in yeast. Genetics 180:1661–1670.
  74. Sinha H, Nicholson BP, Steinmetz LM, McCusker JH. 2006. Complex genetic interactions in a quantitative trait locus. PLoS Genet. 2:140–147.
  75. Skelly DA, Merrihew GE, Riffle M, Connelly CF, Kerr EO, Johansson M, Jaschob D, Graczyk B, Shulman NJ, Wakefield J, et al. 2013. Integrative phenomics reveals insight into the structure of phenotypic diversity in budding yeast. Genome Res. 23:1496–1504.
  76. Slater GSC, Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31.
  77. Smith AM, Durbic T, Oh J, Urbanus M, Proctor M, Heisler LE, Giaever G, Nislow C. 2011. Competitive genomic screens of barcoded yeast libraries. J Vis Exp. 54:2–8.
  78. Smith AM, Heisler LE, Mellor J, Kaper F, Thompson MJ, Chee M, Roth FP, Giaever G, Nislow C. 2009. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19:1836–1842.
  79. Sopko R, Huang D, Preston N, Chua G, Papp B, Kafadar K, Snyder M, Oliver SG, Cyert M, Hughes TR, et al. 2006. Mapping pathways and phenotypes by systematic gene overexpression. Mol Cell 21:319–330.
  80. Steinmetz LM, Sinha H, Richards DR, Spiegelman JI, Oefner PJ, McCusker JH, Davis RW. 2002. Dissecting the architecture of a quantitative trait locus in yeast. Nature 416:326–330.
  81. Strope PK, Skelly DA, Kozmin SG, Mahadevan G, Stone EA, Magwene PM, Dietrich FS, McCusker JH. 2015. The 100-genomes strains, an S. cerevisiae resource that illuminates its natural phenotypic and genotypic variation and emergence as an opportunistic pathogen. Genome Res. 25:762–774.
  82. Swinnen S, Thevelein JM, Nevoigt E. 2012. Genetic mapping of quantitative phenotypic traits in Saccharomyces cerevisiae. FEMS Yeast Res. 12:215–227.
  83. Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
  84. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731–2739.
  85. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 28:511–515.
  86. Tsai IJ, Bensasson D, Burt A, Koufopanou V. 2008. Population genomics of the wild yeast Saccharomyces paradoxus: quantifying the life cycle. Proc Natl Acad Sci USA. 105:4957–4962.
  87. Wach A, Brachat A, Pöhlmann R, Philippsen P. 1994. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10:1793–1808.
  88. Wang Q-M, Liu W-Q, Liti G, Wang S-A, Bai F-Y. 2012. Surprisingly diverged populations of Saccharomyces cerevisiae in natural environments remote from human activity. Mol Ecol. 21:5404–5417.
  89. Warringer J, Zörgö E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ, et al. 2011. Trait variation in yeast is defined by population history. PLoS Genet. 7:e1002111.
  90. Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, et al. 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285:901–906.
  91. Wolters JF, Chiu K, Fiumera HL. 2015. Population structure of mitochondrial genomes in Saccharomyces cerevisiae. BMC Genomics 16:451.

Associated Data

Supplementary Materials

Supplementary Data 1
mbe_34_10_2486_s6.xlsx (28KB, xlsx)
Supplementary Data 2
mbe_34_10_2486_s7.xls (71.5KB, xls)
Supplementary Data 3
mbe_34_10_2486_s8.xlsx (13.9KB, xlsx)
Supplementary Data 4
mbe_34_10_2486_s9.xls (8.2MB, xls)
Supplementary Data 5
mbe_34_10_2486_s10.xls (5.2MB, xls)
Supplementary Data 6
Supplementary Data 7
mbe_34_10_2486_s12.docx (92.2KB, docx)
Supplementary Data 8
Supplementary Data
mbe_34_10_2486_s1.pdf (6.9MB, pdf)

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press