Rare variants contribute disproportionately to quantitative trait variation in yeast

Joshua S Bloom; James Boocock; Sebastian Treusch; Meru J Sadhu; Laura Day; Holly Oates-Barker; Leonid Kruglyak

doi:10.7554/eLife.49212

eLife. 2019 Oct 24;8:e49212. doi: 10.7554/eLife.49212


Editors: Christian R Landry⁵, Naama Barkai⁶

PMCID: PMC6892613 PMID: 31647408

Abstract

How variants with different frequencies contribute to trait variation is a central question in genetics. We use a unique model system to disentangle the contributions of common and rare variants to quantitative traits. We generated ~14,000 progeny from crosses among 16 diverse yeast strains and identified thousands of quantitative trait loci (QTLs) for 38 traits. We combined our results with sequencing data for 1011 yeast isolates to show that rare variants make a disproportionate contribution to trait variation. Evolutionary analyses revealed that this contribution is driven by rare variants that arose recently, and that negative selection has shaped the relationship between variant frequency and effect size. We leveraged the structure of the crosses to resolve hundreds of QTLs to single genes. These results refine our understanding of trait variation at the population level and suggest that studies of rare variants are a fertile ground for discovery of genetic effects.

Research organism: S. cerevisiae

Introduction

A detailed understanding of the sources of heritable variation is a central goal of modern genetics. Genome-wide association studies (GWAS) in humans (Visscher et al., 2017) have implicated tens of thousands of DNA sequence variants in disease risk and quantitative trait variation, but these variants fail to account for the entire heritability of diseases and traits. One key question is the relative contribution of DNA sequence variants with different allele frequencies in a population to trait variation. GWAS by design only test common DNA sequence variants; however, recent studies underscore the likely importance of the contribution of rare variants to heritable variation (Wainschtein et al., 2019). Theoretical analyses have explored how factors such as mutational target size, pleiotropy, and the strength of selection shape the relationship between variant frequency and effect size (Eyre-Walker, 2010; Robinson et al., 2014; Simons et al., 2018). In particular, purifying selection against variants that negatively affect fitness is expected to keep them at low frequencies in a population, resulting in a predicted inverse relationship between effect sizes and allele frequencies for variants that influence fitness-related traits (Gibson, 2012; Goldstein et al., 2013; Kryukov et al., 2007; Pritchard, 2001).

Empirical results have been consistent with the theoretical expectation that rare variants should have larger effect sizes, or, equivalently, that variants implicated in trait variation should be shifted to lower frequencies relative to all variants. An increased burden of ultra-rare protein-truncating variants has been observed in human diseases (Ganna et al., 2018; Exome Aggregation Consortium et al., 2016), and multiple studies have found that GWAS variants with lower allele frequencies have larger effect sizes (Marouli et al., 2017; Park et al., 2011). A negative correlation between allele frequency and effect size has also been observed in maize GWAS (Wallace et al., 2014), and our previous work in yeast suggested that variants that contribute to trait variation are shifted to lower frequencies when compared to all sequence variants (Ehrenreich et al., 2012).

Recent studies employed indirect variance partitioning approaches to uncover appreciable contributions of lower frequency variants to heritability of complex traits in humans, including prostate cancer susceptibility (Mancuso et al., 2016), height (Wainschtein et al., 2019; Yang et al., 2015), and body mass index (Wainschtein et al., 2019). However, a direct comprehensive comparison of the effects of rare and common variants has been lacking in humans for two principal reasons. First, rare variants cannot be detected by GWAS by design, and sequencing studies have not reached sufficient sample sizes to find them with high statistical power (Zuk et al., 2014). As a result, most rare variants have to date escaped detection. Second, the power to detect a variant with any given effect size decreases with the frequency of the variant in the study, simply because fewer individuals in the sample carry a less-frequent variant (Zuk et al., 2014). This statistical artifact shifts the effect sizes of those rare variants that are detected upwards, confounding effect size and allele frequency and biasing any effort to measure the underlying relationship between the two.

Here, we report a comprehensive study in yeast designed to overcome these limitations. We built a mapping population consisting of approximately one thousand progeny from each of 16 biparental crosses. In this mapping population, even variants that are rare in the yeast population and occur in only a single parental strain are present in approximately 1000 progeny, resulting in high power to detect them. We mapped thousands of QTLs that account for most of the heritable variation in 38 quantitative traits and measured the QTL effect sizes. We then decoupled variant frequency from effect size by measuring the population allele frequencies of QTL lead variants detected in our panel in a separate large catalog of sequenced yeast isolates (Peter et al., 2018). Analysis of these large complementary data sets enabled us to directly and comprehensively examine the relationship between QTL effect sizes and variant frequency, characterize the genetic architecture of quantitative traits on a population scale, and improve mapping resolution, in many cases to single genes.

Results

To investigate the genetic basis of quantitative traits in the yeast population, we selected 16 highly diverse S. cerevisiae strains that capture much of the known genetic diversity of this species. Specifically, they contain both alleles at 82% of biallelic SNPs and small indels observed at minor allele frequency >5% in a collection of 1011 S. cerevisiae strains (Peter et al., 2018). We sequenced the 16 strains to high coverage in order to obtain a comprehensive set of genetic variants. We constructed a panel of 13,950 individual recombinant haploid yeast segregants by crossing each parental strain to two different strains and collecting an average of 872 progeny per cross (Figure 1; Figure 1—source data 1; Supplementary file 1). We genotyped these segregants by highly multiplexed whole-genome sequencing, with median 2.3-fold coverage per base per individual. Genotypes were called at 298,979 genetic variants, with an average of 71,117 genetic variants segregating in a single cross. We measured the growth of each segregant in 38 different environments in duplicate by automated assays and quantitative imaging (Materials and methods). Because the growth measurements in different environments are not strongly correlated, we treat them as separate phenotypes or traits (Bloom et al., 2013). The resulting genotype-by-phenotype matrix (over half a million phenotypic measurements and 158 billion combinations of genotype and phenotype) formed the basis for all downstream analyses.
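The dataset dimensions quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative and do not come from the authors' code.

```python
# Counts taken from the text; variable names are illustrative only.
n_segregants = 13_950
n_crosses = 16
n_traits = 38
n_variants = 298_979

# Average number of progeny per cross (the text reports 872).
mean_progeny_per_cross = n_segregants / n_crosses

# One value per segregant and trait (the text says "over half a million").
n_segregant_trait_values = n_segregants * n_traits

# Every (segregant, variant, trait) combination (the text reports 158 billion).
n_genotype_phenotype_combinations = n_segregants * n_variants * n_traits

print(round(mean_progeny_per_cross))
print(n_segregant_trait_values)
print(f"{n_genotype_phenotype_combinations:.3e}")
```

The three printed values agree with the rounded figures given in the paragraph.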

Figure 1. — 16 parental strains were chosen to represent the diversity of the *S. cerevisiae* population, as illustrated by their positions on a neighbor-joining tree based on 1011 sequenced isolates (Peter et al., 2018). These strains were crossed in a single round-robin design, with each strain crossed to two other strains, as depicted by lines connecting the colored circles. Colors indicate the ecological origins of the parental strains.

Figure 1—source data 1. Additional information on yeast crosses and phenotypes.
Strain information for the 16 haploid parents and 16 F1 hybrids between them is listed. Additional information about the conditions tested is indicated.

elife-49212-fig1-data1.xls (31.5 KB, xls)

DOI: 10.7554/eLife.49212.003

We used a variance components model (Bloom et al., 2015; de los Campos et al., 2015; Yang et al., 2010) to show that, on average, additive genetic effects accounted for just over half of the total phenotypic variance, while pairwise genetic interactions accounted for 8%, approximately 1/6 as much as additive effects (Figure 2 inset; Supplementary file 2; Figure 2—source data 1). We carried out QTL mapping to find the specific loci contributing additively to trait variation. We used a joint mapping approach that leverages information across the entire panel of 13,950 segregants (Materials and methods). We mapped 4552 QTLs at a false discovery rate (FDR) of 5%, with an average of 120 (range 52–195) QTLs per trait (Supplementary file 3; Figure 3—source data 1). The detected QTLs explain a median of 73% of the additive heritability per trait and cross, showing that we can account for most of the genetic contribution to trait variation with specific loci (Figure 2; Figure 2—source data 1). We complemented the joint analysis with QTL mapping within each cross and found a median of 12 QTLs per trait at the same FDR of 5%. The detected loci explained a median of 68% of the additive heritability (Figure 2—source data 1). The joint analysis was more powerful, explaining an additional 5% of trait variance and uncovering 458 QTLs not detected within individual crosses. Consistent with the higher statistical power of the joint analysis, these additional QTLs had smaller effect sizes (median of 0.071 SD units vs 0.083 SD units; Wilcoxon rank sum test W = 1e6, p=9e-5). All subsequent results are based on the QTLs detected in the joint analysis.

Figure 2. — Whole-genome estimates of additive genetic variance (X-axis) are plotted against cross-validated estimates of trait variance explained by detected QTLs (Y-axis) for each trait-cross combination. Red points show values for the BY-RM cross. The diagonal line corresponds to detected QTLs explaining all of the estimated additive genetic variance, and is shown as a visual guide. (Inset) A histogram of the ratio of non-additive to additive genetic variance for each trait-cross combination, based on estimates from a variance component model.

Figure 2—source data 1. Total variance explained by QTLs and within-cross variance component analyses.
Results from within-cross variance components models and total variance explained by the QTL models are listed.

elife-49212-fig2-data1.xls (798.5 KB, xls)

DOI: 10.7554/eLife.49212.005

To investigate the relationship between variant frequency and QTL effects, we focused on biallelic variants observed in our panel whose frequency could be measured in a large collection of 1011 sequenced yeast strains. Based on their minor allele frequency (MAF) in this collection, we designated variants as rare (MAF <0.01) or common (MAF >0.01). By this definition, 27.8% of biallelic variants in our study were rare. For each trait, we computed the relative fraction of variance explained by these two categories of variants in the segregant panel (Materials and methods) (Yang et al., 2015). Across all traits, the median contribution of rare variants was 51.7%, despite the fact that they constituted only 27.8% of all variants and that a rare variant is expected to explain less variance than a common one with the same allelic effect size. These results are consistent with rare variants having larger effect sizes and making a disproportionate contribution to trait variation. Comparing different traits, we saw a wide range of the relative contribution of rare variants, from almost none for growth in the presence of copper sulfate and lithium chloride to over 75% for growth in the presence of cadmium chloride, in low pH, at high temperature, and on minimal medium (Figure 3A; Figure 3—figure supplement 1; Figure 3—source data 2). The results for copper sulfate and lithium chloride are consistent with GWAS for these traits in the 1011 sequenced yeast strains—these two traits had the most phenotypic variance explained by detected GWAS loci, which inherently correspond to common variants, with large contributions coming from known common copy-number variation at the CUP and ENA loci, respectively (Peter et al., 2018).
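The rare/common designation can be illustrated with a short sketch. It assumes, for simplicity, one allele call per strain in the reference collection; the function names and the example counts are hypothetical and are not taken from the authors' R code. The text does not say how a variant with MAF exactly equal to 0.01 is treated, so the sketch leaves that case unclassified.

```python
def minor_allele_frequency(alt_count, total_count):
    """Return the minor allele frequency of a biallelic variant.

    alt_count: number of strains carrying the alternate allele (assumed layout).
    total_count: number of strains with a call at this site.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    alt_freq = alt_count / total_count
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf, threshold=0.01):
    """Label a variant 'rare' (MAF < threshold) or 'common' (MAF > threshold)."""
    if maf < threshold:
        return "rare"
    if maf > threshold:
        return "common"
    # The boundary case is not defined in the text.
    return "unclassified"


# Hypothetical example: alternate allele seen in 5 of 1011 strains.
print(frequency_class(minor_allele_frequency(5, 1011)))
# Hypothetical example: alternate allele seen in 1000 of 1011 strains,
# so the reference allele is the minor allele.
print(frequency_class(minor_allele_frequency(1000, 1011)))
```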

Figure 3. — (A) Stacked bar plots of additive genetic variance explained by rare (blue) and common (gray) variants. Error bars show +/- s.e. (B) Minor allele frequency (X-axis) of the lead variant at each QTL (Peter et al., 2018) is plotted against QTL effect size (Y-axis). Red points show mean QTL effect sizes for groups of approximately 100 variants binned by allele frequency. Error bars show +/- s.e.m. (C) Frequency of the derived allele of each QTL lead variant (X-axis), based on comparison with *S. paradoxus*, is plotted against QTL effect size (Y-axis). Negative values on the Y-axis correspond to variants with effects that are detrimental for growth.

Figure 3—source data 1. Detected QTL.
QTL mapping results are listed for both the within-cross and the joint analysis.

elife-49212-fig3-data1.xls (5.7 MB, xls)

DOI: 10.7554/eLife.49212.012

Figure 3—source data 2. Joint variance component estimates.
Results for the joint variance component models are given. This includes results for a model with two allele frequency bins (Figure 3A, figure supplement 1A), seven allele frequency bins (Figure 3—figure supplement 1B), and seven allele frequency bins using only variants that are private to each of the 16 parents (Figure 3—figure supplement 1C).

elife-49212-fig3-data2.xls (98 KB, xls)

DOI: 10.7554/eLife.49212.013

In a complementary analysis, we investigated the relationship between the allele frequency of the lead variant at each QTL and the corresponding QTL effect size. Although the lead variant is not necessarily causal, in our study it is likely to be of similar frequency to the causal variant, and a simulation analysis showed that this approach largely preserves the relationship between frequency and effect size (Figure 3—figure supplement 2). Most QTLs had small effects (64% of QTLs had effects less than 0.1 SD units) and most lead variants were common (78%), consistent with previous linkage and association studies. We observed that QTLs with large effects were highly enriched for rare variants, and conversely, that rare variants were highly enriched for large effect sizes (Figure 3B; Figure 3—figure supplement 3; Figure 3—figure supplement 4). For instance, among QTLs with an absolute effect of at least 0.3 SD units, 145 of the corresponding lead variants were rare and only 90 were common. Rare variants were 6.7 times more likely than common variants to have an effect greater than 0.3 SD (Figure 3—source data 1, Fisher’s exact test, p<2e-16). Theoretical population genetics models show that for traits under negative selection, variant effect size is expected to be a decreasing function of minor allele frequency (Eyre-Walker, 2010; Pritchard, 2001). We empirically observe this relationship in our data for most of the traits examined, providing evidence that they have evolved under negative selection in the yeast population (Figure 3—figure supplement 5).

The existence of a close sister species of S. cerevisiae—S. paradoxus—allowed us to distinguish rare variants by their ancestral state. Variants that share the major allele with S. paradoxus are more likely to have arisen in the S. cerevisiae population recently than those that share the minor allele with S. paradoxus. We classified low-frequency variants as recent or ancient according to whether their major or minor allele was shared with S. paradoxus, respectively. Recently arising deleterious alleles have had less time to be purged by negative selection, and therefore recent variants are expected to have stronger effects on gene function, and hence manifest as QTLs with larger effects. Consistent with this expectation, we observed that recent variants were 1.8 times more likely than ancient variants to have an effect size greater than 0.1 SD units (Fisher’s exact test p=9e-5) (Figure 3C). We further examined the direction of QTL effects and found that recent variants were 1.5 times more likely to decrease fitness (Fisher’s exact test p=8e-3). Strikingly, no ancient variant decreased fitness by more than 0.5 SD units, whereas 41 recent variants did (Fisher’s exact test p=7e-3).
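The polarization rule described in this paragraph can be written as a small function. This is a minimal sketch, assuming each variant is represented by its major allele, its minor allele, and the aligned *S. paradoxus* allele; the function names and the "unknown" label for sites where the outgroup matches neither allele are our own additions.

```python
def classify_variant_age(major_allele, minor_allele, paradoxus_allele):
    """Classify a low-frequency variant as 'recent' or 'ancient'.

    'recent': the major allele is shared with S. paradoxus, so the minor
              allele is inferred to be derived.
    'ancient': the minor allele is shared with S. paradoxus, so the major
               allele is inferred to be derived.
    """
    if paradoxus_allele == major_allele:
        return "recent"
    if paradoxus_allele == minor_allele:
        return "ancient"
    # Outgroup matches neither allele (or is missing): leave unpolarized.
    return "unknown"


def derived_allele_frequency(maf, age_class):
    """Convert a minor allele frequency to a derived allele frequency."""
    if age_class == "recent":
        return maf          # the minor allele is the derived allele
    if age_class == "ancient":
        return 1.0 - maf    # the major allele is the derived allele
    return None


print(classify_variant_age("A", "G", "A"))
print(classify_variant_age("A", "G", "G"))
```

Under this convention, a recent variant with MAF 0.005 has a derived allele frequency of 0.005, whereas an ancient variant with the same MAF has a derived allele frequency of 0.995.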

An understanding of trait variation at the level of molecular mechanisms requires narrowing QTLs to the underlying causal genes. Such fine-mapping is a challenge because genetic linkage causes variants across an extended region to show mapping signals of similar strength. Statistical fine-mapping aims to address this challenge by estimating the probability that each variant within a QTL region is causal based on the precise pattern of genotype-phenotype correlations (Farh et al., 2015; Pasaniuc and Price, 2017; Treusch et al., 2015). Our crossing design enables us to obtain higher resolution for QTLs observed in two crosses that share a parent strain by looking for consistent inheritance patterns in both. Specifically, we focused on QTLs with effects greater than 0.14 SD units and used a Bayesian framework (Farh et al., 2015) to compute the posterior probability that each variant is causal (Figure 4A). We then aggregated these probabilities to obtain causality scores for each gene in a QTL. With this approach, we resolved 427 QTLs to single causal genes at an FDR of 20%. Because some QTLs have pleiotropic effects on multiple traits, this gene set contains 195 unique genes, greatly expanding the repertoire of causal genes in yeast. We searched the literature and found that 26 of the 195 genes identified here are supported by previous experimental evidence as causal for yeast trait variation (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) (Figure 4B; Figure 4—source data 1). At a more stringent FDR of 5%, we found 105 unique causal genes, which included 24 of the 26 genes with experimental evidence.
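One way to picture the aggregation step is sketched below. The text does not give the exact form of the Bayesian model or of the gene-level score, so everything here is an assumption made for illustration: a single causal variant per QTL, a flat prior over variants, per-variant log-likelihoods as input, and a gene score equal to the sum of the posterior probabilities of the variants assigned to that gene. The gene names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def posterior_probabilities(log_likelihoods):
    """Posterior probability of causality per variant.

    Assumes exactly one causal variant in the region and a flat prior, so
    the posterior is the normalized likelihood.
    """
    largest = max(log_likelihoods)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(ll - largest) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def gene_scores(ppc, variant_genes):
    """Sum variant-level posteriors within each gene (assumed aggregation)."""
    scores = defaultdict(float)
    for probability, gene in zip(ppc, variant_genes):
        if gene is not None:  # skip variants not assigned to any gene
            scores[gene] += probability
    return dict(scores)


# Hypothetical region with four variants and two genes.
log_likelihoods = [-10.0, -2.0, -2.5, -9.0]
variant_genes = ["GENE_A", "GENE_B", "GENE_B", None]

ppc = posterior_probabilities(log_likelihoods)
scores = gene_scores(ppc, variant_genes)
print(scores)
```

In this toy region almost all of the posterior mass falls on the two variants in the hypothetical GENE_B, so that gene would be the candidate even though neither variant is individually resolved.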

Figure 4. — (A) Statistical fine-mapping of a QTL for growth in the presence of caffeine. Genetic mapping signal, shown as the coefficient of determination between genotype and phenotype (Y-axis, left), is plotted against genome position (X-axis) for crosses between 273614N and YJM981 (black) and YJM981 and CBS2888 (blue). The posterior probability of causality (PPC), plotted in red (Y-axis, right), localizes the QTL to a portion of the gene TOR1. (B) PPC is shown as black dots for 195 genes identified as causal at an FDR of 20%, sorted by PPC. Genes containing natural variants that have been experimentally validated as causal for trait variation in prior studies (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) are shown in red and labeled with gene names.

Figure 4—source data 1. Candidate causal genes and GO enrichments.
Candidate causal genes per QTL are listed. GO enrichments for causal genes are listed.

elife-49212-fig4-data1.xls (1.5 MB, xls)

DOI: 10.7554/eLife.49212.015

Causal genes were highly enriched for GO terms related to the plasma membrane (45 of 522, 16.5 expected, q = 1.8e-7), metal ion transport (13 of 83, 2.6 expected, q = 0.0009), and positive regulation of nitrogen compound biosynthesis (28 of 393, 12.5 expected, q = 0.0076) (Figure 4—source data 1). Strikingly, five of the six genes involved in cAMP biosynthesis were identified as causal (IRA1, IRA2, BCY1, CYR1, and RAS1; 0.19 expected, q = 0.0002). Additional genes in the RAS/cAMP signaling pathway were also identified as causal, including GPR1, which is involved in glucose sensing, SRV2, which binds adenylate cyclase, and RHO3, which encodes a RAS-like GTPase. In yeast, the RAS/cAMP pathway regulates cell cycle progression, metabolism, and stress resistance (Tisi et al., 2014). Variation in many of these genes influenced growth on alternative carbon sources. We hypothesize that the yeast population contains abundant functional variation in genes that regulate the switch from glucose to alternative carbon sources through the RAS/cAMP pathway.

Discussion

We previously used a cross between lab (BY) and vineyard (RM) strains of yeast to show that the majority of heritable phenotypic differences arise from additive genetic effects, and we were able to detect, at genome-wide significance, specific loci that together account for the majority of quantitative trait variation (Bloom et al., 2015; Bloom et al., 2013). It has been argued that the BY lab reference strain (commonly known as S288c) used in those and many other yeast studies is genetically and phenotypically atypical compared to other yeast isolates (Warringer et al., 2011). Our results here, obtained from crosses among 16 diverse strains, generalize these findings to the S. cerevisiae population and show that S288c is not exceptional from the standpoint of genetic variation and quantitative traits. We believe that the findings that the majority of the genetic variance of most traits is additive, and that there is little additive ‘missing heritability’ in studies with sufficiently large sample sizes, will apply broadly beyond yeast.

We discovered over 4500 quantitative trait loci (QTLs) that influence yeast growth in a wide variety of conditions. These loci likely capture the majority of common variants that segregate in S. cerevisiae and have appreciable phenotypic effects on growth, and therefore provide a comprehensive starting point for more fine-grained analyses of the genetic contribution to quantitative trait variation. We were able to localize approximately 8% of the QTLs to single genes based on genetic mapping information alone. Interestingly, these genes cluster in specific functional categories and pathways, suggesting that different strains of S. cerevisiae may have evolved different strategies for nutrient sensing and response as a function of specializing in particular environmental niches (Chantranupong et al., 2015). In addition to the findings described here, we anticipate that our data set will be a useful resource for further dissecting the genetic basis of trait variation at the gene and variant level (Peltier et al., 2019), and for evaluating statistical methods aimed at inferring causal genes and variants. In particular, the set of loci and genes identified here provides an ideal starting point for massively parallel editing experiments that directly test the phenotypic consequences of sequence variants (Shendure and Fields, 2016).

By combining our results with deep population sequencing in yeast (Peter et al., 2018), we were able to examine the contributions of variants in different frequency classes to trait variation. This approach avoids statistical confounding between variant frequency and effect size that occurs when both are measured in the same study sample. We observed a broad range of genetic architectures across the traits studied here, with variation in some traits dominated by common variants, while variation in others is mostly explained by rare variants. Overall, rare variants made a disproportionate contribution to trait variation as a consequence of their larger effect sizes. A complementary mapping approach in an overlapping set of yeast isolates also revealed enrichment of rare variants with larger effects (Fournier et al., 2019). These results are consistent with the finding from GWAS that common variants have small effects, as well as with linkage studies that find rare variants with large effect sizes. Our study design also revealed a substantial component of genetic variation—variants with low allele frequency and small effect size—that has been refractory to discovery in humans because both GWAS and linkage studies lack statistical power to detect this class of variants. Recent work in humans has suggested that rare variants account for a substantial fraction of heritability of complex traits and diseases (Wainschtein et al., 2019). Our study presents a more direct and fine-grained view of this component of trait variation and implies that larger sample sizes and more complete genotype information will be needed for more comprehensive studies in other systems.

Materials and methods

Data availability

Unless otherwise specified, all computational analyses were performed in R (v3.4.4). Analysis code and processing scripts are available at https://github.com/joshsbloom/yeast-16-parents (Bloom, 2019; copy archived at https://github.com/elifesciences-publications/yeast-16-parents). Additional links to generated data are also provided in the GitHub repository. The version numbers of R packages used are listed in this repository. Sequencing data for parents and segregants are available in the Sequence Read Archive (SRA) under the BioProject ID PRJNA549760.

Short-read and synthetic long read sequencing of parental strains

Parental genotypes were obtained by deep (>100X) paired-end sequencing of the 16 parental strains. A VCF file containing SNPs and small indels was generated for the parents using bwa (v0.7.1) (Li, 2013) to align to the sacCer3 reference (Engel et al., 2014), Picard (v2.12.2) (Broad Institute, 2019) to remove PCR duplicates, and the GATK HaplotypeCaller (v3.8) (Van der Auwera et al., 2013) with expected sample ploidy set to 1. A separate pipeline was developed to leverage additional synthetic long-reads (Illumina/Moleculo) to identify larger structural variants in the parents. Briefly, synthetic long-read assemblies were filtered to only include scaffolds greater than 10 kb. Scaffolds were corrected with our short-read data using Pilon (Walker et al., 2014). CNVs were discovered using custom scripts modified from scripts originally used to generate calls for testing LUMPY (Layer et al., 2014). CNVs were genotyped in all parents using the approach presented in SVTyper (Ebler et al., 2017). Scripts associated with the CNV detection pipeline are available at https://github.com/theboocock/long_read_cnv (Boocock, 2019; https://github.com/elifesciences-publications/long_read_cnv).

Construction of haploid segregant panels

Segregants for the BY-RM cross and YPS163-YJM145 cross were obtained by sporulation of the hybrid diploid parents for 5–7 days in SPO++ sporulation medium (http://dunham.gs.washington.edu/sporulationdissection.htm) and tetrad dissection using the MSM 400 dissection microscope (Singer Instrument Company Ltd.). Four-spore tetrads were retained. For the BY-RM cross, one segregant was randomly chosen per tetrad (Bloom et al., 2013). For the YPS163-YJM145 cross, all segregants from ~250 tetrads were used. For all other crosses, the hybrid diploids were pre-grown in YPD with either G418 or cloNat, depending on which fluorescent magic marker plasmid they contained (Treusch et al., 2015). They were then sporulated in SPO++ with either cloNat or G418 for 5–7 days. A random spore prep was used to isolate haploid progeny (https://openwetware.org/wiki/McClean:Random_Spore_Prep), modified to exclude the use of glass beads for spore separation. Cells were plated on selective media and grown for two days, and colony fluorescence was visualized. Green fluorescent colonies and red fluorescent colonies corresponding to MATa and MATα haploid progeny were picked to deep-well 96-well plates and then split into frozen stocks.

Preparation of whole-genome sequencing libraries for segregants

Yeast were grown in 1 ml of yeast peptone dextrose in 2 ml deep-well 96-well plates (Thermo Scientific). Plates were sealed with Breathe-Easy gas-permeable membranes (Sigma-Aldrich). Yeast were grown without shaking for 2 days in a 30°C incubator. Cell walls were digested with Zymolyase, and DNA was extracted using either the 96-well DNeasy Blood and Tissue kits (Qiagen) for the BY-RM and YPS163-YJM145 segregants, or the 96-well E-Z 96 Tissue DNA kit, following the bacterial protocol (Omega), for all other segregants. DNA concentrations were determined using the Quant-iT dsDNA High-Sensitivity DNA quantification kit (Invitrogen) and the Bio-Tek Synergy 2 plate reader. DNA was diluted to 0.22 ng per μl using a Biomek FX liquid handling robot (Beckman Coulter). For each segregant, 5 μl of 0.22 ng per μl of DNA was added to 4 μl of 5X Nextera HWM buffer (Illumina), 6 μl of water and 5 μl of 1/35 diluted Nextera enzyme. The transposition reaction was performed for 5 min at 55°C. Directly after the tagmentation reaction and without additional sample purification, Illumina sequencing adaptors and custom indices were added by PCR. 10 μl of tagmented DNA was combined with 0.5 μl each of 10 μM index primers (one of N701-N712 plus one of 96 custom indices, see Supplementary file 1), 5 μl of 10X Ex Taq buffer, 0.375 μl Ex Taq polymerase (Takara), 4 μl of 2.5 mM dNTPs and 29.625 μl of water, and amplified with 20 cycles of PCR. Up to 1152-plex libraries were run on a HiSeq 2500 with single-end 150 bp reads, except BY-RM (Bloom et al., 2013) and YPS163-YJM145, which were sequenced with 100 bp reads.
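As a consistency check on the volumes listed in this protocol, the sketch below sums the components of the tagmentation and PCR reactions and derives the implied final concentrations. All numbers are taken from the text; the dictionary keys and variable names are illustrative.

```python
# Tagmentation reaction components, in microliters (values from the text).
tagmentation_ul = {
    "DNA at 0.22 ng/ul": 5.0,
    "5X Nextera buffer": 4.0,
    "water": 6.0,
    "1/35 diluted Nextera enzyme": 5.0,
}
tagmentation_total_ul = sum(tagmentation_ul.values())

# DNA mass entering the reaction and final buffer strength.
dna_input_ng = 5.0 * 0.22
tagmentation_buffer_x = 5.0 * tagmentation_ul["5X Nextera buffer"] / tagmentation_total_ul

# PCR components, in microliters (values from the text).
pcr_ul = {
    "tagmented DNA": 10.0,
    "index primer 1 (10 uM)": 0.5,
    "index primer 2 (10 uM)": 0.5,
    "10X Ex Taq buffer": 5.0,
    "Ex Taq polymerase": 0.375,
    "2.5 mM dNTPs": 4.0,
    "water": 29.625,
}
pcr_total_ul = sum(pcr_ul.values())

# Final concentrations implied by the listed volumes.
pcr_buffer_x = 10.0 * pcr_ul["10X Ex Taq buffer"] / pcr_total_ul
primer_final_um = 10.0 * pcr_ul["index primer 1 (10 uM)"] / pcr_total_ul
dntp_final_mm = 2.5 * pcr_ul["2.5 mM dNTPs"] / pcr_total_ul

print(tagmentation_total_ul, pcr_total_ul)
print(dna_input_ng, tagmentation_buffer_x, pcr_buffer_x)
print(primer_final_um, dntp_final_mm)
```

The volumes add up to a 20 μl tagmentation reaction containing about 1.1 ng of DNA and a 50 μl PCR, with both buffers at 1X, each index primer at 0.1 μM, and dNTPs at 0.2 mM.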

Segregant genotype calling

FASTQ files were demultiplexed using fastq-multx (v1.3.1) (Aronesty, 2013) and aligned to the sacCer3 version of the reference genome using bwa. Adapter sequences were trimmed from reads and Phred33 quality scores were computed with Trimmomatic (v0.32) (Bolger et al., 2014). PCR duplicates were removed using Picard, and the alignments were then merged into one CRAM file per cross using Picard. VCF files were generated for each cross using the GATK HaplotypeCaller (Van der Auwera et al., 2013) and genotypes were called at known variant sites between the parental strains. Additional custom R code was used to remove regions with strong mapping bias toward the reference genome (Albert et al., 2018), filter poor-quality markers, and remove segregants with too many crossovers, which likely correspond to diploid contaminants. Missing segregant genotype information was imputed using a hidden Markov model (HMM) implemented in R/qtl (Arends et al., 2010). Structural variants identified in the parent VCF files were considered missing information in the segregants and the HMM was used to impute genotypes at those sites.
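The crossover filter mentioned above can be sketched as follows. The text does not state the threshold or the data layout, so both are hypothetical here: genotypes are assumed to be coded 0 or 1 by parent of origin, ordered along each chromosome, with `None` for missing calls, and `max_crossovers` is a placeholder value. The authors' actual filtering code is in the repository cited under Data availability.

```python
def count_crossovers(genotypes):
    """Count parent-of-origin switches along one chromosome.

    genotypes: calls ordered by position, 0 or 1 by parent, None if missing.
    Missing calls are skipped rather than counted as switches.
    """
    calls = [g for g in genotypes if g is not None]
    return sum(1 for left, right in zip(calls, calls[1:]) if left != right)


def flag_excess_crossovers(chromosomes_by_segregant, max_crossovers):
    """Return names of segregants whose genome-wide count exceeds the limit."""
    flagged = []
    for name, chromosomes in chromosomes_by_segregant.items():
        total = sum(count_crossovers(chrom) for chrom in chromosomes)
        if total > max_crossovers:
            flagged.append(name)
    return flagged


# Hypothetical toy data: two segregants, two short chromosomes each.
toy = {
    "seg_1": [[0, 0, 1, 1, None, 1, 0], [1, 1, 1, 1]],
    "seg_2": [[0, 1, 0, 1, 0, 1, 0], [1, 0, 1, 0]],
}
print(flag_excess_crossovers(toy, max_crossovers=5))
```

A sample that appears to switch parental haplotype far more often than expected from meiosis is more plausibly a mixed or diploid sample than a true haploid recombinant, which is the rationale the text gives for this filter.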

Phenotyping by endpoint colony growth

Segregants were arrayed to 384-well liquid plates in duplicate with different plate positions across the duplicates. Segregants were grown in YPD for approximately 48 hr without shaking and then pinned to agar plates using a BM-5 colony arraying robot (S&P Robotics). Plates were incubated for 48 hr and end-point growth was quantified by automated plate imaging using the colony arraying robot. Colony radii were calculated using functions in the EBImage R package (Pau et al., 2010), and endpoint growth measurements were filtered and normalized for plate effects as described previously (Bloom et al., 2015; Bloom et al., 2013). In addition, a manual filtering step was used to remove aberrant colonies arising from technical artefacts, such as wet spots on the agar plates at the time colonies were pinned. Unless otherwise specified, the average value across replicates was used per segregant for all downstream analyses.

Within-cross QTL mapping

QTL were mapped using a forward stepwise regression procedure that controls the FDR (G’Sell et al., 2013) for each trait and cross. We tested for linkage at each marker along the genome by calculating r², where r is the Pearson correlation coefficient between segregant genotypes at the marker and segregant phenotypes. 10,000 permutations of phenotype to strain assignment were performed and this statistic was calculated across the genome for each of the permutations. For each of the permutations, the maximum statistic was recorded to generate an empirical null distribution of the maximum statistic (Churchill and Doerge, 1994). A p-value was calculated as the probability the observed maximum statistic comes from the empirical null distribution of maximum statistics. If the observed statistic was greater than all of the empirical null statistics the p-value was recorded as 1e-4. The p-value was added to a set of p-values (p₁, … p_k), and the entire procedure was repeated (including permutations) with the previously identified marker(s) included as regression covariates. A ‘ForwardStop’, FDR-controlling statistic (G’Sell et al., 2013) was calculated as $-\frac{1}{k} \sum_{i=1}^{k} \log(1 - p_{i})$. We continued to add selected markers to a multiple regression model as long as the ‘ForwardStop’ statistic was less than or equal to 5%.
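The permutation p-value and the stopping rule described above can be sketched as follows. This is a minimal illustration in Python (the study's analyses were performed in R); the function names `permutation_pvalue` and `forward_stop_count` are hypothetical and are not taken from the authors' repository. The stopping rule is implemented as the text describes it, halting at the first step where the statistic exceeds the threshold.

```python
import math

def permutation_pvalue(observed_max, null_maxima):
    """Empirical p-value of the observed genome-wide maximum statistic.

    The p-value is floored at 1 / (number of permutations), which gives
    1e-4 for 10,000 permutations.
    """
    n_perm = len(null_maxima)
    n_ge = sum(1 for s in null_maxima if s >= observed_max)
    return max(n_ge, 1) / n_perm

def forward_stop_count(pvalues, alpha=0.05):
    """Number of forward-selection steps retained.

    After step k the statistic is -(1/k) * sum_{i<=k} log(1 - p_i).
    Selection stops at the first step where it exceeds alpha.
    """
    running = 0.0
    for k, p in enumerate(pvalues, start=1):
        # -log(1 - p) grows without bound as p approaches 1
        running += math.inf if p >= 1.0 else -math.log(1.0 - p)
        if running / k > alpha:
            return k - 1
    return len(pvalues)
```

In this sketch, each p-value in `pvalues` would come from a fresh genome scan and a fresh set of permutations, with the previously selected markers included as covariates.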

We note that we chose to use this procedure rather than procedures we have used in the past (Albert et al., 2018; Bloom et al., 2015; Bloom et al., 2013) because it is simple, does not require exchangeability of statistics across different traits, and gives very similar results to previous methods. We also verified through simulations (not shown) that it controls FDR for forward stepwise model selection under different QTL architectures.

For this within-cross QTL mapping procedure, we re-localized QTL peak positions for QTL detected by the forward selection procedure. Specifically, for each QTL peak we included all other detected QTL peaks (as detected from the forward selection procedure) as covariates in a multiple regression model, and scanned each marker on the chromosome on which the QTL peak being re-localized was detected to identify the marker that maximized the likelihood of the multiple regression model. The marker that maximized the likelihood of the multiple regression model was retained as the new, re-localized, QTL peak position.

Cross-validation procedures to estimate heritability explained by QTL

The amount of additive variance explained by detected QTLs was estimated using cross-validation. For the within-cross analysis, segregants were randomly split into 10 sets. Each set of segregants was left out of the procedure one at a time (held-out set). The within-cross QTL mapping procedure was performed for all the other sets (training set). For the QTL markers detected in this training set and with effects estimated in the training set, the amount of variance explained by the joint model of the set of significant QTL markers was estimated in the held-out set. For the joint analysis described below, we performed a similar procedure: we split the segregants within each cross into 10 sets, left one of the sets from each cross out (held-out set), identified QTL jointly across the other sets (training set), estimated their effects in each cross (training set), and then estimated the variance explained in the held-out set.
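The cross-validation loop can be sketched as below, in Python for illustration only. The names `split_into_sets`, `held_out_r2`, and `cross_validated_r2` are hypothetical, and `fit_predict` is a placeholder callback standing in for the full QTL detection and effect estimation on the training set. Variance explained is taken here as 1 - SSE/SST in the held-out set, and the per-set values are averaged, both of which are assumptions; the source does not spell out the exact definition.

```python
import random

def split_into_sets(n_segregants, n_sets=10, seed=0):
    """Randomly split segregant indices into n_sets roughly equal sets."""
    idx = list(range(n_segregants))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_sets] for i in range(n_sets)]

def held_out_r2(y_true, y_pred):
    """Fraction of held-out phenotypic variance explained by predictions."""
    mean_y = sum(y_true) / len(y_true)
    sst = sum((v - mean_y) ** 2 for v in y_true)
    sse = sum((v - p) ** 2 for v, p in zip(y_true, y_pred))
    return 1.0 - sse / sst

def cross_validated_r2(y, fit_predict, n_sets=10, seed=0):
    """Average held-out variance explained across the n_sets splits.

    fit_predict(train_idx, held_out_idx) must detect QTL and estimate
    their effects using only train_idx, then return predicted phenotypes
    for held_out_idx.
    """
    scores = []
    for held_out in split_into_sets(len(y), n_sets, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        y_pred = fit_predict(train, held_out)
        scores.append(held_out_r2([y[i] for i in held_out], y_pred))
    return sum(scores) / len(scores)
```

Keeping QTL detection inside `fit_predict` is the point of the design: markers selected with knowledge of the held-out segregants would inflate the estimate.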

Within-cross analysis to estimate additive and pairwise genetic interaction variance

To estimate the fraction of phenotypic variance attributable to additive genetic effects for each cross and trait we fit the model y = a+e, where y contains the segregant phenotype values and is standardized to have mean 0 and variance 1. Here, a are the additive genetic effects and the residual error is denoted as e. The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$a \sim N(0, \sigma_{A}^{2} A)$ and $e \sim N(0, \sigma_{EV}^{2} I)$

Here, A is the additive relatedness matrix, the fraction of genome shared between pairs of segregants and was calculated as $M M^{'} / n$ where M is the n x m matrix of standardized marker genotypes, n is the number of segregants and m is the number of markers.

We also fit an expanded model to estimate the relative contribution of additive vs non-additive (pairwise epistatic) effects. For the pairwise epistatic component, we believe that the assumption that all pairs of loci contribute to trait variation with effect sizes drawn from a single normal distribution is violated when one or a few QTL-QTL interactions with large effects are present, resulting in a downward bias. We previously showed (Bloom et al., 2015) that loci involved in such stronger interactions can be detected in additive scans. Therefore, by explicitly including additive QTLs in the three components model, we avoid making the assumption that the effect sizes of all locus pairs are drawn from the same normal distribution and obtain a better estimator of total two-way epistatic variance when large-effect QTL-QTL interactions are present. This model was parameterized as:

y = β X + Z q + Z a + Z f + Z g + Z i + Z p + e

The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$q \sim N(0, \sigma_{A_{QTL}}^{2} A_{QTL})$, $a \sim N(0, \sigma_{A}^{2} A)$, $f \sim N(0, \sigma_{A_{QTL} * A_{QTL}}^{2} A_{QTL} \circ A_{QTL})$, $g \sim N(0, \sigma_{A_{QTL} * A}^{2} A_{QTL} \circ A)$, $i \sim N(0, \sigma_{A * A}^{2} A \circ A)$, $p \sim N(0, \sigma_{R}^{2} I_{n})$, and $e \sim N(0, \sigma_{EV}^{2} I_{m})$

where y is a vector of length L that contains phenotypes for n segregants including replicate measurements such that L = n x [number of replicates]. β is a vector of estimated fixed effect coefficients. X is a matrix of fixed effects (here β is the overall mean, and X is a 1_L vector of ones unless otherwise specified). Z is an L x n incidence matrix that maps L total measures to n total segregants. In order, the random effect terms correspond to the effects of detected QTL, effects from the whole genome, epistatic interactions between detected QTL, epistatic interactions between additive QTL and the genome, epistatic interactions between all pairs of markers across the genome, and residual repeatability, following very similar methods and syntax as described previously (Bloom et al., 2015). We also fit a model that omitted the terms for epistatic interactions between detected QTL, and epistatic interactions between additive QTL and the genome. The mixed model was fit with the regress R package (Clifford and McCullagh, 2014) using restricted maximum likelihood estimation (REML). Standard errors of variance component estimates were calculated as the square root of the diagonal of the Fisher information matrix from the iteration at convergence of the Newton-Raphson algorithm. These procedures were used for all other mixed model analyses described below. For the analysis that compared the fraction non-additive to additive variation, we calculated $\frac{\sigma_{A_{QTL} * A_{QTL}}^{2} + \sigma_{A_{QTL} * A}^{2} + \sigma_{A * A}^{2}}{\sigma_{A_{QTL}}^{2} + \sigma_{A}^{2}}$.
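As an illustration of how the interaction covariance terms and the summary ratio are formed, the sketch below builds an element-wise (Hadamard) product of two relatedness matrices and computes the ratio of non-additive to additive variance from a set of variance component estimates. It is written in Python rather than the R used for the study. The function names and dictionary keys are hypothetical labels, and the REML fit itself (performed with the regress package in the paper) is not reproduced here.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices.

    For example, hadamard(A, A) gives the covariance structure used for
    genome-wide pairwise interactions (A o A).
    """
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def nonadditive_to_additive_ratio(vc):
    """Ratio of summed interaction variance to summed additive variance.

    vc maps hypothetical component labels to variance estimates:
    'QTL', 'A', 'QTLxQTL', 'QTLxA', 'AxA'.
    """
    non_additive = vc["QTLxQTL"] + vc["QTLxA"] + vc["AxA"]
    additive = vc["QTL"] + vc["A"]
    return non_additive / additive
```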

Allele-frequency lookup in 1011 yeast isolate population

We used bcftools isec (Li et al., 2009) to intersect our VCF containing sequence variant information on the 16 parental strains with the 1011 yeast isolate VCF generated by Peter et al. (2018), and vcftools (Danecek et al., 2011) to retain only biallelic variants. This subset of 259,647 biallelic markers was used for variance components analysis and joint QTL mapping across the panel. Allele frequencies in the larger panel of 1011 yeast isolates were extracted from the provided VCF (Peter et al., 2018). Derived allele frequencies were calculated by using nucmer (Marçais et al., 2018) to perform whole genome alignment between the sacCer3 reference assembly and the CBS432 assembly of S. paradoxus. Variants were identified using the delta-filter and show-snps commands provided in nucmer. Biallelic variants in our panel were classified as ancient if the variant matches the S. paradoxus sequence and recent if not. The unfolded allele frequency was calculated as the frequency of the recent variant. We could determine ancestral status for approximately 80% of the biallelic variants. To improve power for enrichment tests, we used derived allele frequency <5% and >95% as cutoffs when comparing effect sizes and signs of effects between derived and ancestral variants.
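The polarization step can be illustrated with a short Python sketch. The function name and its arguments are hypothetical. Returning `None` when the outgroup allele is unavailable or matches neither allele is an assumption about how unpolarizable sites are handled, consistent with ancestral status being determined for only about 80% of variants.

```python
def derived_allele_frequency(ref, alt, outgroup, alt_freq):
    """Frequency of the derived (recent) allele in the isolate panel.

    ref, alt  : the two alleles of a biallelic variant
    outgroup  : the aligned S. paradoxus allele, or None if unaligned
    alt_freq  : frequency of the alt allele in the isolate panel
    """
    if outgroup == ref:
        return alt_freq        # alt is the recent allele
    if outgroup == alt:
        return 1.0 - alt_freq  # ref is the recent allele
    return None                # ancestral state could not be determined
```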

Genotype recoding for joint analyses

We coded the biallelic markers for which we had allele frequency data from the larger yeast isolate panel as −1 if matching the reference strain, or +1 if not matching the reference. If a variant did not segregate in a particular cross, it was treated as missing in that cross.
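A minimal Python sketch of this recoding for one marker within one cross is shown below. The function name `recode_marker` and the use of `None` to represent missing values are illustrative choices, not the authors' implementation.

```python
def recode_marker(genotypes, reference_allele):
    """Recode one marker's allele calls within a single cross.

    Returns -1 for segregants carrying the reference allele and +1
    otherwise. If the marker does not segregate in this cross (all
    segregants carry the same allele), every entry is set to None.
    """
    coded = [-1 if g == reference_allele else 1 for g in genotypes]
    if len(set(coded)) < 2:
        return [None] * len(coded)
    return coded
```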

Mixed model analysis with allele-frequency partitioning

We fit the following mixed model per trait (jointly across the different crosses):

y = β X + r + c + e

The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$r \sim N(0, \sigma_{R}^{2} A_{maf<1\%})$, $c \sim N(0, \sigma_{C}^{2} A_{maf \geq 1\%})$, and $e \sim N(0, \sigma_{EV}^{2} I_{m})$

where y is a vector of length 13,950 that contains phenotypes for segregants concatenated across the different crosses. β is a vector of estimated fixed effects of each cross. X is an incidence matrix mapping segregants to crosses. Here, the two relatedness matrices $A_{maf<1\%}$ and $A_{maf \geq 1\%}$ were calculated separately for all markers with MAF < 1% and MAF ≥ 1%, respectively, in the larger panel of 1,011 yeast isolates. Per marker, the genotype values were scaled to have mean 0 and variance 1, for each of the segregants from crosses in which that marker segregates. Markers that are fixed within a cross were excluded from the subsequent calculation of genetic covariance. The rationale for excluding data for variants not segregating in a given cross is that all such variants are completely confounded with each other and with any other effects specific to that cross. Thus, their effects are more appropriately captured by including a fixed effect for each cross within the analysis. Then, with M being the n segregants by m markers matrix corresponding to the standardized genotypes for that subset of markers, we calculated the relatedness matrix as a Gower’s centered matrix (Forni et al., 2011; Kang et al., 2010; McArdle and Anderson, 2001) $\frac{MM'}{\operatorname{tr}(MM')/n}$, which has the property that the average diagonal coefficient equals 1.
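The scaling of the relatedness matrix can be illustrated with a small pure-Python sketch. It assumes, for simplicity, that every marker in the subset segregates among all of the segregants supplied, so the handling of markers that are missing in some crosses is not shown. The function names are hypothetical.

```python
def standardize_columns(genotypes):
    """Scale each marker (column) to mean 0 and variance 1.

    genotypes is an n x m list of rows (segregants) by columns (markers).
    Markers with zero variance must be removed beforehand.
    """
    n = len(genotypes)
    scaled_cols = []
    for col in zip(*genotypes):
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def gower_relatedness(m_std):
    """Compute MM' / (tr(MM') / n) from standardized genotypes M."""
    n = len(m_std)
    mmt = [[sum(a * b for a, b in zip(ri, rj)) for rj in m_std] for ri in m_std]
    scale = sum(mmt[i][i] for i in range(n)) / n
    return [[v / scale for v in row] for row in mmt]
```

Dividing by tr(MM')/n makes the average diagonal element equal to 1, so the variance components fitted for different marker subsets are on a comparable scale.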

We used the same logic to construct additional covariance matrices when more finely binning variants by allele frequency in the external panel (seven allele-frequency bins model). Bins were chosen to contain approximately equal numbers of variants. We also fit the seven allele-frequency bins model using only variants that were private to each parent (variants that only segregate in a pair of crosses). In this last model, the allele frequencies of the variants used for the analysis are all approximately the same across the panel. Therefore, this last model does not make the assumption that the variance of variant effects is inversely proportional to their frequencies in the mapping panel (Yang et al., 2010).

The procedure for fitting these models was the same as described above in the section ‘Within-cross analysis to estimate additive and pairwise genetic interaction variance’.

Accounting for large effect QTL and polygenic background for all chromosomes except the chromosome of interest for joint QTL mapping

For each chromosome of interest and for each trait and cross, we calculated y_c = CQ+a_L+s_c where y_c is the vector of trait values for a given trait and cross, Q is a matrix of QTL genotypes at peak markers from the within-cross mapping described above, with FDR < 5%, that are not located on the chromosome of interest, C is a vector of estimated QTL effects from the section ‘Within-cross QTL mapping’, and a_L is the additive genetic contribution of all chromosomes excluding the chromosome of interest. a_L comes from the REML-based BLUP estimate of the effect of all other chromosomes, including the fixed effects of detected QTL on the other chromosomes. The goal of this step was to obtain the residual trait values s_c, which can be used to scan for QTLs on a chromosome of interest and which are corrected for mapped genetic sources of variation that do not arise from the chromosome of interest (Yang et al., 2014).

Joint QTL mapping

Under the assumption that a causal biallelic variant has a consistent additive effect in all the crosses in which it segregates, we implemented a model to identify such variants jointly across our entire segregant panel (McMullen et al., 2009; Stich, 2009). This procedure increases statistical power. For example, for variants that are private to one of the 16 parental strains, this procedure will approximately double the observed number of instances of the minor allele, resulting in less noisy estimates of variant effects. For variants that are shared among multiple parents, the increase in the observed number of instances of the minor allele will be greater.

For each trait and each chromosome, and then for each marker on that chromosome, we calculated a t-statistic as $\frac{r}{\sqrt{\frac{1 - r^{2}}{n - 2}}}$. Here, r is the Pearson correlation between the recoded segregant genotypes across the panel and the vector s, which corresponds to the values of s_c described in the previous section concatenated across the different crosses. The number of informative segregants, n, differs for each biallelic variant, and corresponds to the sum of the sample sizes for each cross in which the variant segregates. P-values were calculated that factor in the different number of informative segregants, n, in the calculation of the degrees of freedom using built-in R functions. The -log10(p) was recorded. This statistic was calculated for each marker on the chromosome. 1000 permutations of phenotype to strain assignment were performed, but these permutations were performed with phenotype values within each cross (we did not permute values between crosses) and this statistic was calculated across the genome for each of the permutations. For each of the permutations, the maximum statistic was recorded to generate an empirical null distribution of the maximum statistic (Churchill and Doerge, 1994). A new corrected p-value was calculated as the probability the observed maximum statistic comes from the empirical null distribution of maximum statistics. If the observed maximum statistic was greater than all of the empirical null maximum statistics the p-value was recorded as 1e-3. The p-value was added to a set of p-values (p₁, … p_k), and the entire procedure was repeated (including permutations) with the previously identified marker(s) included as regression covariates. A ‘ForwardStop’, FDR-controlling statistic (G’Sell et al., 2013) was calculated as described above. We continued to add selected markers to a multiple regression model as long as the ‘ForwardStop’ statistic was less than or equal to 5%.
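Two pieces of this procedure lend themselves to a short sketch: the per-marker t-statistic, computed only over segregants from crosses in which the marker segregates, and the restriction of permutations to within each cross. The Python code below is illustrative only. The function names are hypothetical, missing genotypes are represented as `None`, and the conversion of t to a p-value (a t distribution with n - 2 degrees of freedom) is omitted.

```python
import math
import random

def joint_t_statistic(genotypes, residuals):
    """t-statistic for one marker across the whole panel.

    genotypes : recoded values (-1 / +1), None where the marker does not
                segregate in a segregant's cross
    residuals : the concatenated residual trait values s
    Returns (t, n), where n is the number of informative segregants.
    """
    pairs = [(g, s) for g, s in zip(genotypes, residuals) if g is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    sgg = sum((g - mean_g) ** 2 for g, _ in pairs)
    sss = sum((s - mean_s) ** 2 for _, s in pairs)
    sgs = sum((g - mean_g) * (s - mean_s) for g, s in pairs)
    r = sgs / math.sqrt(sgg * sss)
    t = r / math.sqrt((1.0 - r * r) / (n - 2))
    return t, n

def permute_within_cross(values, cross_ids, rng):
    """Shuffle values among segregants of the same cross only."""
    permuted = list(values)
    by_cross = {}
    for i, c in enumerate(cross_ids):
        by_cross.setdefault(c, []).append(i)
    for idx in by_cross.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        for i, v in zip(idx, shuffled):
            permuted[i] = v
    return permuted
```

Permuting within crosses preserves between-cross differences in mean and variance, so the null distribution reflects only the breaking of genotype-phenotype association inside each cross.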

Effect size estimation for joint QTL mapping

The peak markers (lead variants) from this procedure were then used for effect size estimation. For each trait and cross, the phenotypes are scaled to have mean 0 and variance 1, and effect sizes within each cross are estimated using multiple regression for the peak markers that segregate within that cross. The betas in this analysis correspond to the differences in the means between the two QTL alleles (conditional on the effects of the other segregating QTL). For peak markers that segregate in multiple crosses, the average betas over the different crosses are shown in Figure 3. Unbiased estimates of QTL effect size (Figure 3—figure supplement 4) were obtained by the same procedure except peak detection was performed in 9/10 of the data and effects estimated in the 1/10 of the data left out. Allele frequencies of the lead variants were looked up in the 1011 isolate panel.

Statistical fine-mapping to identify causal genes

We implemented the probabilistic identification of causal SNPs (PICS) procedure, a Bayesian approach to estimate the probability that a variant is causal. A very thorough description of the method, including details about the logic and implementation, is given in Farh et al. (2015). We aggregated these probabilities within genes to estimate the probability that a gene contains the causal variant. We noted the position of the observed QTL peak (called the ‘lead’ variant in the GWAS literature), and its effect size for all QTL that explained more than 2% of phenotypic variance from the within-cross mapping (equivalent to 0.1414 SD units). We assumed that the prior probabilities of a variant being causal, or being identified as a lead variant, are equal. For this analysis, we only used variants that fall within a 50 kb window centered around the detected QTL peak. For each variant within this window, we simulated the observed QTL effect size on the background of noise, 500 times. Here, noise was estimated as the residual error of the within-cross QTL model for that trait and cross. Each of the simulations was generated by a different permutation of the assignment of the residual error to segregant. We then repeated our mapping procedure for the simulated data and calculated the fraction of simulations where the observed QTL peak from our trait mappings was the lead variant given the simulated causal variant. This posterior probability was estimated for each of the variants within the 50 kb window, and then normalized so that the sum of all the probabilities in the window is 1. This generated a variant-level probability of causality for each variant within the window for that trait and cross.

Next, we identified overlapping QTL. Overlapping QTL were defined as the QTL coming from neighboring crosses that shared a parent, have 1.5 LOD drop confidence intervals that overlap, and have QTL effect directions that are consistent between the neighboring crosses. For these overlapping QTL, we calculated the product of the causality probabilities (described above) for each variant shared between the two crosses (and segregating in both crosses) and then normalized these probabilities so that they sum to 1. To calculate the probability that a gene was causal, we summed these probabilities for all variants that fell within each gene. Here, a gene was defined as all variants that fell within the defined open-reading frame as well as variants that fell halfway between the start and stop of the adjacent open-reading frames. We calculated a FDR by sorting the observed posterior probabilities of causality per gene from highest to lowest, calculating a posterior error probability as one minus the posterior probability of causality, and calculating the cumulative mean of these probabilities (Käll et al., 2008; Storey, 2003; Storey and Tibshirani, 2003).
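The combination of evidence across two crosses, the aggregation to genes, and the FDR calculation can be sketched as follows. This is an illustrative Python sketch with hypothetical function names and data structures (dictionaries keyed by variant identifier). It assumes the per-cross variant probabilities have already been computed and that at least one shared variant has non-zero probability in both crosses.

```python
def combine_across_crosses(prob_a, prob_b):
    """Multiply per-variant causality probabilities from two crosses.

    Only variants present in both dictionaries are kept, and the
    products are renormalized to sum to 1.
    """
    shared = set(prob_a) & set(prob_b)
    product = {v: prob_a[v] * prob_b[v] for v in shared}
    total = sum(product.values())
    return {v: p / total for v, p in product.items()}

def gene_probabilities(variant_probs, variant_to_gene):
    """Sum variant-level probabilities over the variants assigned to each gene."""
    genes = {}
    for v, p in variant_probs.items():
        gene = variant_to_gene[v]
        genes[gene] = genes.get(gene, 0.0) + p
    return genes

def cumulative_fdr(posteriors):
    """FDR at each rank as the running mean of posterior error probabilities.

    Posteriors are sorted from highest to lowest; the posterior error
    probability of each entry is one minus its posterior.
    """
    ranked = sorted(posteriors, reverse=True)
    fdr, running = [], 0.0
    for k, p in enumerate(ranked, start=1):
        running += 1.0 - p
        fdr.append(running / k)
    return ranked, fdr
```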

We note that the causal gene statistic is an estimate of the posterior probability that a gene is causal assuming that one causal variant in the defined window is responsible for generating a signal in two crosses that share a parent strain, that we estimate the effects of causal variants in both crosses without error, and that genotypes are called without error.

Gene ontology enrichment analyses

We tested for GO enrichments using the R package topGO (Alexa and Rahnenfuhrer, 2018), using the Fisher test for enrichment and the ‘classic’ scoring method, which does not adjust the enrichments for significance of child GO terms.

Acknowledgements

We thank Bogdan Pasaniuc, Frank W Albert, Olga T Schubert, Liangke Gou, Tzitziki Lemus Vergara, Matthieu Delcourt, Longhua Guo, and Eyal Ben-David for helpful manuscript feedback and edits. We thank Illumina for performing synthetic long-read sequencing of the parental yeast strains. This work was supported by funding from the Howard Hughes Medical Institute (to LK) and NIH grant R01GM102308 (to LK). The authors declare no competing financial interests.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Joshua S Bloom, Email: jbloom@mednet.ucla.edu.

Leonid Kruglyak, Email: LKruglyak@mednet.ucla.edu.

Christian R Landry, Université Laval, Canada.

Naama Barkai, Weizmann Institute of Science, Israel.

Funding Information

This paper was supported by the following grants:

National Institutes of Health R01GM102308 to Joshua S Bloom, Meru J Sadhu, Laura Day, Holly Oates-Barker, Leonid Kruglyak.
Howard Hughes Medical Institute to Joshua S Bloom, Laura Day, Holly Oates-Barker, Leonid Kruglyak.

Additional information

Competing interests

No competing interests declared.

Sebastian Treusch is now affiliated with Intrexon, although all work for this study was carried out while ST was affiliated with UCLA. The author has no other competing interests to declare.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing.

Data curation, Software, Formal analysis, Writing—review and editing.

Conceptualization, Resources.

Investigation, Writing—review and editing.

Resources, Methodology.

Conceptualization, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing.

Additional files

Supplementary file 1. Key resources table.

elife-49212-supp1.xlsx (19.6 KB, xlsx)

DOI: 10.7554/eLife.49212.016

Supplementary file 2. Barplots depicting results from within-cross variance component analyses.

For each trait and cross, a stacked barplot is shown representing the results from a multiple variance component model. Phenotypic covariance was modeled as the sum of QTL effects (light blue, denoted as Q), additive genome effects (dark blue, denoted as A), interactions between additive QTLs (dark green, denoted as Q∘Q), interactions between additive QTLs and the rest of the genome (light green, denoted as Q∘A), interactions between all loci in the genome (sea green, denoted as A∘A), residual effect of strain (pink), and residual error (white).

elife-49212-supp2.pdf (125.4 KB, pdf)

DOI: 10.7554/eLife.49212.017

Supplementary file 3. QTL mapping results for each trait and cross.

Results from QTL mapping are shown for each trait. Each subpanel represents the results for a cross. Along the Y-axis the two parental strains for each cross are shown. Position of QTL along the genome is represented on the X-axis. The arrows represent QTL effects from the within-cross mapping. The arrows point toward the strain that increases growth. The size of the arrow reflects the magnitude of the QTL effect. Full length arrows represent QTL that explain more than 25% of phenotypic variance, ¾ length arrows represent QTL that explain between 8% and 25% of phenotypic variance, ½ length arrows represent QTL that explain between 4% and 8% of phenotypic variance, and short arrows represent QTL that explain less than 4% of phenotypic variance. Large effect QTL (explaining more than 4% of phenotypic variance) are colored black, and small effect QTL (less than 4% of phenotypic variance) are colored gray. The green vertical lines correspond to QTL detected from the joint QTL mapping analysis (Materials and methods).

elife-49212-supp3.pdf (464 KB, pdf)

DOI: 10.7554/eLife.49212.018

Transparent reporting form

elife-49212-transrepform.docx (246.5 KB, docx)

DOI: 10.7554/eLife.49212.019

Data availability

Unless otherwise specified, all computational analyses were performed in R (v3.4.4). Analysis code and processing scripts are available at https://github.com/joshsbloom/yeast-16-parents (copy archived at https://github.com/elifesciences-publications/yeast-16-parents). Additional links to generated data are also provided in the github repository. The version numbers of R packages used are listed in this repository. Sequencing data has been deposited in the SRA under the accession code PRJNA549760.

The following dataset was generated:

Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. 2019. Rare variants contribute disproportionately to quantitative trait variation in yeast, Jun 19 '19. SRA Bioproject. PRJNA549760

References

Albert FW, Bloom JS, Siegel J, Day L, Kruglyak L. Genetics of trans-regulatory variation in gene expression. eLife. 2018;7:e35471. doi: 10.7554/eLife.35471. [DOI] [PMC free article] [PubMed] [Google Scholar]
Alexa A, Rahnenfuhrer J. topGO: enrichment analysis for gene ontology 2018
Arends D, Prins P, Jansen RC, Broman KW. R/qtl: high-throughput multiple QTL mapping. Bioinformatics. 2010;26:2990–2992. doi: 10.1093/bioinformatics/btq565. [DOI] [PMC free article] [PubMed] [Google Scholar]
Aronesty E. Comparison of sequencing utility programs. The Open Bioinformatics Journal. 2013;7:1–8. doi: 10.2174/1875036201307010001. [DOI] [Google Scholar]
Bloom JS, Ehrenreich IM, Loo WT, Lite TL, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494:234–237. doi: 10.1038/nature11867. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bloom JS, Kotenko I, Sadhu MJ, Treusch S, Albert FW, Kruglyak L. Genetic interactions contribute less than additive effects to quantitative trait variation in yeast. Nature Communications. 2015;6:ncomms9712. doi: 10.1038/ncomms9712. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bloom JS. yeast-16-parents. c913c9aGithub. 2019 https://github.com/joshsbloom/yeast-16-parents
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
Boocock J. Github; 2019. https://github.com/theboocock/long_read_cnv [Google Scholar]
Broad Institute Picard Tools. 2019 http://broadinstitute.github.io/picard/
Chantranupong L, Wolfson RL, Sabatini DM. Nutrient-Sensing mechanisms across evolution. Cell. 2015;161:67–83. doi: 10.1016/j.cell.2015.02.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963. [DOI] [PMC free article] [PubMed] [Google Scholar]
Clifford D, McCullagh P. The regress package. 2014 https://cran.r-project.org/web/packages/regress/regress.pdf
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
de los Campos G, Sorensen D, Gianola D. Genomic Heritability: What Is It? PLOS Genetics. 2015;11:e1005048. doi: 10.1371/journal.pgen.1005048. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ebler J, Schönhuth A, Marschall T. Genotyping inversions and tandem duplications. Bioinformatics. 2017;33:4015–4023. doi: 10.1093/bioinformatics/btx020. [DOI] [PubMed] [Google Scholar]
Ehrenreich IM, Bloom J, Torabi N, Wang X, Jia Y, Kruglyak L. Genetic Architecture of Highly Complex Chemical Resistance Traits across Four Yeast Strains. PLOS Genetics. 2012;8:e1002570. doi: 10.1371/journal.pgen.1002570. [DOI] [PMC free article] [PubMed] [Google Scholar]
Engel SR, Dietrich FS, Fisk DG, Binkley G, Balakrishnan R, Costanzo MC, Dwight SS, Hitz BC, Karra K, Nash RS, Weng S, Wong ED, Lloyd P, Skrzypek MS, Miyasato SR, Simison M, Cherry JM. The reference genome sequence of Saccharomyces cerevisiae : then and now. G3: Genes, Genomes, Genetics. 2014;4:389–398. doi: 10.1534/g3.113.008995. [DOI] [PMC free article] [PubMed] [Google Scholar]
Exome Aggregation Consortium. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. [DOI] [PMC free article] [PubMed] [Google Scholar]
Eyre-Walker A. Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. PNAS. 2010;107:1752–1756. doi: 10.1073/pnas.0906182107. [DOI] [PMC free article] [PubMed] [Google Scholar]
Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJH, Shishkin AA, Hatan M, Carrasco-Alfonso MJ, Mayer D, Luckey CJ, Patsopoulos NA, De Jager PL, Kuchroo VK, Epstein CB, Daly MJ, Hafler DA, Bernstein BE. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fay JC. The molecular basis of phenotypic variation in yeast. Current Opinion in Genetics & Development. 2013;23:672–677. doi: 10.1016/j.gde.2013.10.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Forni S, Aguilar I, Misztal I. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetics Selection Evolution. 2011;43:1. doi: 10.1186/1297-9686-43-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fournier T, Abou Saada O, Hou J, Peter J, Caudal E, Schacherer J. Extensive impact of low-frequency variants on the phenotypic landscape at population-scale. eLife. 2019;8:e49258. doi: 10.7554/eLife.49258. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ganna A, Satterstrom FK, Zekavat SM, Das I, Kurki MI, Churchhouse C, Alfoldi J, Martin AR, Havulinna AS, Byrnes A, Thompson WK, Nielsen PR, Karczewski KJ, Saarentaus E, Rivas MA, Gupta N, Pietiläinen O, Emdin CA, Lescai F, Bybjerg-Grauholm J, Flannick J, Mercader JM, Udler M, Laakso M, Salomaa V, Hultman C, Ripatti S, Hämäläinen E, Moilanen JS, Körkkö J, Kuismin O, Nordentoft M, Hougaard DM, Mors O, Werge T, Mortensen PB, MacArthur D, Daly MJ, Sullivan PF, Locke AE, Palotie A, Børglum AD, Kathiresan S, Neale BM, GoT2D/T2D-GENES Consortium. SIGMA Consortium Helmsley IBD Exome Sequencing Project. FinMetSeq Consortium. iPSYCH-Broad Consortium Quantifying the impact of rare and Ultra-rare coding variation across the phenotypic spectrum. The American Journal of Human Genetics. 2018;102:1204–1211. doi: 10.1016/j.ajhg.2018.05.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gibson G. Rare and common variants: twenty arguments. Nature Reviews Genetics. 2012;13:135–145. doi: 10.1038/nrg3118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goldstein DB, Allen A, Keebler J, Margulies EH, Petrou S, Petrovski S, Sunyaev S. Sequencing studies in human genetics: design and interpretation. Nature Reviews Genetics. 2013;14:460–470. doi: 10.1038/nrg3455. [DOI] [PMC free article] [PubMed] [Google Scholar]
G’Sell MG, Wager S, Chouldechova A, Tibshirani R. Sequential selection procedures and false discovery rate control. arXiv. 2013 https://arxiv.org/abs/1309.5352
Jerison ER, Kryazhimskiy S, Mitchell JK, Bloom JS, Kruglyak L, Desai MM. Genetic variation in adaptability and pleiotropy in budding yeast. eLife. 2017;6:e27167. doi: 10.7554/eLife.27167.
Käll L, Storey JD, MacCoss MJ, Noble WS. Posterior error probabilities and false discovery rates: two sides of the same coin. Journal of Proteome Research. 2008;7:40–44. doi: 10.1021/pr700739d.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nature Genetics. 2010;42:348–354. doi: 10.1038/ng.548.
Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics. 2007;80:727–739. doi: 10.1086/513473.
Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biology. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The sequence alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013 https://arxiv.org/abs/1303.3997
Mancuso N, Rohland N, Rand KA, Tandon A, Allen A, Quinque D, Mallick S, Li H, Stram A, Sheng X, Kote-Jarai Z, Easton DF, Eeles RA, Le Marchand L, Lubwama A, Stram D, Watya S, Conti DV, Henderson B, Haiman CA, Pasaniuc B, Reich D, PRACTICAL consortium. The contribution of rare variation to prostate cancer heritability. Nature Genetics. 2016;48:30–35. doi: 10.1038/ng.3446.
Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: a fast and versatile genome alignment system. PLOS Computational Biology. 2018;14:e1005944. doi: 10.1371/journal.pcbi.1005944.
Marouli E, Graff M, Medina-Gomez C, Lo KS, Wood AR, Kjaer TR, Fine RS, Lu Y, Schurmann C, Highland HM, Rüeger S, Thorleifsson G, Justice AE, Lamparter D, Stirrups KE, Turcot V, Young KL, Winkler TW, Esko T, Karaderi T, Locke AE, Masca NGD, Ng MCY, Mudgal P, Rivas MA, Vedantam S, Mahajan A, Guo X, Abecasis G, Aben KK, Adair LS, Alam DS, Albrecht E, Allin KH, Allison M, Amouyel P, Appel EV, Arveiler D, Asselbergs FW, Auer PL, Balkau B, Banas B, Bang LE, Benn M, Bergmann S, Bielak LF, Blüher M, Boeing H, Boerwinkle E, Böger CA, Bonnycastle LL, Bork-Jensen J, Bots ML, Bottinger EP, Bowden DW, Brandslund I, Breen G, Brilliant MH, Broer L, Burt AA, Butterworth AS, Carey DJ, Caulfield MJ, Chambers JC, Chasman DI, Chen Y-DI, Chowdhury R, Christensen C, Chu AY, Cocca M, Collins FS, Cook JP, Corley J, Galbany JC, Cox AJ, Cuellar-Partida G, Danesh J, Davies G, de Bakker PIW, de Borst GJ, de Denus S, de Groot MCH, de Mutsert R, Deary IJ, Dedoussis G, Demerath EW, den Hollander AI, Dennis JG, Di Angelantonio E, Drenos F, Du M, Dunning AM, Easton DF, Ebeling T, Edwards TL, Ellinor PT, Elliott P, Evangelou E, Farmaki A-E, Faul JD, Feitosa MF, Feng S, Ferrannini E, Ferrario MM, Ferrieres J, Florez JC, Ford I, Fornage M, Franks PW, Frikke-Schmidt R, Galesloot TE, Gan W, Gandin I, Gasparini P, Giedraitis V, Giri A, Girotto G, Gordon SD, Gordon-Larsen P, Gorski M, Grarup N, Grove ML, Gudnason V, Gustafsson S, Hansen T, Harris KM, Harris TB, Hattersley AT, Hayward C, He L, Heid IM, Heikkilä K, Helgeland Øyvind, Hernesniemi J, Hewitt AW, Hocking LJ, Hollensted M, Holmen OL, Hovingh GK, Howson JMM, Hoyng CB, Huang PL, Hveem K, Ikram MA, Ingelsson E, Jackson AU, Jansson J-H, Jarvik GP, Jensen GB, Jhun MA, Jia Y, Jiang X, Johansson S, Jørgensen ME, Jørgensen T, Jousilahti P, Jukema JW, Kahali B, Kahn RS, Kähönen M, Kamstrup PR, Kanoni S, Kaprio J, Karaleftheri M, Kardia SLR, Karpe F, Kee F, Keeman R, Kiemeney LA, Kitajima H, Kluivers KB, Kocher T, Komulainen P, Kontto J, Kooner JS, 
Kooperberg C, Kovacs P, Kriebel J, Kuivaniemi H, Küry S, Kuusisto J, La Bianca M, Laakso M, Lakka TA, Lange EM, Lange LA, Langefeld CD, Langenberg C, Larson EB, Lee I-T, Lehtimäki T, Lewis CE, Li H, Li J, Li-Gao R, Lin H, Lin L-A, Lin X, Lind L, Lindström J, Linneberg A, Liu Y, Liu Y, Lophatananon A, Luan Jian'an, Lubitz SA, Lyytikäinen L-P, Mackey DA, Madden PAF, Manning AK, Männistö S, Marenne G, Marten J, Martin NG, Mazul AL, Meidtner K, Metspalu A, Mitchell P, Mohlke KL, Mook-Kanamori DO, Morgan A, Morris AD, Morris AP, Müller-Nurasyid M, Munroe PB, Nalls MA, Nauck M, Nelson CP, Neville M, Nielsen SF, Nikus K, Njølstad PR, Nordestgaard BG, Ntalla I, O'Connel JR, Oksa H, Loohuis LMO, Ophoff RA, Owen KR, Packard CJ, Padmanabhan S, Palmer CNA, Pasterkamp G, Patel AP, Pattie A, Pedersen O, Peissig PL, Peloso GM, Pennell CE, Perola M, Perry JA, Perry JRB, Person TN, Pirie A, Polasek O, Posthuma D, Raitakari OT, Rasheed A, Rauramaa R, Reilly DF, Reiner AP, Renström F, Ridker PM, Rioux JD, Robertson N, Robino A, Rolandsson O, Rudan I, Ruth KS, Saleheen D, Salomaa V, Samani NJ, Sandow K, Sapkota Y, Sattar N, Schmidt MK, Schreiner PJ, Schulze MB, Scott RA, Segura-Lepe MP, Shah S, Sim X, Sivapalaratnam S, Small KS, Smith AV, Smith JA, Southam L, Spector TD, Speliotes EK, Starr JM, Steinthorsdottir V, Stringham HM, Stumvoll M, Surendran P, ‘t Hart LM, Tansey KE, Tardif J-C, Taylor KD, Teumer A, Thompson DJ, Thorsteinsdottir U, Thuesen BH, Tönjes A, Tromp G, Trompet S, Tsafantakis E, Tuomilehto J, Tybjaerg-Hansen A, Tyrer JP, Uher R, Uitterlinden AG, Ulivi S, van der Laan SW, Van Der Leij AR, van Duijn CM, van Schoor NM, van Setten J, Varbo A, Varga TV, Varma R, Edwards DRV, Vermeulen SH, Vestergaard H, Vitart V, Vogt TF, Vozzi D, Walker M, Wang F, Wang CA, Wang S, Wang Y, Wareham NJ, Warren HR, Wessel J, Willems SM, Wilson JG, Witte DR, Woods MO, Wu Y, Yaghootkar H, Yao J, Yao P, Yerges-Armstrong LM, Young R, Zeggini E, Zhan X, Zhang W, Zhao JH, Zhao W, Zhao W, Zheng H, 
Zhou W, Rotter JI, Boehnke M, Kathiresan S, McCarthy MI, Willer CJ, Stefansson K, Borecki IB, Liu DJ, North KE, Heard-Costa NL, Pers TH, Lindgren CM, Oxvig C, Kutalik Z, Rivadeneira F, Loos RJF, Frayling TM, Hirschhorn JN, Deloukas P, Lettre G. Rare and low-frequency coding variants alter human adult height. Nature. 2017;542:186–190. doi: 10.1038/nature21039.
McArdle BH, Anderson MJ. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology. 2001;82:290–297. doi: 10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2.
McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, Flint-Garcia S, Thornsberry J, Acharya C, Bottoms C, Brown P, Browne C, Eller M, Guill K, Harjes C, Kroon D, Lepak N, Mitchell SE, Peterson B, Pressoir G, Romero S, Rosas MO, Salvo S, Yates H, Hanson M, Jones E, Smith S, Glaubitz JC, Goodman M, Ware D, Holland JB, Buckler ES. Genetic Properties of the Maize Nested Association Mapping Population. Science. 2009;325:737–740. doi: 10.1126/science.1174320.
Park JH, Gail MH, Weinberg CR, Carroll RJ, Chung CC, Wang Z, Chanock SJ, Fraumeni JF, Chatterjee N. Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants. PNAS. 2011;108:18026–18031. doi: 10.1073/pnas.1114759108.
Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18:117–127. doi: 10.1038/nrg.2016.142.
Pau G, Fuchs F, Sklyar O, Boutros M, Huber W. EBImage--an R package for image processing with applications to cellular phenotypes. Bioinformatics. 2010;26:979–981. doi: 10.1093/bioinformatics/btq046.
Peltier E, Friedrich A, Schacherer J, Marullo P. Quantitative trait nucleotides impacting the technological performances of industrial Saccharomyces cerevisiae strains. Frontiers in Genetics. 2019;10:e00683. doi: 10.3389/fgene.2019.00683.
Peter J, De Chiara M, Friedrich A, Yue J-X, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, Cruaud C, Labadie K, Aury J-M, Istace B, Lebrigand K, Barbry P, Engelen S, Lemainque A, Wincker P, Liti G, Schacherer J. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature. 2018;556:339–344. doi: 10.1038/s41586-018-0030-5.
Pritchard JK. Are Rare Variants Responsible for Susceptibility to Complex Diseases? The American Journal of Human Genetics. 2001;69:124–137. doi: 10.1086/321272.
Robinson MR, Wray NR, Visscher PM. Explaining additional genetic variation in complex traits. Trends in Genetics. 2014;30:124–132. doi: 10.1016/j.tig.2014.02.003.
Sadhu MJ, Bloom JS, Day L, Kruglyak L. CRISPR-directed mitotic recombination enables genetic mapping without crosses. Science. 2016;352:1113–1116. doi: 10.1126/science.aaf5124.
Shendure J, Fields S. Massively Parallel Genetics. Genetics. 2016;203:617–619. doi: 10.1534/genetics.115.180562.
Simons YB, Bullaughey K, Hudson RR, Sella G. A population genetic interpretation of GWAS findings for human quantitative traits. PLOS Biology. 2018;16:e2002985. doi: 10.1371/journal.pbio.2002985.
Stich B. Comparison of Mating Designs for Establishing Nested Association Mapping Populations in Maize and Arabidopsis thaliana. Genetics. 2009;183:1525–1534. doi: 10.1534/genetics.109.108449.
Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics. 2003;31:2013–2035. doi: 10.1214/aos/1074290335.
Storey JD, Tibshirani R. Statistical significance for genomewide studies. PNAS. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
Tisi R, Belotti F, Martegani E. Yeast as a Model for Ras Signalling. In: Trabalzini L, editor. Ras Signaling: Methods and Protocols, Methods in Molecular Biology. Totowa, NJ: Humana Press; 2014. pp. 359–390.
Treusch S, Albert FW, Bloom JS, Kotenko IE, Kruglyak L. Genetic Mapping of MAPK-Mediated Complex Traits Across S. cerevisiae. PLOS Genetics. 2015;11:e1004913. doi: 10.1371/journal.pgen.1004913.
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013;43:11.10.1-33. doi: 10.1002/0471250953.bi1110s43.
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
Wainschtein P, Jain DP, Yengo L, Zheng Z, Cupples LA, Shadyab AH, McKnight B, Shoemaker BM, Mitchell BD, Psaty BM, Kooperberg C, Roden D, Darbar D, Arnett DK, Regan EA, Boerwinkle E, Rotter JI, Allison MA, McDonald M-LN, Chung MK, Smith NL, Ellinor PT, Vasan RS, Mathias RA, Rich SS, Heckbert SR, Redline S, Guo X, Chen Y-DI, Liu C-T, de AM, Yanek LR, Albert CM, Hernandez RD, McGarvey ST, North KE, Lange LA, Weir BS, Laurie CC, Yang J, Visscher PM. Recovery of trait heritability from whole genome sequence data. bioRxiv. 2019 doi: 10.1101/588020.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLOS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES. Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize. PLOS Genetics. 2014;10:e1004845. doi: 10.1371/journal.pgen.1004845.
Wang X, Kruglyak L. Genetic Basis of Haloperidol Resistance in Saccharomyces cerevisiae Is Complex and Dose Dependent. PLOS Genetics. 2014;10:e1004894. doi: 10.1371/journal.pgen.1004894.
Warringer J, Zörgö E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ, Liti G, Moses A, Blomberg A. Trait Variation in Yeast Is Defined by Population History. PLOS Genetics. 2011;7:e1002111. doi: 10.1371/journal.pgen.1002111.
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. 2010;42:565–569. doi: 10.1038/ng.608.
Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nature Genetics. 2014;46:100–106. doi: 10.1038/ng.2876.
Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, Robinson MR, Perry JRB, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, Esko T, Milani L, Mägi R, Metspalu A, Hamsten A, Magnusson PKE, Pedersen NL, Ingelsson E, Soranzo N, Keller MC, Wray NR, Goddard ME, Visscher PM. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics. 2015;47:1114–1120. doi: 10.1038/ng.3390.
Zuk O, Schaffner SF, Samocha K, Do R, Hechter E, Kathiresan S, Daly MJ, Neale BM, Sunyaev SR, Lander ES. Searching for missing heritability: designing rare variant association studies. PNAS. 2014;111:E455–E464. doi: 10.1073/pnas.1322563111.

eLife. doi: 10.7554/eLife.49212.023

Decision letter

Editor: Christian R Landry¹

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Acceptance summary:

The authors show that rare and recent variants contribute proportionally more than common ones to trait variation in budding yeast. They combine a powerful quantitative genetics approach with extensive trait phenotyping and population genomics data to illustrate how natural selection may be keeping mutations with large effects from reaching high frequency. The expected negative correlation between effect size and frequency has been studied theoretically, but much less so at the experimental level because of the complex type of data needed. The study has broad significance, from population genetics, to quantitative genetics and evolution of complex traits.

Decision letter after peer review:

Thank you for submitting your article "Rare variants contribute disproportionately to quantitative trait variation in yeast" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Your manuscript addresses an important question in genetics and evolutionary biology: what are the relative contributions of genetic variants to phenotypic variation and do these contributions correlate with the frequency of these variants within a species? The reviewers agree that this is an important question that has rarely been comprehensively addressed. They therefore find the work of interest and the findings to be important for the field, in addition to appreciating the quality of the writing and of the presentation. However, they identify some aspects that would need to be reconsidered or better presented and interpreted (details below). In addition, your paper could be strengthened if it was extended to include a more detailed Introduction on why these questions are important for the non-specialists. You could also include more discussions on the implications of your findings, including the implications for biology and evolution in general and for yeast in particular. Since eLife is a generalist journal, the manuscript would appeal to a larger audience with these changes. Your manuscript is currently very short so this would be feasible by the addition of a few short sections in the current structure of the manuscript. You will see that one specific comment relates to the novelty of the work with respect to previous studies. This means that the novelty may not be obvious as presented. It would therefore also be important to emphasize this aspect in a revised version.

Essential revisions:

1) There are two analyses, within cross and joint analysis. I have to go back and forth between Results and Materials and methods to figure out exactly what is done. It would be nice to make clear when discussing the results from which one they are derived.

2) Because the segregants are haploid, there is only the A x A interaction. The majority of variance generated by A x A in fact goes into the additive variance, hence the non-additive variance is small. The authors did not make a big deal out of the fact that non-additive variance is only 1/6 of additive variance, but I feel it's important to stress that large additive variance is expected given the population design. In addition, when estimating the variance attributable to additive and epistatic effects, the authors broke the non-additive variance into three components, AQTL x A, AQTL x AQTL, and A x A. I wonder if this is necessary because there was no mention of the differences between these three components. A single component A(all) x A(all) could be fitted.

3) The two-component mixed model analysis has some caveats. There is correlation between rare and common variants, i.e., the variance components are not orthogonal. This makes any claim about the relative importance of rare versus common less reliable. For example, for the Cadmium chloride trait (Figure 3—figure supplement 1), the 7-component model seems to disagree with the 2-component model, with MAF < 0.01 explaining much less in the 7-component model than in the 2-component model in A. I think comparing the 2-component and one-component models could suffer from the same problem. Perhaps a more appropriate (but still not perfect) analysis is to fit a single-component model first and then fit two-component models, doing so in both sequential orders (rare, then rare + common; versus common, then common + rare), and look at how the cumulative variance increases. This will tell you which MAF class is more important or can explain more variance. I think in this case, the one-component model is more informative than the 2-component model.

4) In Figure 3, the relationship between effects and MAF or DAF is a major result. Although many other papers have reported similar results, I think this paper (and the co-submitted paper from Fournier et al.) has the most appropriate design, i.e., the discovery panel is independent of where MAF is estimated. Given its central role in this paper, it probably deserves a bit more clarity. A few questions came to mind when reading this part of the paper. In what analysis is the effect size estimated, single crosses or joint? Could you briefly explain how the effect sizes are estimated in the Results section? If effects are estimated from the joint analysis, the t-statistic uses a factor (n-2) to normalize the degrees of freedom, which is smaller for rare variants. This would lead to a Beavis effect. I believe the authors used a cross validation strategy to estimate effects, but it's not very clear from reading the Materials and methods. Can you also plot 2pqa^2 versus MAF? Even if a is large, the variance contributed by rare variants could be small.

5) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analysed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So, the extension that if the rare variant has a significant effect in a sub-population, then its effect size would be similar in a large heterogeneous population is false. Furthermore, the authors conclude that in their larger 16-strain segregant populations, a representative distribution of the 1000-strain collection, most of the variants have additive effects. This, the authors claim, is a revalidation of their previous studies (Bloom et al., 2013, 2015), where they identified that most of the causal variants between BYxRM had additive effects. However, their subsequent paper (Forsberg et al., 2017, PMID 28250458) and another paper (Yadav et al., 2016, PMID 28172852) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. Therefore, I find that just adding a few more strains and a larger number of segregants per cross does not make this manuscript a significant advance over the previous studies. One can argue that, taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, e.g. a typical example being the MKT1(89A) allele as causal, even though their effect size will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date, with these functional variants being private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

eLife. 2019 Oct 24;8:e49212. doi: 10.7554/eLife.49212.024

Author response

Essential revisions:

1) There are two analyses, within cross and joint analysis. I have to go back and forth between Results and Materials and methods to figure out exactly what is done. It would be nice to make clear when discussing the results from which one they are derived.

We apologize for any confusion resulting from our presentation of these analyses. We have clarified the text to emphasize that, except for the one paragraph comparing the two analyses (beginning with “We complemented the joint analysis with QTL mapping within each cross…”), the text focuses entirely on results from the joint analysis.

2) Because the segregants are haploid, there is only the A x A interaction. The majority of variance generated by A x A in fact goes into the additive variance, hence the non-additive variance is small. The authors did not make a big deal out of the fact that non-additive variance is only 1/6 of additive variance, but I feel it’s important to stress that large additive variance is expected given the population design. In addition, when estimating the variance attributable to additive and epistatic effects, the authors broke the non-additive variance into three components, AQTL x A, AQTL x AQTL, and A x A. I wonder if this is necessary because there was no mention of the differences between these three components. A single component A(all) x A(all) could be fitted.

Our estimates of additive variance per trait and cross are not exceptional when compared with those obtained from approaches that have used pedigree or marker-based measures of relatedness for numerous traits in plants, livestock, other model organisms, and humans (e.g. Visscher et al., 2008, PMID 18319743; Yang et al., 2010, among many others). We note that our population of line cross progeny is actually expected to give a higher estimate of epistatic variance when compared to outbred populations: as shown, for example, in Figure 2 of Mackay et al., 2014 (PMID 24296533), estimates of epistatic variance are maximized as allele frequencies of the interacting loci approach 0.5 (as in our line crosses here). As the reviewer notes, another potential non-additive component, dominance variance, is not accessible in our experimental design, which uses haploids, but study designs that can estimate dominance variance have not detected a large contribution (e.g. Parts et al., 2016, PMID 27804950).

We are grateful to the reviewer for pointing out an omission in the Materials and methods section of our manuscript regarding an explanation of why we modeled the epistatic variance with three components. First, as the reviewer suggests, we have added results from a model with a single A(all) x A(all) component to Supplementary file 2. In Author response image 1 we show a visual comparison of the fraction of non-additive variance explained by the three component model (x-axis) and the one component model (y-axis) for each trait and cross (the diagonal line corresponds to identity between the two estimates). The estimates are very similar for most traits and crosses, but one can observe that the three component model occasionally gives a higher estimate. This happens because a key assumption of the one component model – that all pairs of loci contribute to trait variation with effect sizes drawn from a single normal distribution – is violated when one or a few QTL-QTL interactions with large effects are present, resulting in a downward bias. We previously showed (Bloom et al., 2015) that loci involved in such stronger interactions can be detected in additive scans. Therefore, by explicitly including additive QTLs in the three component model, we avoid making the assumption that the effect sizes of all locus pairs are drawn from the same normal distribution and obtain a better estimator of total two-way epistatic variance when large-effect QTL-QTL interactions are present. We have included this rationale in the revised manuscript, in the Materials and methods subsection “Within-cross analysis to estimate additive and pairwise genetic interaction variance”.
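
To make the distinction between the one component and three component parameterizations concrete, the following Python sketch builds the corresponding relationship matrices as element-wise (Hadamard) products of additive relationship matrices. It is only an illustration under stated assumptions: the function names, the 0/1 genotype coding, the marker standardization, and the requirement that every marker segregates are choices made for this sketch, not the code used in the study, and the variance components themselves would still have to be estimated with a mixed-model fitting routine that is not shown.

```python
import numpy as np

def additive_relationship(genotypes):
    """Additive relationship matrix for one cross.

    genotypes: (n_segregants, n_markers) array coded 0/1 (assumed coding).
    Each marker is standardized, then A = Z Z^T / n_markers.
    Assumes every marker segregates (non-zero variance).
    """
    g = np.asarray(genotypes, dtype=float)
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    return z @ z.T / z.shape[1]

def epistatic_relationships(genotypes, qtl_idx):
    """Pairwise-interaction relationship matrices as Hadamard products.

    qtl_idx: column indices of markers at detected additive QTLs
             (hypothetical input for this sketch).
    """
    g = np.asarray(genotypes, dtype=float)
    a_all = additive_relationship(g)              # all markers
    a_qtl = additive_relationship(g[:, qtl_idx])  # QTL markers only
    return {
        "A x A": a_all * a_all,          # the only term in a one component model
        "AQTL x A": a_qtl * a_all,       # QTL-by-genome interactions
        "AQTL x AQTL": a_qtl * a_qtl,    # QTL-by-QTL interactions
    }
```

In this sketch a one component model would use only the "A x A" matrix, whereas a three component model would fit a separate variance for each of the three matrices, so that a few large QTL-QTL interactions are not forced to share one effect-size distribution with all other locus pairs.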

3) The two-component mixed model analysis has some caveats. There is correlation between rare and common variants, i.e., the variance components are not orthogonal. This makes any claim about the relative importance of rare versus common less reliable. For example, for the Cadmium chloride trait (Figure 3—figure supplement 1), the 7-component model seems to disagree with the 2-component model, with MAF < 0.01 explaining much less in the 7-component model than in the 2-component model in A. I think comparing the 2-component and one-component models could suffer from the same problem. Perhaps a more appropriate (but still not perfect) analysis is to fit a single-component model first and then fit two-component models, doing so in both sequential orders (rare, then rare + common; versus common, then common + rare), and look at how the cumulative variance increases. This will tell you which MAF class is more important or can explain more variance. I think in this case, the one-component model is more informative than the 2-component model.

We agree with the reviewer that genetic linkage creates a correlation between rare and common variants in genetic mapping studies. The variance component analysis performed here is based on approaches that are the standard in the field, and that have been extensively used in studies of human datasets that seek to address similar fundamental questions about the contribution of variants at different allele frequencies in a population (e.g. Yang et al., 2015; Gazal et al., 2018, PMID 30297966; Wainschtein et al., 2019). How the robustness of estimators obtained from these procedures is affected by the presence of genetic linkage, assumptions about the distributions of causal variant effect sizes, and the relationship between effect size and allele frequency is an active area of research (e.g. Speed et al., 2017, PMID 28530675). The reviewer is proposing a new forward stepwise variance component analysis, which to our knowledge has not been reported before in the literature and which poses its own issues of implementation and interpretation that are beyond the scope of our paper. We agree that this is an interesting idea, and we hope that by making our dataset available, we can stimulate the development of this and other new methods.

With regard to the comparison between the estimates of the contribution of rare alleles from the two-component allele frequency model (light blue bar in Figure 3—figure supplement 1A) and the 7-component model (Figure 3—figure supplement 1B), one can see that for nearly all traits, the estimate of variance explained is very similar, with the exception of cadmium chloride pointed out by the reviewer. We note that cadmium chloride is exceptional among the traits, with nearly all the additive heritability explained by a single locus near the gene PCA1, and that the patterns of segregation in different crosses are consistent with allelic heterogeneity at this locus. Contributions of QTLs with large effects are often poorly modeled with whole-genome variance component approaches, and we believe that this accounts for the discrepancy noted by the reviewer.

We further note that the known limitations of variance component analyses were a primary motivation for our study, and that in subsequent sections we also analyzed our dataset using fixed effects models based on detected QTLs. Our study design is highly-powered to detect QTL effects that jointly account for most of the heritable variance, enabling these analyses for the first time. As we show in Figure 3B, Figure 3C, Figure 3—figure supplement 2, and Figure 3—figure supplement 5, the fixed effects models lead to conclusions similar to those obtained from the variance component models.

4) In Figure 3, the relationship between effects and MAF or DAF is a major result. Although many other papers have reported similar results, I think this paper (and the co-submitted paper from Fournier et al.) has the most appropriate design, i.e., the discovery panel is independent of where MAF is estimated. Given its central role in this paper, it probably deserves a bit more clarity. A few questions came to mind when reading this part of the paper. In what analysis is the effect size estimated, single crosses or joint? Could you briefly explain how the effect sizes are estimated in the Results section? If effects are estimated from the joint analysis, the t-statistic uses a factor (n-2) to normalize the degrees of freedom, which is smaller for rare variants. This would lead to a Beavis effect. I believe the authors used a cross validation strategy to estimate effects, but it’s not very clear from reading the Materials and methods. Can you also plot 2pqa^2 versus MAF? Even if a is large, the variance contributed by rare variants could be small.

We appreciate the reviewer’s positive comments regarding our study design, which decouples allele frequencies in the population from allele frequencies in the mapping panel, thereby allowing us to obtain estimates of effect sizes of rare variants without the typical complications one encounters in GWAS designs regarding sample size and confounding. We welcome the opportunity to clarify the details here, in the revised main text, and in the Materials and methods. Briefly, QTL peak markers are detected in the joint analysis for each trait. Then, for each trait and cross, the phenotypes are scaled to have mean 0 and variance 1, and effect sizes within each cross are estimated using multiple regression for the peak markers that segregate within that cross. The betas in this analysis correspond to the differences in the means between the two QTL alleles. For peak markers that segregate in multiple crosses, the average betas over the different crosses are shown in Figure 3. This is now described in greater detail in the Materials and methods subsection “Effect size estimation for joint QTL mapping”.
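The estimation steps described above (scale phenotypes within each cross, fit a multiple regression on the peak markers segregating in that cross, then average betas across crosses) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' R code: the function names `standardize`, `ols_betas`, and `average_betas`, the 0/1 coding of the two parental alleles, and the toy data are all assumptions made for the example.

```python
from statistics import mean, stdev

def standardize(values):
    """Scale phenotypes within one cross to mean 0 and (sample) variance 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ols_betas(genotypes, phenotypes):
    """Multiple regression of phenotype on all peak markers plus an intercept.

    genotypes: one row per segregant, each row a list of 0/1 allele codes.
    With 0/1 coding, each beta is the difference in mean phenotype between
    the two alleles, adjusted for the other markers in the model.
    Assumes the markers are not perfectly collinear.
    """
    X = [[1.0] + [float(g) for g in row] for row in genotypes]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(X, phenotypes)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - rest) / A[i][i]
    return coef[1:]  # drop the intercept

def average_betas(per_cross_betas):
    """Average each marker's beta over the crosses in which it segregates."""
    pooled = {}
    for betas in per_cross_betas:
        for marker, beta in betas.items():
            pooled.setdefault(marker, []).append(beta)
    return {marker: mean(vals) for marker, vals in pooled.items()}

# Toy example with made-up data: two crosses that share peak marker "m1".
cross_a_geno = [[0, 0], [0, 1], [1, 0], [1, 1]] * 3  # columns: m1, m2
cross_a_pheno = [5.0 + 2.0 * g1 + 1.0 * g2 for g1, g2 in cross_a_geno]
cross_b_geno = [[0, 0], [0, 1], [1, 0], [1, 1]] * 3  # columns: m1, m3
cross_b_pheno = [3.0 + 1.0 * g1 - 1.0 * g3 for g1, g3 in cross_b_geno]

betas_a = dict(zip(["m1", "m2"], ols_betas(cross_a_geno, standardize(cross_a_pheno))))
betas_b = dict(zip(["m1", "m3"], ols_betas(cross_b_geno, standardize(cross_b_pheno))))
print(average_betas([betas_a, betas_b]))
```

Because phenotypes are standardized within each cross before regression, the betas are expressed in within-cross standard deviation units, which is what makes averaging them across crosses meaningful in this sketch.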

The reviewer correctly points out that because we perform model selection and parameter estimation on the same data set, parameter estimates may be upwardly biased (this is known as the Beavis effect). We note that we carried out simulation analyses (Figure 3—figure supplement 2) which indicated that while some estimate inflation is present, it does not qualitatively alter the results in Figure 3. To further address the reviewer’s concern, we have now calculated unbiased estimates of effect sizes by training a model on 9/10 of the data and estimating parameters on the 1/10 of the data held out from the training procedure. The results are shown in a new supplementary figure (Figure 3—figure supplement 4) and are very similar to Figure 3, but estimates are noisier due to the smaller sample size available for unbiased estimation in this procedure. This is now described in the aforementioned Materials and methods subsection.
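The held-out idea described above (select QTLs on 9/10 of the data, estimate their effects on the remaining 1/10) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: "detection" is reduced to a hypothetical threshold on the difference in allele means, the sketch cycles through all ten folds and averages the held-out estimates, and the names `allele_effect`, `make_folds`, and `held_out_effects` as well as the simulated data are invented for the example.

```python
import random
from statistics import mean

def allele_effect(genos, phenos):
    """Difference in mean phenotype between carriers of allele 1 and allele 0."""
    ones = [y for g, y in zip(genos, phenos) if g == 1]
    zeros = [y for g, y in zip(genos, phenos) if g == 0]
    return mean(ones) - mean(zeros)

def make_folds(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def held_out_effects(markers, phenos, threshold, k=10, seed=0):
    """Select markers on k-1 folds, then estimate their effects on the held-out fold.

    markers: dict of marker name -> list of 0/1 genotypes, one per segregant.
    Returns marker -> mean held-out effect over the folds where it was selected.
    """
    n = len(phenos)
    estimates = {}
    for fold in make_folds(n, k, seed):
        held = set(fold)
        train = [i for i in range(n) if i not in held]
        for name, g in markers.items():
            train_g = [g[i] for i in train]
            test_g = [g[i] for i in fold]
            # Both alleles must be present in both subsets to compare means.
            if len(set(train_g)) < 2 or len(set(test_g)) < 2:
                continue
            train_eff = allele_effect(train_g, [phenos[i] for i in train])
            if abs(train_eff) < threshold:
                continue  # not "detected" in the training data
            test_eff = allele_effect(test_g, [phenos[i] for i in fold])
            estimates.setdefault(name, []).append(test_eff)
    return {name: mean(vals) for name, vals in estimates.items()}

# Simulated example: 30 null markers and one causal marker (true effect 0.3).
rng = random.Random(1)
n = 500
markers = {f"null{j}": [rng.randint(0, 1) for _ in range(n)] for j in range(30)}
markers["causal"] = [rng.randint(0, 1) for _ in range(n)]
phenos = [0.3 * g + rng.gauss(0.0, 1.0) for g in markers["causal"]]
threshold = 0.2

# In-sample: selection and estimation on the same data (prone to inflation).
all_effects = {m: allele_effect(g, phenos) for m, g in markers.items()}
in_sample = {m: e for m, e in all_effects.items() if abs(e) >= threshold}
print("in-sample:", in_sample)
print("held-out:", held_out_effects(markers, phenos, threshold))
```

Each held-out estimate uses only a tenth of the segregants, so it is free of selection bias but noisier, which matches the trade-off the response describes.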

We believe that it is important to proceed carefully when reporting and interpreting the relationships between allelic effect sizes, variance explained, and allele frequencies for individual QTL effects. We have modified our Introduction to give additional background as to the relevant issues. Figure 3—figure supplement 5 shows the cumulative fraction of variance explained in our mapping panel for each trait by the detected QTLs. In this calculation of variance explained, we used the allele frequency of the QTL peak marker in our mapping population. Were we to instead calculate variance explained in the larger panel of yeast isolates (Peter et al., 2018) using the allele frequencies of variants in that panel, but effect sizes estimated in our mapping population, the variance contributed by rare variants would necessarily be small because the study design severely undersamples variants that are rare and ultra-rare in this larger population (our mapping panel consists of individuals derived from only 16 of 1012 strains). We are concerned that presenting such results would be actively misleading and confusing. We believe that the results from the variance components analysis and those shown in Figure 3—figure supplement 5 would be recapitulated in the larger yeast population if we could detect and estimate the effects of all the variants present in that population, rather than the small fraction that segregates in our crosses.
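The dependence of variance explained on the allele frequency used in the calculation can be made concrete with a small sketch. It relies on the standard identity that a biallelic variant with allele frequency p and per-allele effect a contributes p(1-p)a^2 to the variance in haploids and 2p(1-p)a^2 in diploids under Hardy-Weinberg proportions. The function name `variance_explained` and the numbers below are illustrative assumptions and are not taken from the paper.

```python
def variance_explained(freq, beta, ploidy=1):
    """Additive variance contributed by one biallelic variant.

    freq: frequency p of one allele in the population of interest.
    beta: per-allele effect (difference in means between alleles for haploids).
    ploidy=1 gives p(1-p)*beta**2; ploidy=2 gives 2p(1-p)*beta**2.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    return ploidy * freq * (1.0 - freq) * beta ** 2

# Same hypothetical effect size, evaluated at two different allele frequencies.
beta = 0.3  # in phenotypic standard deviation units (made-up value)
in_cross = variance_explained(0.5, beta)        # allele segregating 1:1 in a cross
in_population = variance_explained(0.01, beta)  # same allele at 1% in a population
print(in_cross, in_population)
```

The same effect size therefore maps to very different per-variant variance depending on whether the mapping-panel or the population frequency is plugged in, which is why the choice of frequency has to be stated explicitly when such results are reported.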

5) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analysed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So, the extension that if a rare variant has a significant effect in a sub-population, its effect size would be similar in a large heterogeneous population is false. Furthermore, the authors conclude that in their larger 16-strain segregant populations, a representative distribution of the 1000-strain collection, most of the variants have additive effects. The authors claim that this revalidates their previous studies (Bloom et al., 2013, 2015), in which they identified that most of the causal variants in BYxRM had additive effects. However, their subsequent paper (Forsberg et al., 2017, PMID 28250458) and another paper (Yadav et al., 2016, PMID 28172852) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. Therefore, I find that just using a few more strains and a larger number of segregants per cross does not make this manuscript a significant advance over the previous studies. One can argue that by taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, a typical example being the MKT1(89A) allele as causal, even though their effect sizes will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date and that these functional variants are private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

The summary and the previous reviewer comment underscore the novel contributions of our paper, including the ability to address the empirical contribution of rare variants to trait variation in a more comprehensive manner than has previously been possible, and the decoupling of allele frequency from variant discovery. We have taken the opportunity, as requested in the summary, to significantly expand the Introduction to make these and other key points clearer to the non-specialist. We agree that theoretical approaches in evolutionary, population and quantitative trait genetics have been applied to predict the relative contributions of common and rare variants to trait variation under different sets of assumptions; indeed, we cite many relevant papers in our manuscript. We also agree that there is value in aggregating information from individual case reports in the literature – we cited Fay, 2013 in our manuscript, and we noted that the large set of candidate QTGs systematically identified in our study is enriched for QTGs reported in that study. The variants reported in Fay, 2013 represent a sparse sampling of variant effects in yeast, and were by necessity based on studies with small sample sizes, which biased this set of variants in favor of large effects. The review by Peltier et al., 2019 (published after the submission of our manuscript) is similarly based on genes and variants previously reported in the literature. We have added a citation to this paper. We believe that our systematic, comprehensive, empirical approach provides much more general insights into the relative contributions of variants of different frequencies.

Associated Data

Data Citations

Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. 2019. Rare variants contribute disproportionately to quantitative trait variation in yeast, Jun 19 '19. SRA Bioproject. PRJNA549760 [DOI] [PMC free article] [PubMed]

Supplementary Materials

Figure 1—source data 1. Additional information on yeast crosses and phenotypes.

Strain information for the 16 haploid parents and 16 F1 hybrids between them is listed. Additional information about the conditions tested is indicated.

elife-49212-fig1-data1.xls (31.5KB, xls)

DOI: 10.7554/eLife.49212.003

Figure 2—source data 1. Total variance explained by QTLs and within-cross variance component analyses.

Results from within-cross variance components models and total variance explained by the QTL models are listed.

elife-49212-fig2-data1.xls (798.5KB, xls)

DOI: 10.7554/eLife.49212.005

Figure 3—source data 1. Detected QTL.

QTL mapping results are listed for both the within-cross and the joint analysis.

elife-49212-fig3-data1.xls (5.7MB, xls)

DOI: 10.7554/eLife.49212.012

Figure 3—source data 2. Joint variance component estimates.

Results for the joint variance component models are given. These include results for a model with two allele frequency bins (Figure 3A, Figure 3—figure supplement 1A), a model with seven allele frequency bins (Figure 3—figure supplement 1B), and a model with seven allele frequency bins using only variants that are private to each of the 16 parents (Figure 3—figure supplement 1C).

elife-49212-fig3-data2.xls (98KB, xls)

DOI: 10.7554/eLife.49212.013

Figure 4—source data 1. Candidate causal genes and GO enrichments.

Candidate causal genes per QTL are listed. GO enrichments for causal genes are listed.

elife-49212-fig4-data1.xls (1.5MB, xls)

DOI: 10.7554/eLife.49212.015

Supplementary file 1. Key resources table.

elife-49212-supp1.xlsx (19.6KB, xlsx)

DOI: 10.7554/eLife.49212.016

Supplementary file 2. Barplots depicting results from within-cross variance component analyses.

elife-49212-supp2.pdf (125.4KB, pdf)

DOI: 10.7554/eLife.49212.017

Supplementary file 3. QTL mapping results for each trait and cross.

elife-49212-supp3.pdf (464KB, pdf)

DOI: 10.7554/eLife.49212.018

Transparent reporting form

elife-49212-transrepform.docx (246.5KB, docx)

DOI: 10.7554/eLife.49212.019

Data Availability Statement

Unless otherwise specified, all computational analyses were performed in R (v3.4.4). Analysis code and processing scripts are available at https://github.com/joshsbloom/yeast-16-parents (copy archived at https://github.com/elifesciences-publications/yeast-16-parents). Additional links to generated data are also provided in the github repository. The version numbers of R packages used are listed in this repository. Sequencing data has been deposited in the SRA under the accession code PRJNA549760.

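The following dataset was generated: Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. 2019. SRA BioProject PRJNA549760 (full entry under Data Citations above).

References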

[bib1] Albert FW, Bloom JS, Siegel J, Day L, Kruglyak L. Genetics of trans-regulatory variation in gene expression. eLife. 2018;7:e35471. doi: 10.7554/eLife.35471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Alexa A, Rahnenfuhrer J. topGO: enrichment analysis for gene ontology 2018

[bib3] Arends D, Prins P, Jansen RC, Broman KW. R/qtl: high-throughput multiple QTL mapping. Bioinformatics. 2010;26:2990–2992. doi: 10.1093/bioinformatics/btq565. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Aronesty E. Comparison of sequencing utility programs. The Open Bioinformatics Journal. 2013;7:1–8. doi: 10.2174/1875036201307010001. [DOI] [Google Scholar]

[bib5] Bloom JS, Ehrenreich IM, Loo WT, Lite TL, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494:234–237. doi: 10.1038/nature11867. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Bloom JS, Kotenko I, Sadhu MJ, Treusch S, Albert FW, Kruglyak L. Genetic interactions contribute less than additive effects to quantitative trait variation in yeast. Nature Communications. 2015;6:ncomms9712. doi: 10.1038/ncomms9712. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Bloom JS. yeast-16-parents. c913c9a. GitHub. 2019. https://github.com/joshsbloom/yeast-16-parents

[bib8] Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Boocock J. Github; 2019. https://github.com/theboocock/long_read_cnv [Google Scholar]

[bib10] Broad Institute Picard Tools. 2019 http://broadinstitute.github.io/picard/

[bib11] Chantranupong L, Wolfson RL, Sabatini DM. Nutrient-Sensing mechanisms across evolution. Cell. 2015;161:67–83. doi: 10.1016/j.cell.2015.02.041. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Clifford D, McCullagh P. The regress package. 2014 https://cran.r-project.org/web/packages/regress/regress.pdf

[bib14] Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] de los Campos G, Sorensen D, Gianola D. Genomic Heritability: What Is It? PLOS Genetics. 2015;11:e1005048. doi: 10.1371/journal.pgen.1005048. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Ebler J, Schönhuth A, Marschall T. Genotyping inversions and tandem duplications. Bioinformatics. 2017;33:4015–4023. doi: 10.1093/bioinformatics/btx020. [DOI] [PubMed] [Google Scholar]

[bib17] Ehrenreich IM, Bloom J, Torabi N, Wang X, Jia Y, Kruglyak L. Genetic Architecture of Highly Complex Chemical Resistance Traits across Four Yeast Strains. PLOS Genetics. 2012;8:e1002570. doi: 10.1371/journal.pgen.1002570. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Engel SR, Dietrich FS, Fisk DG, Binkley G, Balakrishnan R, Costanzo MC, Dwight SS, Hitz BC, Karra K, Nash RS, Weng S, Wong ED, Lloyd P, Skrzypek MS, Miyasato SR, Simison M, Cherry JM. The reference genome sequence of Saccharomyces cerevisiae : then and now. G3: Genes, Genomes, Genetics. 2014;4:389–398. doi: 10.1534/g3.113.008995. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Exome Aggregation Consortium. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Eyre-Walker A. Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. PNAS. 2010;107:1752–1756. doi: 10.1073/pnas.0906182107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJH, Shishkin AA, Hatan M, Carrasco-Alfonso MJ, Mayer D, Luckey CJ, Patsopoulos NA, De Jager PL, Kuchroo VK, Epstein CB, Daly MJ, Hafler DA, Bernstein BE. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] Fay JC. The molecular basis of phenotypic variation in yeast. Current Opinion in Genetics & Development. 2013;23:672–677. doi: 10.1016/j.gde.2013.10.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Forni S, Aguilar I, Misztal I. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetics Selection Evolution. 2011;43:1. doi: 10.1186/1297-9686-43-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Fournier T, Abou Saada O, Hou J, Peter J, Caudal E, Schacherer J. Extensive impact of low-frequency variants on the phenotypic landscape at population-scale. eLife. 2019;8:e49258. doi: 10.7554/eLife.49258. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Ganna A, Satterstrom FK, Zekavat SM, Das I, Kurki MI, Churchhouse C, Alfoldi J, Martin AR, Havulinna AS, Byrnes A, Thompson WK, Nielsen PR, Karczewski KJ, Saarentaus E, Rivas MA, Gupta N, Pietiläinen O, Emdin CA, Lescai F, Bybjerg-Grauholm J, Flannick J, Mercader JM, Udler M, Laakso M, Salomaa V, Hultman C, Ripatti S, Hämäläinen E, Moilanen JS, Körkkö J, Kuismin O, Nordentoft M, Hougaard DM, Mors O, Werge T, Mortensen PB, MacArthur D, Daly MJ, Sullivan PF, Locke AE, Palotie A, Børglum AD, Kathiresan S, Neale BM, GoT2D/T2D-GENES Consortium. SIGMA Consortium Helmsley IBD Exome Sequencing Project. FinMetSeq Consortium. iPSYCH-Broad Consortium Quantifying the impact of rare and Ultra-rare coding variation across the phenotypic spectrum. The American Journal of Human Genetics. 2018;102:1204–1211. doi: 10.1016/j.ajhg.2018.05.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Gibson G. Rare and common variants: twenty arguments. Nature Reviews Genetics. 2012;13:135–145. doi: 10.1038/nrg3118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Goldstein DB, Allen A, Keebler J, Margulies EH, Petrou S, Petrovski S, Sunyaev S. Sequencing studies in human genetics: design and interpretation. Nature Reviews Genetics. 2013;14:460–470. doi: 10.1038/nrg3455. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] G’Sell MG, Wager S, Chouldechova A, Tibshirani R. Sequential selection procedures and false discovery rate control. arXiv. 2013 https://arxiv.org/abs/1309.5352

[bib29] Jerison ER, Kryazhimskiy S, Mitchell JK, Bloom JS, Kruglyak L, Desai MM. Genetic variation in adaptability and pleiotropy in budding yeast. eLife. 2017;6:e27167. doi: 10.7554/eLife.27167. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Käll L, Storey JD, MacCoss MJ, Noble WS. Posterior error probabilities and false discovery rates: two sides of the same coin. Journal of Proteome Research. 2008;7:40–44. doi: 10.1021/pr700739d. [DOI] [PubMed] [Google Scholar]

[bib31] Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nature Genetics. 2010;42:348–354. doi: 10.1038/ng.548. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics. 2007;80:727–739. doi: 10.1086/513473. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biology. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The sequence alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013 https://arxiv.org/abs/1303.3997

[bib36] Mancuso N, Rohland N, Rand KA, Tandon A, Allen A, Quinque D, Mallick S, Li H, Stram A, Sheng X, Kote-Jarai Z, Easton DF, Eeles RA, Le Marchand L, Lubwama A, Stram D, Watya S, Conti DV, Henderson B, Haiman CA, Pasaniuc B, Reich D, PRACTICAL consortium The contribution of rare variation to prostate Cancer heritability. Nature Genetics. 2016;48:30–35. doi: 10.1038/ng.3446. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: a fast and versatile genome alignment system. PLOS Computational Biology. 2018;14:e1005944. doi: 10.1371/journal.pcbi.1005944. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] McArdle BH, Anderson MJ. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology. 2001;82:290–297. doi: 10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2. [DOI] [Google Scholar]

[bib40] McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, Flint-Garcia S, Thornsberry J, Acharya C, Bottoms C, Brown P, Browne C, Eller M, Guill K, Harjes C, Kroon D, Lepak N, Mitchell SE, Peterson B, Pressoir G, Romero S, Rosas MO, Salvo S, Yates H, Hanson M, Jones E, Smith S, Glaubitz JC, Goodman M, Ware D, Holland JB, Buckler ES. Genetic Properties of the Maize Nested Association Mapping Population. Science. 2009;325:737–740. doi: 10.1126/science.1174320. [DOI] [PubMed] [Google Scholar]

[bib41] Park JH, Gail MH, Weinberg CR, Carroll RJ, Chung CC, Wang Z, Chanock SJ, Fraumeni JF, Chatterjee N. Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants. PNAS. 2011;108:18026–18031. doi: 10.1073/pnas.1114759108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18:117–127. doi: 10.1038/nrg.2016.142. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Pau G, Fuchs F, Sklyar O, Boutros M, Huber W. EBImage--an R package for image processing with applications to cellular phenotypes. Bioinformatics. 2010;26:979–981. doi: 10.1093/bioinformatics/btq046. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] Peltier E, Friedrich A, Schacherer J, Marullo P. Quantitative trait nucleotides impacting the technological performances of industrial Saccharomyces cerevisiae strains. Frontiers in Genetics. 2019;10:e00683. doi: 10.3389/fgene.2019.00683. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] Peter J, De Chiara M, Friedrich A, Yue J-X, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, Cruaud C, Labadie K, Aury J-M, Istace B, Lebrigand K, Barbry P, Engelen S, Lemainque A, Wincker P, Liti G, Schacherer J. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature. 2018;556:339–344. doi: 10.1038/s41586-018-0030-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Pritchard JK. Are Rare Variants Responsible for Susceptibility to Complex Diseases? The American Journal of Human Genetics. 2001;69:124–137. doi: 10.1086/321272. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] Robinson MR, Wray NR, Visscher PM. Explaining additional genetic variation in complex traits. Trends in Genetics. 2014;30:124–132. doi: 10.1016/j.tig.2014.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib48] Sadhu MJ, Bloom JS, Day L, Kruglyak L. CRISPR-directed mitotic recombination enables genetic mapping without crosses. Science. 2016;352:1113–1116. doi: 10.1126/science.aaf5124. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Shendure J, Fields S. Massively Parallel Genetics. Genetics. 2016;203:617–619. doi: 10.1534/genetics.115.180562. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Simons YB, Bullaughey K, Hudson RR, Sella G. A population genetic interpretation of GWAS findings for human quantitative traits. PLOS Biology. 2018;16:e2002985. doi: 10.1371/journal.pbio.2002985. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib51] Stich B. Comparison of Mating Designs for Establishing Nested Association Mapping Populations in Maize and Arabidopsis thaliana. Genetics. 2009;183:1525–1534. doi: 10.1534/genetics.109.108449. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib52] Storey JD. The positive false discovery rate: a Bayesian interpretation and the q -value. The Annals of Statistics. 2003;31:2013–2035. doi: 10.1214/aos/1074290335. [DOI] [Google Scholar]

[bib53] Storey JD, Tibshirani R. Statistical significance for genomewide studies. PNAS. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] Tisi R, Belotti F, Martegani E. Yeast as a Model for Ras Signalling. In: Trabalzini L, editor. Ras Signaling: Methods and Protocols, Methods in Molecular Biology. Totowa, NJ: Humana Press; 2014. pp. 359–390. [DOI] [PubMed] [Google Scholar]

[bib55] Treusch S, Albert FW, Bloom JS, Kotenko IE, Kruglyak L. Genetic Mapping of MAPK-Mediated Complex Traits Across S. cerevisiae. PLOS Genetics. 2015;11:e1004913. doi: 10.1371/journal.pgen.1004913. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013;43:11.10.1-33. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] Wainschtein P, Jain DP, Yengo L, Zheng Z, Cupples LA, Shadyab AH, McKnight B, Shoemaker BM, Mitchell BD, Psaty BM, Kooperberg C, Roden D, Darbar D, Arnett DK, Regan EA, Boerwinkle E, Rotter JI, Allison MA, McDonald M-LN, Chung MK, Smith NL, Ellinor PT, Vasan RS, Mathias RA, Rich SS, Heckbert SR, Redline S, Guo X, Chen Y-DI, Liu C-T, de AM, Yanek LR, Albert CM, Hernandez RD, McGarvey ST, North KE, Lange LA, Weir BS, Laurie CC, Yang J, Visscher PM. Recovery of trait heritability from whole genome sequence data. bioRxiv. 2019 doi: 10.1101/588020. [DOI]

[bib59] Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLOS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib60] Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES. Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize. PLOS Genetics. 2014;10:e1004845. doi: 10.1371/journal.pgen.1004845. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Wang X, Kruglyak L. Genetic Basis of Haloperidol Resistance in Saccharomyces cerevisiae Is Complex and Dose Dependent. PLOS Genetics. 2014;10:e1004894. doi: 10.1371/journal.pgen.1004894. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Warringer J, Zörgö E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ, Liti G, Moses A, Blomberg A. Trait Variation in Yeast Is Defined by Population History. PLOS Genetics. 2011;7:e1002111. doi: 10.1371/journal.pgen.1002111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nature Genetics. 2014;46:100–106. doi: 10.1038/ng.2876. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, Robinson MR, Perry JRB, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, Esko T, Milani L, Mägi R, Metspalu A, Hamsten A, Magnusson PKE, Pedersen NL, Ingelsson E, Soranzo N, Keller MC, Wray NR, Goddard ME, Visscher PM. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics. 2015;47:1114–1120. doi: 10.1038/ng.3390. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Zuk O, Schaffner SF, Samocha K, Do R, Hechter E, Kathiresan S, Daly MJ, Neale BM, Sunyaev SR, Lander ES. Searching for missing heritability: designing rare variant association studies. PNAS. 2014;111:E455–E464. doi: 10.1073/pnas.1322563111. [DOI] [PMC free article] [PubMed] [Google Scholar]
