eLife. 2018 Dec 6;7:e41038. doi: 10.7554/eLife.41038

Drought adaptation in Arabidopsis thaliana by extensive genetic loss-of-function

J Grey Monroe 1,2, Tyler Powell 1,3, Nicholas Price 1, Jack L Mullen 1, Anne Howard 1, Kyle Evans 1, John T Lovell 4, John K McKay 1,2
Editors: Daniel J Kliebenstein 5, Christian S Hardtke 6
PMCID: PMC6326724  PMID: 30520727

Abstract

Interdisciplinary syntheses are needed to scale up discovery of the environmental drivers and molecular basis of adaptation in nature. Here we integrated novel approaches using whole genome sequences, satellite remote sensing, and transgenic experiments to study natural loss-of-function alleles associated with drought histories in wild Arabidopsis thaliana. The genes we identified exhibit population genetic signatures of parallel molecular evolution, selection for loss-of-function, and shared associations with flowering time phenotypes in directions consistent with longstanding adaptive hypotheses seven times more often than expected by chance. We then confirmed predicted phenotypes experimentally in transgenic knockout lines. These findings reveal the importance of drought timing to explain the evolution of alternative drought tolerance strategies and further challenge popular assumptions about the adaptive value of genetic loss-of-function in nature. These results also motivate improved species-wide sequencing efforts to better identify loss-of-function variants and inspire new opportunities for engineering climate resilience in crops.

Research organism: A. thaliana

eLife digest

Water shortages caused by droughts lead to crop losses that affect billions of people around the world each year. By discovering how wild plants adapt to drought, it may be possible to identify traits and genes that help to improve the growth of crop plants when water is scarce. It has been suggested that plants have adapted to droughts by flowering at times of the year when droughts are less likely to occur. For example, if droughts are more likely to happen in spring, the plants may delay flowering until the summer.

Arabidopsis thaliana is a small plant that is found across Eurasia, Africa and North America, including in areas that are prone to drought at different times of the year. Individual plants of the same species may carry different versions of the same gene (known as alleles). Some of these alleles may not work properly and are referred to as loss-of-function alleles. Monroe et al. investigated whether A. thaliana plants carry any loss-of-function alleles that are associated with droughts happening in the spring or summer, and whether they are linked to when those plants will flower.

Monroe et al. analyzed satellite images collected over the last 30 years to measure when droughts have occurred. Next, they searched genome sequences of Arabidopsis thaliana for alleles that might help the plants to adapt to droughts in the spring or summer. Combining the two approaches revealed that loss-of-function alleles associated with spring droughts were strongly predicted to be associated with the plants flowering later in the year. Similarly, loss-of-function alleles associated with summer droughts were predicted to be associated with the plants flowering earlier in the year.

These findings support the idea that plants can adapt to drought by changing when they produce flowers, and suggest that loss-of-function alleles play a major role in this process. New techniques for editing genes mean it is easier than ever to generate new loss-of-function alleles in specific genes. Therefore, the results presented by Monroe et al. may help researchers to develop new varieties of crop plants that are better adapted to droughts.

Introduction

Discovering the environmental drivers and functional genetics of adaptation in nature is a key goal of evolutionary biology and valuable to advance applied genetics in agriculture. Understanding the genetics of drought adaptation in plants is particularly important as crop losses resulting from droughts affect billions of people each year, posing the greatest threat to global food stability. Because droughts also impose strong selection on natural plant populations, investigating drought adaptation in wild species is both useful for addressing fundamental questions of evolutionary biology, such as determining whether adaptation proceeds by few or many alleles, and informative for efforts to reverse engineer drought tolerance in crops (Mickelbart et al., 2015). Such an evolutionary research program is motivated by the need to understand adaptive drought tolerance strategies for different types of drought conditions, which can vary in severity and timing (Tardieu, 2012). Furthermore, previous limitations of single gene approaches have reinforced the necessity of developing methods to identify beneficial alleles at genomic scales and functional molecular resolutions (Dean and Thornton, 2007; Passioura, 2010).

Drought stress can occur throughout the year and drought timing is forecast to change over the next century (Trenberth et al., 2014). While dramatic evolutionary responses to drought events have been documented (e.g. Franks et al., 2007), little is known about the relationship between drought timing and adaptation. However, the observation both in nature and agriculture that plants are particularly susceptible to drought while flowering (Nam et al., 2001; Dietrich and Smith, 2016) has contributed to the longstanding hypothesis that adaptive flowering time should reflect patterns in the seasonal timing of drought events (Passioura, 1996). Detailed studies of life history also reveal that locally adapted Arabidopsis thaliana (Arabidopsis hereafter) populations begin flowering in their home environments just prior to and after periods of increased historical drought frequency (Mojica et al., 2016).

Flowering time in Arabidopsis is correlated with other drought tolerance traits such as water use efficiency and can serve as a proxy for alternative drought tolerance strategies, with early flowering genotypes being associated with low water use efficiency (drought escape strategy) and late flowering genotypes with high water use efficiency (dehydration avoidance strategy) (McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014). Thus, the historical timing of drought experienced by locally adapted populations may explain the evolution of these strategies and the distribution of alleles responsible for natural flowering time variation. This hypothesis motivated our investigation to identify alleles associated with drought timing and test the prediction that they contribute to adaptive flowering time evolution.

Identifying functionally relevant genetic variation contributing to adaptation is needed to understand fundamental evolutionary processes. In contrast to early theoretical predictions and popular assumptions, loss-of-function (LoF) alleles, those that eliminate or ‘knockout’ a gene’s molecular function, are overrepresented among alleles reported as responsible for crop improvement and often produce adaptive phenotypes in wild species (Hoekstra et al., 2006; Rausher, 2008; Olsen and Wendel, 2013; Alonso-Blanco and Méndez-Vigo, 2014; Weigel and Nordborg, 2015b; Torkamaneh et al., 2018). Indeed, a number of individual genes exhibiting evidence of locally adaptive loss-of-function have been documented in Arabidopsis (Grant et al., 1998; Johanson et al., 2000; Kliebenstein, 2001; Kroymann et al., 2003; Mouchel et al., 2004; Aukerman, 1997; Hauser et al., 2001; Mauricio et al., 2003; Alonso-Blanco et al., 2005; Werner et al., 2005; Barboza et al., 2013; Xiang et al., 2014).

Discovering adaptive LoF alleles is particularly valuable for inspiring targeted molecular breeding because functionally similar mutations can be mined from the breeding pool or generated directly by non-transgenic native gene editing. Unfortunately, traditional genome-wide association scans based on the one-locus two-allele model perform poorly at detecting adaptive LoF alleles, which, because of the large number of mutations that can create them, are likely to arise through parallel molecular evolution (Pennings and Hermisson, 2006; Barboza et al., 2013; Kerdaffrec et al., 2016). Species-wide whole genome sequences, however, present the opportunity to advance beyond previous mapping and scanning methods that relied on linked polymorphisms by instead characterizing and contrasting functionally defined alleles.

Here, we combined long-term satellite-detected drought histories, whole genome sequence scans based on allele function, and transgenic knockout experiments in Arabidopsis to test historical predictions about how drought timing shapes the evolution of flowering time and outline a broadly scalable approach for discovering loss-of-function gene variants contributing to plant climate adaptation.

Results and discussion

To study global seasonal drought timing, satellite-detected measurements offer a valuable historical record. One such measurement, the Vegetative Health Index (VHI), has been used for decades to monitor drought, including in many places across the natural range of Arabidopsis (Kogan, 1997). Though primarily used as a tool to predict crop productivity, by quantifying drought induced vegetative stress this index also provides a resource for evolutionary ecologists to study seasonal patterns in drought-related episodes of natural selection. We analyzed 34 years of VHI data to characterize drought regimens at the home environments of Arabidopsis ecotypes (Figure 1, Supplementary file 1). We found that drought frequency during the spring (β = 50.016, p < 2×10−16) and summer (β = −28.035, p = 4.4×10−7) significantly predict flowering time among Arabidopsis ecotypes (Supplementary file 2A). We then generated a drought-timing index that quantifies the relative frequency of drought between spring and summer over the typical reproductive growing season and observed substantial differences in drought timing experienced by ecotypes (Figure 1—figure supplement 1). This environmental variation presented a useful cline to address classical hypotheses about the evolution of flowering time in relation to drought timing and identify LoF alleles potentially contributing to this evolution.

Figure 1. Seasonal drought timing varies across the Arabidopsis species range.

(A) Examples of home environments for two well-studied Arabidopsis ecotypes (Mojica et al., 2016) from Italy and Sweden, left and right plots respectively, showing historical drought conditions detected using the VHI and (B) drought frequency (VHI <40, NOAA drought classification) by week (line) and season (bars). Arrows mark locally observed flowering dates (Mojica et al., 2016) and gray bars highlight the typical reproductive growing season used to quantify a drought-timing index. (C) Variation in historical drought timing experienced at the home environments of Arabidopsis ecotypes across the species range (figure supplement). Large values indicate environments where spring droughts occur more frequently than summer drought (i.e. where the frequency of drought decreases over the course of the typical reproductive growing season) and vice versa.

Figure 1—figure supplement 1. Arabidopsis ecotypes are distributed across satellite-detected drought timing gradients.

Historical patterns in drought conditions were calculated from the Vegetative Health Index (VHI, Figure 1A) and converted into a drought-timing index (Figure 1B and C). Large values of this index indicate environments where spring droughts occur more frequently than summer drought (i.e. where the frequency of drought decreases over the course of the reproductive growing season) and vice versa (seasonal drought frequency map data available at greymonroe.github.io/data).

To identify candidate LoF alleles underlying drought adaptation and flowering time evolution, we analyzed whole genome sequences in Arabidopsis. We first surveyed the genomes of 1135 ecotypes (1001 Genomes Consortium, 2016) for LoF alleles in protein coding genes predicted to encode truncated amino acid sequences (Supplementary file 3A). To overcome the likely parallel evolutionary origins of LoF alleles that would have challenged previous methods, we classified alleles based on functional allele state rather than individual polymorphisms for association testing. After filtering to reduce the likelihood of false positives (see materials and methods), we thus tested 2088 genes for LoF allele associations with drought timing (Figure 2A) and flowering time (Figure 2B). These analyses identified 247 genes in which LoF alleles are significantly associated with drought timing and/or flowering time after accounting for population structure and multiple testing (Supplementary file 3B). In contrast, when we performed these analyses on a permuted LoF genotype matrix, we found no genes that were significantly associated with drought timing or flowering time (Figure 2—figure supplement 1).

Figure 2. LoF alleles share associations between drought timing and flowering time and exhibit evidence of positive selection.

(A) Visualization of the frequency of LoF alleles across environments in genes associated to summer (upper) or spring drought environments (lower). Darker lines indicate the mean across genes. (B) Contrasting flowering times between ecotypes with functional versus LoF alleles in genes associated with earlier (upper) or later (lower) flowering time phenotypes. (C) Overlap and relationships between the strength of LoF allele associations in genes associated with summer drought and earlier flowering, and (D) spring drought and later flowering. (E) Increased frequencies of independent LoF alleles in genes associated with drought timing and/or flowering time compared to genes without detected associations (t-test, p = 3.4 × 10−7), a signature of recurrent mutation accompanied by positive selection (Pennings and Hermisson, 2006).

Figure 2—figure supplement 1. P values of LoF allele associations.

Observed vs. expected P values, created using GWASTools in R (Gogarten et al., 2012), for associations between drought timing and (A) LoF alleles observed in Arabidopsis ecotypes and (B) randomized LoF genotypes with the same allele frequencies. Observed vs. expected p values for associations between flowering time and (C) LoF alleles observed in Arabidopsis ecotypes and (D) randomized LoF genotypes with the same allele frequencies. Relationship between LoF allele associations with drought timing and flowering time for (E) actual (C ~ A, r2 = 0.48) and (F) randomized genes (D ~ B, r2 = 0.01). The P values shown have not yet been corrected for multiple testing and are log10 transformed. Red lines in A-D represent y = x line.

Figure 2—figure supplement 2. Signatures of selection on LoF genes identified differ from null expectations.

(A) Contrasts (t-test, α = 0.05) between genes identified with LoF alleles associated to drought timing and/or flowering time (colors correspond to Figure 2C and D, boxplots visualized at ±1.5 times the data interquartile range) and the genomic background (light gray), as well as genes having LoF alleles but without observed associations (dark gray) for the ratio of non-synonymous (PN) and synonymous polymorphisms (PS) among A. thaliana ecotypes and (B) the ratio of non-synonymous (DN) and synonymous divergence (DS) from A. lyrata. Contrasts (t-test, α = 0.05) between genes identified with LoF alleles associated to drought timing and/or flowering time and genes with LoF alleles but without observed associations for (C) the (log10) global frequency of LoF alleles and (D) the (log10) number of unique LoF alleles. The corresponding average frequencies of unique LoF alleles for genes are shown in Figure 2E.

Figure 2—figure supplement 3. LoF alleles are not broadly overabundant in Arabidopsis ecotypes originating from spring drought environments or flowering later.

(A) The frequency of LoF alleles across environments (sliding window plot) in random genes. The darker line indicates the mean across genes. The distribution of LoF alleles in these random genes contrasts with LoF alleles in genes associated to drought timing, which are overwhelmingly associated to spring drought environments (Figure 2A). (B) Flowering times compared between ecotypes with functional versus LoF alleles in random genes. The phenotypic differences predicted by these random genes contrast with LoF alleles in those associated to flowering time, which are overwhelmingly associated to later flowering time (Figure 2B).

It should be noted that the 2088 genes tested for associations to flowering time and drought timing are not a complete representation of LoF alleles in Arabidopsis. In some cases, previously studied LoF alleles did not pass filtering steps (Supplementary file 3D,E). This was primarily because the frequency or quality of LoF allele calls in these genes fell below our filtering requirements (see materials and methods). In other cases, the Col-0 reference genome already has a documented LoF allele. Finally, we expect LoF alleles to be undetectable if they are the product of large insertions or deletions, which cannot be properly identified with currently available resequencing data. Thus, while the methods used here are designed to minimize false positives (alleles classified as LoF, but which are actually functional), the likely occurrence of false negatives (undetected LoF alleles) in available data motivates the need for more sophisticated species-wide genome sequencing efforts, including a greater diversity of de-novo quality genomes, for comprehensive detection of functionally relevant genetic variation across the species.

Associations to drought timing predicted associations of LoF alleles to flowering time directly. Together, summer drought and earlier flowering associated genes (Figure 2C), and spring drought and later flowering associated genes (Figure 2D) overlapped seven times more often than expected by chance (χ2 = 492, p < 2 × 10−16) and no shared associations were observed in the opposite direction. The strengths of the associations between LoF alleles and drought timing (P values) were also strongly correlated with the strengths of the associations to flowering time (r2 = 0.48, Figure 2—figure supplement 1E, Figure 2C,D). This result is comparable to overlapping peaks in a ‘Manhattan plot’ generated from a traditional genome wide association scan (e.g. Bosse et al., 2017). In contrast, these associations were weakly correlated when genotypes were permuted (r2 = 0.01, Figure 2—figure supplement 1F), indicating that the result is not simply explained as an artifact of allele frequencies or by the relationship between drought timing and flowering time (i.e. Supplementary file 1A). Thus, satellite-detected drought histories and a functional genome-wide scanning approach prove useful for predicting the direction and molecular targets of phenotypic evolution. Similar investigations with ecologically meaningful environmental variation could be valuable for discovering candidates underlying other important traits that are especially difficult to measure.
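The "times more often than expected by chance" comparison can be illustrated with a short Python sketch. It assumes the null expectation is independence between the two gene sets and uses a Pearson chi-square on the resulting 2×2 table; the counts are hypothetical placeholders rather than values from this study, and the function name is illustrative.

```python
def overlap_enrichment(n_total, n_a, n_b, n_both):
    """Fold enrichment and Pearson chi-square for the overlap of two gene sets.

    n_total: genes tested; n_a, n_b: genes in each set; n_both: genes in both.
    Assumes independence between the two sets as the null expectation.
    """
    expected = n_a * n_b / n_total
    fold = n_both / expected
    # 2x2 contingency table: rows = in set A / not in A, columns = in set B / not in B
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            cell_expected = row_sums[i] * col_sums[j] / n_total
            chi2 += (observed[i][j] - cell_expected) ** 2 / cell_expected
    return expected, fold, chi2


# Hypothetical counts chosen only to illustrate a sevenfold overlap.
expected, fold, chi2 = overlap_enrichment(n_total=2000, n_a=100, n_b=200, n_both=70)
```

With these placeholder counts, 10 shared genes are expected under independence, so 70 observed shared genes correspond to a sevenfold enrichment.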

These results further support the classical hypothesis that the relationship between phenology and drought timing is the most important feature of plant drought tolerance (Passioura, 1996), indicating the evolution of ‘drought escape’ through earlier flowering in summer drought environments, and ‘dehydration avoidance’ by later flowering genotypes in spring drought environments. Because most Arabidopsis populations appear to exhibit a winter annual life habit, germinating in the fall and overwintering as a rosette (Ratcliffe, 1961; Thompson, 1994; Burghardt et al., 2015), late flowering genotypes in spring drought environments are expected to still encounter drought conditions. However, delayed flowering may ensure that droughts co-occur with vegetative growth rather than during the drought sensitive reproductive phase. This pattern is also consistent with hypotheses explaining the more conservative water use and stomatal traits observed in late flowering genotypes (McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014; Kooyers, 2015) and those from spring drought environments (Dittberner et al., 2018). Future experimental work will be valuable to identify other plant physiological traits affected by the LoF alleles associated with drought timing.

These results provide new insight into the ecology and genetics of Arabidopsis life history evolution, but the complex ecological reality of these processes is undoubtedly beyond the scope of this study. We found that drought timing remains a significant predictor of allele associations to flowering time when controlling for allele associations with latitude and minimum temperature (slope estimate in multiple linear regression, p < 2×10−16, Supplementary file 2B). However, other unknown climatic variables or environmental interactions and non-linearities likely contribute to flowering time adaptation as well. Flowering time is only one component of phenology and other adaptive life history transitions such as germination timing (Donohue, 2002) may also be influenced by drought timing and could change how drought timing affects the evolution of flowering time, a hypothesis that warrants further investigation. Furthermore, measuring flowering time in other environments, such as alternate light regimes, may yield a different set of candidate genes using similar approaches.

Signatures of selection in the genes identified differ from the genome average and neutral expectations. As expected for genes harboring LoF alleles, these show parallel evolution of LoF and accelerated amino acid sequence evolution among Arabidopsis ecotypes (Figure 2—figure supplement 2A,B, Supplementary file 2C). We also found evidence of positive selection for LoF alleles in genes associated with drought timing and/or flowering time. While these genes have similar global frequencies of LoF alleles compared to genes not showing associations with drought timing and/or flowering time (Figure 2—figure supplement 2C), they tend to have significantly fewer unique LoF alleles (Figure 2—figure supplement 2D) and greater frequencies of each independent LoF allele (Figure 2E). This pattern is consistent with theoretical predictions and results from simulations of adaptation by parallel molecular evolution involving recurrent mutation combined with more rapid local fixation of alleles experiencing positive selection (Pennings and Hermisson, 2006). In cases where adaptation proceeds through the fixation of a single adaptive allele, traditional genome scanning approaches may be sufficient to detect causal loci. However, when genetic variation consists of multiple independent alleles, as is often the case for the genes examined here (Figure 2—figure supplement 2D), classifying alleles functionally before testing for associations is likely necessary.

The extent of LoF responsible for adaptive phenotypic evolution is much greater than once assumed (Smith, 1970; Albalat and Cañestro, 2016). LoF alleles identified were overwhelmingly associated with spring drought or later flowering rather than summer drought or earlier flowering (χ2 = 132, p < 2 × 10−16, Figure 2). Because the reference genome and gene models are from an early flowering Arabidopsis line, Col-0, this is consistent with the hypothesis that LoF alleles are particularly important in the evolution of phenotypic divergence (Rausher, 2008). This result also highlights the need to develop functional genomics resources informed by multiple de-novo quality reference genomes. We found that flowering time is strongly predicted by the accumulation of LoF alleles across the 214 candidate genes associated to spring drought and/or later flowering time (Figure 3A–E), estimating a 1 day increase for every three additional LoF alleles across these candidate genes (Figure 3F). This relationship is best represented as a simple linear regression; the addition of a non-linear quadratic predictor variable did not significantly improve the fit of the model (F = 0.7005, p = 0.4028). Importantly, we did not find a broader overabundance of LoF alleles in later flowering ecotypes or those from spring drought environments that would explain this relationship (e.g. Figure 2—figure supplement 3). Rather, these findings support a model of climate-associated evolution in complex traits that includes a substantial contribution from widespread genetic LoF and give promise to targeted LoF for directed phenotypic engineering.

Figure 3. Widespread LoF contributing to later flowering time evolution.

(A) Genomic map of 214 candidate genes with associations between LoF alleles and spring drought environments and/or later flowering time phenotypes. (B–E) Examples of the geography and flowering times among Arabidopsis ecotypes of LoF alleles in candidate genes including (B) a previously unstudied rhamnogalacturonate lyase, (C) a cyclin linked to later flowering in prior knockout experiments (Cui et al., 2007), and members of the drought-responsive (D) Nramp2 (Qin et al., 2017) and (E) RmlC-like cupin (Aghdasi et al., 2012) protein families. (F) Later flowering time in ecotypes predicted by the accumulation of LoF alleles across all candidate genes. The line shows the best fitting model. Color scale of points reflects proportion of total LoF in ecotypes that are candidate genes (darker points = greater proportion). (G) Experimental validation of hypothesized later flowering time in T-DNA knockout lines of candidate genes compared to the wild type genotype.

Experimental knockout lines confirmed the later flowering times predicted from natural allele associations. To test phenotypic effects, we screened a panel of confirmed T-DNA insertion mutants representing a sample of candidate LoF alleles associated with spring drought and/or later flowering. As predicted by variation among Arabidopsis ecotypes (Figure 2D), the vast majority of knockout lines in these candidate genes (57 of 59, χ2 = 51, p = 8.045 × 10−13) flowered later on average than the wild type genotype (Figure 3G, Supplementary file SF). LoF alleles identified through these analyses and experiments include those previously linked to flowering time (Cui et al., 2007) and drought responses (Aghdasi et al., 2012; Qin et al., 2017). Implementing a functional genome-wide association scan, we find that allele associations with ecologically meaningful environmental variation (drought timing) accurately predict associations with adaptive phenotypes directly (flowering time).
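The reported statistic for the knockout screen is consistent with a chi-square goodness-of-fit test against an even split between later and earlier flowering lines. The Python sketch below reproduces that arithmetic under this assumption; the 50:50 null and the function name are not stated in the text and are used here only for illustration.

```python
import math


def chi_square_equal_split(successes, total):
    """Chi-square goodness-of-fit (1 df) against a 50:50 null expectation."""
    expected = total / 2
    failures = total - successes
    chi2 = ((successes - expected) ** 2 + (failures - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 57 of 59 knockout lines flowered later than wild type (values from the text).
chi2, p_value = chi_square_equal_split(57, 59)
```

Under this assumed null, the statistic is about 51.3 and the p value is on the order of 8 × 10−13, in line with the values reported above.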

Together with validation in transgenic lines, these findings outline a scalable model for gaining deeper insights into the functional genomics of climate adaptation in nature. Combining large scale knockout experiments with functional genome wide association scans may be a valuable approach for future research to quantify the power to predict LoF allele effects. These results also further challenge historical assumptions about molecular adaptation that have implications for influencing evolutionary theory and public attitudes toward emerging molecular breeding approaches.

Groundbreaking yield increases during the green revolution of the 1960s were largely attributable to semi-dwarf phenotypes caused by LoF alleles in both rice and barley (Spielmeyer et al., 2002; Jia et al., 2009). Later it was found that natural LoF alleles of the same gene in wild Arabidopsis produce similar phenotypes (Barboza et al., 2013), suggesting the potential to mine ecological species for information directly useful for crop improvement. Visions of a second green revolution powered and informed by such natural variation call for discoveries in evolutionary functional genomics at scales that have now become possible. The genes identified here could inspire future molecular breeding of climate resilient crops and this work more broadly highlights the value of integrating diverse disciplines to scale up the discovery of the climatic drivers of adaptation and functionally significant genetic variation at molecular resolutions.

Materials and methods

Satellite-detected drought histories of Arabidopsis

To study patterns in historical drought, the remotely sensed Vegetative Health Index (VHI) was used, a satellite-detected drought measurement tool whose advantage is that it includes information about vegetative impacts of drought (Passioura, 1996; AghaKouchak et al., 2015). This index is based on multiple data sources from NOAA satellites, combining deviations from historic climatic (Temperature Condition Index derived from AVHRR-based observations in thermal bands) and vegetative conditions (Vegetative Condition Index derived from NDVI) to detect periods of ecological drought conditions and distinguish between other sources of vegetative stress such as cold (Kogan, 1997; Kogan et al., 2005; Rojas et al., 2011). VHI was collected weekly since 1981 at 16 km2 resolution on a scale from 0 to 100, where values below 40 reflect drought conditions (Kogan, 1997) (Figure 1A). The frequencies of observing drought conditions during photoperiodic spring (quarter surrounding spring equinox), summer (quarter surrounding summer solstice), fall (quarter surrounding fall equinox), and winter (quarter surrounding winter solstice) were calculated globally from 1981 to 2015 (Figure 1B) in R (R Core Development Team, 2017) using the raster package (Hijmans, 2016).
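As a rough illustration of the seasonal drought frequency calculation, the Python sketch below handles a single pixel. The authors worked in R with the raster package; the function names, the approximate Northern Hemisphere equinox and solstice days, and the toy VHI values here are all assumptions made for the example. Each observation is assigned to the season whose anchor day is nearest, which only approximates the "quarter surrounding" each equinox or solstice.

```python
from collections import defaultdict

DROUGHT_THRESHOLD = 40  # VHI below 40 reflects drought conditions (Kogan, 1997)

# Approximate day of year for each equinox/solstice (assumed, Northern Hemisphere).
SEASON_ANCHORS = {"spring": 80, "summer": 172, "fall": 266, "winter": 355}


def circular_distance(a, b, period=365):
    """Distance in days between two days of the year, wrapping around the new year."""
    d = abs(a - b) % period
    return min(d, period - d)


def season_of(day_of_year):
    """Assign a day to the season whose anchor day is nearest."""
    return min(SEASON_ANCHORS, key=lambda s: circular_distance(day_of_year, SEASON_ANCHORS[s]))


def seasonal_drought_frequency(observations):
    """observations: iterable of (day_of_year, vhi) pairs pooled across years.

    Returns the fraction of observations with VHI < 40 for each season present.
    """
    totals = defaultdict(int)
    droughts = defaultdict(int)
    for day, vhi in observations:
        season = season_of(day)
        totals[season] += 1
        if vhi < DROUGHT_THRESHOLD:
            droughts[season] += 1
    return {season: droughts[season] / totals[season] for season in totals}


# Toy weekly series for one pixel (hypothetical values, not study data).
toy = [(75, 35), (82, 38), (89, 55), (96, 30), (165, 60), (172, 45), (179, 39), (186, 70)]
freq = seasonal_drought_frequency(toy)
```

In the toy series, three of four spring weeks and one of four summer weeks fall below the threshold.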

After removing ecotypes with missing location data or locations falling within pixels classified as water, seasonal drought frequencies and drought timing were calculated at the location of origin for 1,097 Arabidopsis ecotypes that were included as part of the 1001 Genomes Project (1001 Genomes Consortium, 2016) (Figure 1C, Supplementary file 1). Up-to-date global map files of seasonal drought frequency and the drought-timing index used here are available on Dryad and greymonroe.github.io/data alongside a brief tutorial showing how to extract data for points of interest in R. We tested whether seasonal drought frequencies significantly predicted flowering time (flowering time is described in the subsequent section on LoF associations) by multiple linear regression (Supplementary file 2A).

To characterize the seasonal timing of droughts during an important period of Arabidopsis’ life history, a univariate drought-timing index was generated that quantifies whether the historical frequency of drought increases or decreases over the course of the typical Arabidopsis reproductive growing season (Ratcliffe, 1961; Thompson, 1994; Burghardt et al., 2015). Specifically, this index is equal to the natural log transformed ratio between spring and summer drought frequency. More negative values reflect environments where drought frequency increases from spring to summer and are referred to here as ‘summer drought environments,’ (e.g. Figure 1B left). Conversely, more positive values reflect environments where drought frequency decreases from spring to summer and are referred to here as ‘spring drought environments,’ (e.g. Figure 1B right).
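The index defined above is a one-line calculation, sketched here in Python. The function name is illustrative, and how locations with a zero drought frequency in either season were handled is not described in this section, so the sketch simply rejects them.

```python
import math


def drought_timing_index(spring_freq, summer_freq):
    """Natural log of the ratio of spring to summer drought frequency.

    Positive values indicate 'spring drought environments' and negative
    values indicate 'summer drought environments'.
    """
    if spring_freq <= 0 or summer_freq <= 0:
        # Assumption: zero frequencies are rejected because the log ratio is undefined.
        raise ValueError("both frequencies must be positive")
    return math.log(spring_freq / summer_freq)


# Hypothetical frequencies: drought three times more frequent in spring than summer.
spring_site = drought_timing_index(0.3, 0.1)
summer_site = drought_timing_index(0.1, 0.3)
```

Because the index is a log ratio, it is symmetric around zero: swapping the spring and summer frequencies flips its sign without changing its magnitude.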

Loss-of-function (LoF) alleles in Arabidopsis genomes

To identify functionally definitive gene variants (Hoekstra and Coyne, 2007; Weigel and Nordborg, 2015a; Byers et al., 2017), LoF alleles (Albalat and Cañestro, 2016) were identified from whole genome sequence data of 1,135 Arabidopsis accessions (Olson, 1999; Cutter and Jovelin, 2015; 1001 Genomes Consortium, 2016) using R scripts. First, genes were filtered to those containing at least 5% frequency of predicted frameshift or premature stop mutations and less than 5% missing allele calls from results generated by the 1,001 Genomes Consortium (1001 Genomes Consortium, 2016) using ‘SnpEff’ (Cingolani et al., 2012). To reduce instances where exon skipping might ameliorate LoF mutations (Gan et al., 2011), genes were filtered to those with a single predicted gene model (Lamesch et al., 2012). Additionally, to preclude false LoF calls for cases where compensatory mutations restore gene function or in which an insignificant portion of the final protein product is affected by putative LoF mutations (MacArthur et al., 2012), coding regions were translated into predicted amino acid sequences from which lengths from start to stop codon were calculated in R. LoF alleles were defined as those producing protein products with at least 10% lost because of late start codons and/or prematurely truncated translation. Allelic heterogeneity expected to mask these genes from traditional GWAS (Remington, 2015; Monroe et al., 2016; Flood and Hancock, 2017) was corrected for by classifying all alleles as either functional (0) or non-functional (1). A final frequency filter was re-applied (5% global LoF allele frequency), resulting in 2088 genes for downstream association analyses (Supplementary file 3B). 
Finally, to compare the results of this pipeline to genes known to harbor natural LoF alleles (Mouchel et al., 2004; Shindo et al., 2008; Gujas et al., 2012; Kliebenstein, 2001; Kroymann et al., 2003; Grant et al., 1998; Tian et al., 2003; Mauricio et al., 2003; Werner et al., 2005; Aukerman, 1997; Flowers et al., 2009; Xiang et al., 2014; Xiang et al., 2016; Amiguet-Vercher et al., 2015; Johanson et al., 2000; Le Corre et al., 2002; McKay et al., 2003; Stinchcombe et al., 2004; Shindo et al., 2005; Méndez-Vigo et al., 2011; Lovell et al., 2013; Hauser et al., 2001; Bloomer et al., 2012; Alonso-Blanco et al., 2005; Zhen and Ungerer, 2008; Kang et al., 2013; Monroe et al., 2016; Zhu et al., 2015; Barboza et al., 2013), we manually performed this functional allele calling approach on a set of 16 genes (Supplementary file 3D,E).
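The length-based allele calling described in this section can be illustrated with a short sketch. It is written in Python rather than the R used in the study, and several details are assumptions made for illustration: the function names are hypothetical, the reading frame is taken to begin at the first base of the annotated coding region, and the reference length is taken to be the predicted full-length protein product.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length(cds: str) -> int:
    """Codons from the first in-frame ATG up to, but not including, the first stop."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    length = 0
    started = False
    for codon in codons:
        if not started:
            if codon == "ATG":  # a late start codon shortens the product
                started = True
                length = 1
            continue
        if codon in STOP_CODONS:  # a premature stop truncates translation
            break
        length += 1
    return length


def is_lof(allele_length: int, reference_length: int, min_lost: float = 0.10) -> bool:
    """Call LoF (1) when at least `min_lost` of the reference length is missing."""
    lost = (reference_length - allele_length) / reference_length
    return lost >= min_lost


def passes_frequency_filter(calls: list, min_freq: float = 0.05) -> bool:
    """Keep genes whose global LoF allele frequency is at least `min_freq`."""
    return sum(calls) / len(calls) >= min_freq
```

Because alleles are reduced to a binary state, two accessions carrying different truncating mutations in the same gene both receive a call of 1, which is how the binary coding pools heterogeneous LoF alleles.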

LoF associations to drought timing and flowering time

To identify candidate LoF alleles responsible for climate adaptation and phenotypic evolution, the relationships between functional allele state and drought timing and between functional allele state and flowering time were evaluated for each of the 2088 genes that passed preceding filtering steps. Specifically, the association between functional allele state among Arabidopsis ecotypes and historical drought timing at their locations of origin was tested by logistic regression in a generalized linear model in R (R Core Development Team, 2017). This association study differs from traditional GWAS in several respects. First, because the alleles studied here are functionally defined, they are expected to be more likely to have a phenotypic impact than random SNPs. Second, the scope of our analyses was restricted to a subset of the genome: 2088 genes with high confidence LoF allele calls that passed previous filtering steps, rather than tens of thousands to millions of SNPs. Finally, in contrast to traditional GWAS, which is designed to identify associated chromosomal regions rather than functionally definitive genetic variations, our approach is motivated by the ability to identify alleles at molecular resolutions whose functional relevance can be tested empirically. Thus, the balance of opportunity costs related to trade-offs between false positive and false negative associations that generally challenge GWAS is shifted toward reducing false negatives rather than minimizing false positives. For these reasons, we implemented analyses based on Price et al. (2006) to balance false positives and false negatives. Population structure was accounted for by performing a principal component analysis on the kinship matrix among all ecotypes and including in each model the first three resulting principal components, which explain >75% of variance in relatedness between ecotypes (Price et al., 2006).
The P-values (Pdrought timing) of the slope estimates (βdrought timing) for drought timing in these models were adjusted to account for multiple tests by a Bonferroni correction to identify those significantly associated (Supplementary file 3C).

Summer drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing <0 and Pdrought timing <0.05) more negative drought-timing index (summer drought environments where drought frequency increases over the course of the reproductive growing season, Figure 1B left and Figure 2A top). Conversely, spring drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing >0 and Pdrought timing <0.05) more positive drought-timing index (spring drought environments where drought frequency decreases over the course of the reproductive growing season, Figure 1B right and Figure 2A bottom).
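The classification rule above can be written out as a small sketch. It is given in Python for illustration only and does not reproduce the logistic regression itself. The function names are hypothetical, and it is assumed here that the P < 0.05 threshold is applied after the Bonferroni adjustment, with the number of tests equal to the number of genes evaluated.

```python
def bonferroni(p_values: list) -> list:
    """Multiply each P-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def classify_gene(beta: float, p_adjusted: float, alpha: float = 0.05) -> str:
    """Assign a gene to a drought-timing class from its slope and adjusted P-value."""
    if p_adjusted >= alpha:
        return "not associated"
    if beta < 0:
        return "summer drought"  # LoF alleles found where the index is more negative
    if beta > 0:
        return "spring drought"  # LoF alleles found where the index is more positive
    return "not associated"
```

The same rule applies to the flowering time associations, with a negative slope indicating earlier flowering and a positive slope indicating later flowering in ecotypes carrying LoF alleles.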

The above analytical approach was repeated to test whether functional allele state is associated with the reported common garden flowering times of Arabidopsis ecotypes (Alonso-Blanco and Méndez-Vigo, 2014) (Supplementary file 1). See Alonso-Blanco et al. (Alonso-Blanco and Méndez-Vigo, 2014) for details, but in brief, flowering time was measured in growth chambers at 10°C (an experiment with considerably less missing data than the experiment at 16°C) under 16 hr days. Earlier flowering genes were identified as those in which LoF alleles are found in ecotypes that flower significantly (βflowering time <0 and Pflowering time <0.05) earlier than ecotypes with a functional allele (Figure 2B top). Later flowering genes were identified as those in which LoF alleles are found in ecotypes that flower significantly (βflowering time >0 and Pflowering time <0.05) later than ecotypes with a functional allele (Figure 2B bottom). The preceding analyses revealed considerable overlap between genes associated with both drought timing and flowering time. To assess whether this result was an artifact of the binary LoF allele calls, we randomly permuted the genotype matrix and repeated the analyses described above, testing for significant associations between allele states and drought timing and/or flowering time. Quantile-quantile plots of P values were visualized using qqPlot in the GWASTools package in R (Gogarten et al., 2012) (Figure 2—figure supplement 1A–D).
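One way to build such a permuted genotype matrix is sketched below in Python. The exact permutation scheme is not specified in the text, so this sketch assumes that allele calls are shuffled among ecotypes independently for each gene, which keeps every gene's LoF allele frequency unchanged while breaking its link to environment and phenotype. The function name and the dictionary layout are hypothetical.

```python
import random


def permute_genotypes(matrix: dict, seed=None) -> dict:
    """Shuffle 0/1 allele calls among ecotypes, gene by gene.

    `matrix` maps a gene identifier to its list of calls across ecotypes.
    """
    rng = random.Random(seed)
    permuted = {}
    for gene, calls in matrix.items():
        shuffled = list(calls)  # copy so the observed matrix is left untouched
        rng.shuffle(shuffled)
        permuted[gene] = shuffled
    return permuted
```

The association tests are then rerun on the permuted matrix, and the resulting P-values give a reference distribution against which the observed associations can be compared.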

Overlap between drought timing and flowering time associated genes

To address the longstanding hypothesis that flowering time reflects adaptation to drought timing (Fox, 1990; Passioura, 1996; Kooyers, 2015), and to test the corresponding prediction that alleles associated with drought timing are also associated with flowering time, the groups of genes identified with significant associations to drought timing or flowering time were compared (Figure 2C and D). Deviation from the null hypothesis of independent associations to drought timing and flowering time was evaluated by a chi-squared test (Expected number of co-associated genes = 12, Observed = 83, χ² = 492, p = 2×10⁻¹⁶).
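The logic of this test can be sketched as follows. Under independence, the expected number of co-associated genes is the product of the two group sizes divided by the total number of genes tested. The Python sketch below computes that expectation and a Pearson chi-squared statistic on the 2×2 table of association states. It assumes a test with one degree of freedom and no continuity correction, which may differ from the exact test used in the study, and the function name is hypothetical.

```python
import math


def overlap_chi_squared(n_total: int, n_a: int, n_b: int, n_both: int):
    """Return (expected overlap, chi-squared statistic, P-value) for a 2x2 table.

    n_a and n_b are the numbers of genes in each group; n_both is their overlap.
    """
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    expected_both = row[0] * col[0] / n_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected_both, chi2, p_value
```

As a toy example with made-up counts, 100 genes with groups of 20 and 50 give an expected overlap of 10; observing an overlap of 20 yields a chi-squared statistic of 25.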

The magnitude of P-values has historically served as the basis for selecting candidate loci for further examination of their contribution to environmental adaptation or phenotypic evolution in quantitative trait locus mapping and genome wide association scans (e.g. Bosse et al., 2017). To test whether associations to environment (drought timing) can be used to identify loci associated with phenotypes (flowering time) directly, the correlation between log transformed P-values describing allele associations with drought timing (Pdrought timing) and with flowering time (Pflowering time) was calculated (Figure 2—figure supplement 1E, r² = 0.48) and visualized separately for genes associated to summer drought/earlier flowering (Figure 2C) and to spring drought/later flowering (Figure 2D). To control for the possibility that allele frequencies or the relationship between drought timing and flowering time explained these observations, we also tested whether allele associations were correlated when generated from association analyses using a matrix of randomly permuted genotypes with the same allele frequencies (Figure 2—figure supplement 1F, r² = 0.01).

Finally, to control for the possibility that correlated LoF allele associations were explained by confounding environmental variables, we tested whether the LoF allele associations to drought timing remained predictive while accounting for LoF allele associations with latitude and minimum temperature of the coldest month (Hijmans et al., 2005) using a multiple linear regression in R (Supplementary file 3B). To do so, we repeated the association analyses described in the previous section but instead tested for LoF allele associations with latitude and minimum temperature. We then included these P values (Supplementary file 2B) in a multiple linear regression where the strength of the association to flowering time was predicted by the associations to drought timing, latitude, and minimum temperature simultaneously.

Signatures of selection

To assess whether histories of selection for genes identified differ from the genome wide expectation, measures of amino acid sequence evolution were evaluated for 122 genes in which loss-of-function is associated with drought timing or flowering time and for which there are orthologs identified between A. lyrata and A. thaliana (Goodstein et al., 2012). For each gene, sequences were aligned using MAFFT (Katoh and Standley, 2013), codons with gaps were removed, and the number of non-synonymous and synonymous polymorphisms among A. thaliana accessions (PN and PS) as well as non-synonymous and synonymous divergence (DN and DS) from A. lyrata were measured using mkTest.rb (https://github.com/kern-lab/). The ratios PN/PS and DN/DS were then calculated to measure the proportion of variants predicted to affect amino acid sequences that are segregating among ecotypes and diverged from A. lyrata, respectively. These calculations were also performed for genes not associated to drought timing or flowering time (n = 912) and the remaining genes across the A. thaliana genome (n = 20373) with orthologs between A. lyrata and A. thaliana. To test whether genes identified show evidence of accelerated protein sequence evolution, comparisons were made to genes associated with drought timing or flowering time for both PN/PS (Figure 2—figure supplement 2A) and DN/DS (Figure 2—figure supplement 2A,B) by two-sided Student's t-tests (α = 0.05) in R (R Core Development Team, 2017).

Because theory predicts adaptation by loss-of-function to proceed through multiple independent alleles, but to exhibit fewer different alleles than neutral loci at similar LoF allele frequencies (Pennings and Hermisson, 2006; Ralph and Coop, 2010; Ralph and Coop, 2015), the number of unique LoF alleles was estimated by protein length in the genes that passed preceding filtering steps. To address the hypothesis that genes in which LoF alleles are associated to drought history or flowering time are likely to reflect positive selection compared to genes in which LoF alleles are random with respect to drought history or flowering time, the total number of unique LoF alleles between these groups was compared using a two-sided Student's t-test (log10 transformed, p = 5.8×10⁻⁷, Figure 2—figure supplement 2D). To control for the possibility that this result is an artifact of reduced frequency of LoF alleles in genes identified, the global frequency of LoF was also compared between these groups (log10 transformed, two-sided Student's t-test, p = 0.11, Figure 2—figure supplement 2C). Finally, to further test the prediction that LoF alleles in genes identified have increased in frequency because of more positive selection, the frequency per specific LoF allele was compared between groups (log10 transformed, two-sided Student's t-test, p = 3.4×10⁻⁷, Figure 2E).
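The per-gene summary statistics used in these comparisons can be sketched as follows. The Python below is an illustration, not the study's R code: the function name is hypothetical, alleles are treated as distinct LoF alleles when their predicted protein lengths differ, and the frequency per LoF allele is taken to be the global LoF frequency divided by the number of unique LoF alleles, which is one reading of the text.

```python
from collections import Counter


def lof_allele_summary(protein_lengths: list, reference_length: int,
                       min_lost: float = 0.10) -> dict:
    """Summarize LoF alleles for one gene from per-ecotype protein lengths."""
    n_ecotypes = len(protein_lengths)
    lof_lengths = [
        length for length in protein_lengths
        if (reference_length - length) / reference_length >= min_lost
    ]
    # Distinct truncated lengths stand in for distinct LoF alleles.
    unique_alleles = Counter(lof_lengths)
    n_unique = len(unique_alleles)
    lof_frequency = len(lof_lengths) / n_ecotypes
    per_allele = lof_frequency / n_unique if n_unique else 0.0
    return {
        "n_unique_lof_alleles": n_unique,
        "lof_frequency": lof_frequency,
        "frequency_per_lof_allele": per_allele,
    }
```

Under this reading, a gene with the same overall LoF frequency but fewer unique LoF alleles has a higher frequency per allele, which is the pattern expected when a small number of LoF alleles have risen in frequency under positive selection.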

Candidate genes contributing to later flowering time by widespread LoF

The significance of the tendency for LoF associations to spring drought/later flowering time (Figure 2D) was tested by chi-squared tests (spring drought vs. summer drought, p < 2×10⁻¹⁶; later vs. earlier flowering, p < 2×10⁻¹⁶; spring drought/later flowering vs. summer drought/earlier flowering, p < 2×10⁻¹⁶). The chromosomal locations of candidate genes (those associated to spring drought/later flowering time) were mapped onto the Arabidopsis genome (Lamesch et al., 2012) (Figure 3A). To address the hypothesis that widespread LoF contributes to later flowering time phenotypes, the total number of LoF alleles in candidate genes for each ecotype was calculated and the correlation between this value and flowering time was evaluated (Figure 3F, r² = 0.39, p < 2×10⁻¹⁶). We also tested whether a model which included a non-linear predictor (squared value of the total number of LoF alleles in candidate genes) was a better fit than the simple linear model by an analysis of variance (F = 0.7005, p = 0.4028).
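The per-ecotype LoF count and its correlation with flowering time can be sketched as below. This is an illustrative Python version with hypothetical function names and a hypothetical dictionary layout for the allele call matrix; it computes only the squared Pearson correlation and not the associated P-value or the comparison with the non-linear model.

```python
def lof_burden(matrix: dict, candidate_genes: list) -> list:
    """Total number of LoF calls across candidate genes for each ecotype.

    `matrix` maps a gene identifier to its list of 0/1 calls across ecotypes.
    """
    n_ecotypes = len(matrix[candidate_genes[0]])
    return [
        sum(matrix[gene][i] for gene in candidate_genes)
        for i in range(n_ecotypes)
    ]


def r_squared(x: list, y: list) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy ** 2 / (sxx * syy)
```

In use, the list returned by `lof_burden` would be paired with the flowering times of the same ecotypes, in the same order, and passed to `r_squared`.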

Experimental testing of predicted phenotypes in gene knockout lines

The preceding analyses provided compelling evidence of LoF in candidate genes as important in the evolution of later flowering time phenotypes. To test the prediction that non-functionalization of these genes causes increased flowering time, phenotypes were measured in transgenic lines in a subsample of candidate genes showing a significant association between loss-of-function and spring drought environments and/or later flowering time. Motivated by the general need to develop a high throughput approach to studying naturally adaptive LoF, knockout lines from the Arabidopsis Biological Resource Center were chosen from a collection created by the Salk Institute in which a T-DNA insertion in an exon of candidate genes has already been identified and confirmed to be homozygous (O'Malley and Ecker, 2010; Rutter et al., 2017). These lines (Supplementary file 3F) exist in a common genetic background (Columbia) (Alonso et al., 2003). Seeds were planted in 2-inch pots containing wet potting soil and stratified for 5 days at 4°C. Seedlings were thinned to a single plant per pot one week after stratification. Plants were grown (59 T-DNA knockout lines, 10 reps of each line and 30 reps of Columbia) in a stratified (by shelf), randomized design in growth chambers (Conviron ATC60, Controlled Environments, Winnipeg, MB) under 16 hr of light at 20°C. Flowering time was measured as days after planting to the emergence of the first open flower, based on the definition of flowering time used by the 1001 Genomes Consortium (1001 Genomes Consortium, 2016). We calculated the least squares mean (lsmean from ‘lsmeans’ package in R) flowering time for each line from a mixed model where shelf and tray were included as random effects (Supplementary file 3F). We tested the prediction that knockout lines would flower later (have higher lsmean flowering time estimates) than the wild type Columbia genotype by a chi-squared test (p = 8.1×10⁻¹³).

Acknowledgements

E Buckler, D Des Marais, A Henry, J Lasky, T Mitchell-Olds, J Ross-Ibarra, and D Sloan provided valuable feedback and insightful discussion that improved this work. This study was financially supported by NSF Awards DEB 1022196 and 1556262 to JKM, NSF Award 1701918 and USDA-NIFA Award 2014-38420-21801 to JGM, as well as generous funding from Cargill, Inc. The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Data used are included in the main text, supplementary materials, and public repositories.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

J Grey Monroe, Email: greymonroe@gmail.com.

Daniel J Kliebenstein, University of California, Davis, United States.

Christian S Hardtke, University of Lausanne, Switzerland.

Funding Information

This paper was supported by the following grants:

  • National Science Foundation 1701918 to John Grey Monroe.

  • U.S. Department of Agriculture 2014-38420-21801 to John Grey Monroe.

  • National Science Foundation IOS-1402393 to John T Lovell.

  • National Science Foundation 1022196 to John K McKay.

  • National Science Foundation 1556262 to John K McKay.

  • Cargill Research support to John K McKay.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Validation, Investigation, Methodology, Writing—review and editing.

Data curation, Formal analysis, Methodology, Writing—review and editing.

Supervision, Investigation, Methodology, Project administration, Writing—review and editing.

Supervision, Investigation, Methodology, Project administration, Writing—review and editing.

Investigation, Methodology, Writing—review and editing.

Data curation, Formal analysis, Investigation, Methodology, Writing—review and editing.

Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing—review and editing.

Additional files

Source data 1. Raw flowering time measurements for wild-type genomic background and T-DNA knockout lines.
elife-41038-data1.csv (25.2KB, csv)
DOI: 10.7554/eLife.41038.010
Supplementary file 1. Arabidopsis ecotypes examined.

Includes ecotype identifiers as well as latitude and longitude of origin, seasonal drought frequencies (winter, spring, summer, fall), drought timing index (drought_timing), flowering time (FT10), and minimum temperature (BIO6).

elife-41038-supp1.xlsx (158.2KB, xlsx)
DOI: 10.7554/eLife.41038.011
Supplementary file 2. Multiple linear regression model summaries.

(A) Flowering time predicted by seasonal drought frequencies. Arabidopsis common garden flowering times were predicted by historic drought frequencies (DF) during different seasons at ecotypes’ location of origin using multiple linear regression. (B) The strength of association between LoF alleles and flowering time (-log10 transformed P values) predicted by the strength of LoF alleles with drought timing, latitude, and minimum temperature.

elife-41038-supp2.xlsx (10.2KB, xlsx)
DOI: 10.7554/eLife.41038.012
Supplementary file 3. Genes.

(A) Matrix of functional allele calls for 2088 genes among 1135 Arabidopsis ecotypes. LoF alleles are those with less than 90% predicted protein product and are classified with a ‘1’. Functional alleles are classified with a ‘0’. (B) Associations between functional allele state and drought timing and flowering time for 2088 genes. Includes gene, estimate for logistic regression model testing the association between functional allele state and drought timing (Drought_timing_B) and flowering time (flw_10_B) after accounting for population structure, and the P-value of these estimates before Bonferroni correction for multiple testing (Drought_timing_p and flw_10_p). These values are also reported for LoF associations with latitude (lat_B, lat_p) and minimum temperature (temp_B, temp_p). (C) Selection statistics for 2088 genes. Includes PN/PS (pnps), DN/DS (dnds), frequency, number of LoF alleles, and average frequency per LoF allele. (D) Survey of sample genes with previously identified LoF alleles. (E) LoF alleles identified in previously studied genes (those surveyed in Table D). (F) Flowering time in T-DNA knockout lines. Flowering time (lsmean and standard error) of wild-type genomic background and T-DNA knockout lines of a sample of candidate genes in which LoF alleles are associated with spring drought environments or later flowering time phenotypes in Arabidopsis ecotypes.

elife-41038-supp3.xlsx (7.5MB, xlsx)
DOI: 10.7554/eLife.41038.013
Transparent reporting form
DOI: 10.7554/eLife.41038.014

Data availability

All data generated or analyzed during this study are included in the manuscript and supporting files.

The following previously published datasets were used:

The 1001 Genomes Consortium. 2016. GMI-MPI Arabidopsis thaliana genomes. 1001 Genomes Data Center. GMI-MPI

Kogan F. 1995. Vegetative Health Index. National Oceanic and Atmospheric Administration. VHI

References

  1. 1001 Genomes Consortium 1,135 Genomes reveal the global pattern of polymorphism in arabidopsis thaliana. Cell. 2016;166:481–491. doi: 10.1016/j.cell.2016.05.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. AghaKouchak A, Farahmand A, Melton FS, Teixeira J, Anderson MC, Wardlow BD, Hain CR. Remote sensing of drought: Progress, challenges and opportunities. Reviews of Geophysics. 2015;53:452–480. doi: 10.1002/2014RG000456. [DOI] [Google Scholar]
  3. Aghdasi M, Fazli F, Bagherieh MB. Cloning and expression analysis of Arabidopsis TRR14 gene under salt and drought stress. Journal of Cell and Molecular Research. 2012;4:1–10. doi: 10.22067/jcmr.v4i1.12269. [DOI] [Google Scholar]
  4. Albalat R, Cañestro C. Evolution by gene loss. Nature Reviews Genetics. 2016;17:379–391. doi: 10.1038/nrg.2016.39. [DOI] [PubMed] [Google Scholar]
  5. Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E, Meyers CC, Parker H, Prednis L, Ansari Y, Choy N, Deen H, Geralt M, Hazari N, Hom E, Karnes M, Mulholland C, Ndubaku R, Schmidt I, Guzman P, Aguilar-Henonin L, Schmid M, Weigel D, Carter DE, Marchand T, Risseeuw E, Brogden D, Zeko A, Crosby WL, Berry CC, Ecker JR. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science. 2003;301:653–657. doi: 10.1126/science.1086391. [DOI] [PubMed] [Google Scholar]
  6. Alonso-Blanco C, Gomez-Mena C, Llorente F, Koornneef M, Salinas J, Martínez-Zapater JM. Genetic and molecular analyses of natural variation indicate CBF2 as a candidate gene for underlying a freezing tolerance quantitative trait locus in Arabidopsis. Plant Physiology. 2005;139:1304–1312. doi: 10.1104/pp.105.068510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Alonso-Blanco C, Méndez-Vigo B. Genetic architecture of naturally occurring quantitative traits in plants: an updated synthesis. Current Opinion in Plant Biology. 2014;18:37–43. doi: 10.1016/j.pbi.2014.01.002. [DOI] [PubMed] [Google Scholar]
  8. Amiguet-Vercher A, Santuari L, Gonzalez-Guzman M, Depuydt S, Rodriguez PL, Hardtke CS. The IBO germination quantitative trait locus encodes a phosphatase 2C-related variant with a nonsynonymous amino acid change that interferes with abscisic acid signaling. New Phytologist. 2015;205:1076–1082. doi: 10.1111/nph.13225. [DOI] [PubMed] [Google Scholar]
  9. Aukerman MJ. A deletion in the PHYD gene of the arabidopsis wassilewskija ecotype defines a role for phytochrome D in Red/Far-red light sensing. The Plant Cell Online. 1997;9:1317–1326. doi: 10.1105/tpc.9.8.1317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Barboza L, Effgen S, Alonso-Blanco C, Kooke R, Keurentjes JJ, Koornneef M, Alcázar R. Arabidopsis semidwarfs evolved from independent mutations in GA20ox1, ortholog to green revolution dwarf alleles in rice and barley. PNAS. 2013;110:15818–15823. doi: 10.1073/pnas.1314979110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bloomer RH, Juenger TE, Symonds VV. Natural variation in GL1 and its effects on trichome density in Arabidopsis thaliana. Molecular Ecology. 2012;21:3501–3515. doi: 10.1111/j.1365-294X.2012.05630.x. [DOI] [PubMed] [Google Scholar]
  12. Bosse M, Spurgin LG, Laine VN, Cole EF, Firth JA, Gienapp P, Gosler AG, McMahon K, Poissant J, Verhagen I, Groenen MAM, van Oers K, Sheldon BC, Visser ME, Slate J. Recent natural selection causes adaptive evolution of an avian polygenic trait. Science. 2017;358:365–368. doi: 10.1126/science.aal3298. [DOI] [PubMed] [Google Scholar]
  13. Burghardt LT, Metcalf CJ, Wilczek AM, Schmitt J, Donohue K. Modeling the influence of genetic and environmental variation on the expression of plant life cycles across landscapes. The American Naturalist. 2015;185:212–227. doi: 10.1086/679439. [DOI] [PubMed] [Google Scholar]
  14. Byers KJ, Xu S, Schluter PM. Molecular mechanisms of adaptation and speciation: why do we need an integrative approach? Molecular Ecology. 2017;26:277–290. doi: 10.1111/mec.13678. [DOI] [PubMed] [Google Scholar]
  15. Cingolani P, Platts A, Wang leL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. doi: 10.4161/fly.19695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Cui X, Fan B, Scholz J, Chen Z. Roles of arabidopsis cyclin-dependent kinase C complexes in cauliflower mosaic virus infection, plant growth, and development. The Plant Cell Online. 2007;19:1388–1402. doi: 10.1105/tpc.107.051375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Cutter AD, Jovelin R. When natural selection gives gene function the cold shoulder. BioEssays. 2015;37:1169–1173. doi: 10.1002/bies.201500083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Dean AM, Thornton JW. Mechanistic approaches to the study of evolution: the functional synthesis. Nature Reviews Genetics. 2007;8:675–688. doi: 10.1038/nrg2160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Dietrich JD, Smith MD. The effect of timing of growing season drought on flowering of a dominant C4 grass. Oecologia. 2016;181:391–399. doi: 10.1007/s00442-016-3579-4. [DOI] [PubMed] [Google Scholar]
  20. Dittberner H, Korte A, Mettler-Altmann T, Weber APM, Monroe G, de Meaux J. Natural variation in stomata size contributes to the local adaptation of water-use efficiency in Arabidopsis thaliana. Molecular Ecology. 2018;27:4052–4065. doi: 10.1111/mec.14838. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Donohue K. Germination timing influences natural selection on life-history characters in arabidopsis thaliana. Ecology. 2002;83:1006–1016. doi: 10.1890/0012-9658(2002)083[1006:GTINSO]2.0.CO;2. [DOI] [Google Scholar]
  22. Flood PJ, Hancock AM. The genomic basis of adaptation in plants. Current Opinion in Plant Biology. 2017;36:88–94. doi: 10.1016/j.pbi.2017.02.003. [DOI] [PubMed] [Google Scholar]
  23. Flowers JM, Hanzawa Y, Hall MC, Moore RC, Purugganan MD. Population genomics of the Arabidopsis thaliana flowering time gene network. Molecular Biology and Evolution. 2009;26:2475–2486. doi: 10.1093/molbev/msp161. [DOI] [PubMed] [Google Scholar]
  24. Fox GA. Drought and the evolution of flowering time in desert annuals. American Journal of Botany. 1990;77:1508–1518. doi: 10.1002/j.1537-2197.1990.tb12563.x. [DOI] [Google Scholar]
  25. Franks SJ, Sim S, Weis AE. Rapid evolution of flowering time by an annual plant in response to a climate fluctuation. PNAS. 2007;104:1278–1282. doi: 10.1073/pnas.0608379104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, Kahles A, Bohnert R, Jean G, Derwent P, Kersey P, Belfield EJ, Harberd NP, Kemen E, Toomajian C, Kover PX, Clark RM, Rätsch G, Mott R. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature. 2011;477:419–423. doi: 10.1038/nature10414. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Gogarten SM, Bhangale T, Conomos MP, Laurie CA, McHugh CP, Painter I, Zheng X, Crosslin DR, Levine D, Lumley T, Nelson SC, Rice K, Shen J, Swarnkar R, Weir BS, Laurie CC. GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies. Bioinformatics. 2012;28:3329–3331. doi: 10.1093/bioinformatics/bts610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Research. 2012;40:D1178–D1186. doi: 10.1093/nar/gkr944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Grant MR, McDowell JM, Sharpe AG, de Torres Zabala M, Lydiate DJ, Dangl JL. Independent deletions of a pathogen-resistance gene in Brassica and Arabidopsis. PNAS. 1998;95:15843–15848. doi: 10.1073/pnas.95.26.15843. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Gujas B, Alonso-Blanco C, Hardtke CS. Natural Arabidopsis brx loss-of-function alleles confer root adaptation to acidic soil. Current Biology. 2012;22:1962–1968. doi: 10.1016/j.cub.2012.08.026. [DOI] [PubMed] [Google Scholar]
  31. Hauser MT, Harr B, Schlötterer C. Trichome distribution in Arabidopsis thaliana and its close relative Arabidopsis lyrata: molecular analysis of the candidate gene GLABROUS1. Molecular Biology and Evolution. 2001;18:1754–1763. doi: 10.1093/oxfordjournals.molbev.a003963. [DOI] [PubMed] [Google Scholar]
  32. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology. 2005;25:1965–1978. doi: 10.1002/joc.1276. [DOI] [Google Scholar]
  33. Hijmans RJ. raster: Geographic Data Analysis and Modeling. 2016 https://rdrr.io/cran/raster/
  34. Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP. A single amino acid mutation contributes to adaptive beach mouse color pattern. Science. 2006;313:101–104. doi: 10.1126/science.1126121. [DOI] [PubMed] [Google Scholar]
  35. Hoekstra HE, Coyne JA. The locus of evolution: evo devo and the genetics of adaptation. Evolution. 2007;61:995–1016. doi: 10.1111/j.1558-5646.2007.00105.x. [DOI] [PubMed] [Google Scholar]
  36. Jia Q, Zhang J, Westcott S, Zhang XQ, Bellgard M, Lance R, Li C. GA-20 oxidase as a candidate for the semidwarf gene sdw1/denso in barley. Functional & Integrative Genomics. 2009;9:255–262. doi: 10.1007/s10142-009-0120-4. [DOI] [PubMed] [Google Scholar]
  37. Johanson U, West J, Lister C, Michaels S, Amasino R, Dean C. Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science. 2000;290:344–347. doi: 10.1126/science.290.5490.344. [DOI] [PubMed] [Google Scholar]
  38. Kang J, Zhang H, Sun T, Shi Y, Wang J, Zhang B, Wang Z, Zhou Y, Gu H. Natural variation of C-repeat-binding factor (CBFs) genes is a major cause of divergence in freezing tolerance among a group of Arabidopsis thaliana populations along the Yangtze River in China. New Phytologist. 2013;199:1069–1080. doi: 10.1111/nph.12335. [DOI] [PubMed] [Google Scholar]
  39. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Kenney AM, McKay JK, Richards JH, Juenger TE. Direct and indirect selection on flowering time, water-use efficiency (WUE, δ (13)C), and WUE plasticity to drought in Arabidopsis thaliana. Ecology and Evolution. 2014;4:4505–4521. doi: 10.1002/ece3.1270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Kerdaffrec E, Filiault DL, Korte A, Sasaki E, Nizhynska V, Seren Ü, Nordborg M. Multiple alleles at a single locus control seed dormancy in Swedish Arabidopsis. eLife. 2016;5:e22502. doi: 10.7554/eLife.22502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kliebenstein DJ. Gene duplication in the diversification of secondary metabolism: tandem 2-oxoglutarate-dependent dioxygenases control glucosinolate biosynthesis in arabidopsis. The Plant Cell Online. 2001;13:681–693. doi: 10.1105/tpc.13.3.681. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Kogan FN. Global drought watch from space. Bulletin of the American Meteorological Society. 1997;78:621–636. doi: 10.1175/1520-0477(1997)078<0621:GDWFS>2.0.CO;2. [DOI] [Google Scholar]
  44. Kogan F, Yang B, Wei G, Zhiyuan P, Xianfeng J. Modelling corn production in China using AVHRR‐based vegetation health indices. International Journal of Remote Sensing. 2005;26:2325–2336. doi: 10.1080/01431160500034235. [DOI] [Google Scholar]
  45. Kooyers NJ. The evolution of drought escape and avoidance in natural herbaceous populations. Plant Science. 2015;234:155–162. doi: 10.1016/j.plantsci.2015.02.012. [DOI] [PubMed] [Google Scholar]
  46. Kroymann J, Donnerhacke S, Schnabelrauch D, Mitchell-Olds T. Evolutionary dynamics of an Arabidopsis insect resistance quantitative trait locus. PNAS. 2003;100:14587–14592. doi: 10.1073/pnas.1734046100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Research. 2012;40:D1202–D1210. doi: 10.1093/nar/gkr1090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Le Corre V, Roux F, Reboud X. DNA polymorphism at the FRIGIDA gene in Arabidopsis thaliana: extensive nonsynonymous variation is consistent with local selection for flowering time. Molecular Biology and Evolution. 2002;19:1261–1271. doi: 10.1093/oxfordjournals.molbev.a004187. [DOI] [PubMed] [Google Scholar]
  49. Lovell JT, Juenger TE, Michaels SD, Lasky JR, Platt A, Richards JH, Yu X, Easlon HM, Sen S, McKay JK. Pleiotropy of FRIGIDA enhances the potential for multivariate adaptation. Proceedings of the Royal Society B: Biological Sciences. 2013;280:20131043. doi: 10.1098/rspb.2013.1043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB, Tyler-Smith C, 1000 Genomes Project Consortium A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012;335:823–828. doi: 10.1126/science.1215040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Mauricio R, Stahl EA, Korves T, Tian D, Kreitman M, Bergelson J. Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics. 2003;163:735–746. doi: 10.1093/genetics/163.2.735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. McKay JK, Richards JH, Mitchell-Olds T. Genetics of drought adaptation in Arabidopsis thaliana: I. Pleiotropy contributes to genetic correlations among ecological traits. Molecular Ecology. 2003;12:1137–1151. doi: 10.1046/j.1365-294X.2003.01833.x. [DOI] [PubMed] [Google Scholar]
  53. Méndez-Vigo B, Picó FX, Ramiro M, Martínez-Zapater JM, Alonso-Blanco C. Altitudinal and climatic adaptation is mediated by flowering traits and FRI, FLC, and PHYC genes in Arabidopsis. Plant Physiology. 2011;157:1942–1955. doi: 10.1104/pp.111.183426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Mickelbart MV, Hasegawa PM, Bailey-Serres J. Genetic mechanisms of abiotic stress tolerance that translate to crop yield stability. Nature Reviews Genetics. 2015;16:237–251. doi: 10.1038/nrg3901. [DOI] [PubMed] [Google Scholar]
  55. Mojica JP, Mullen J, Lovell JT, Monroe JG, Paul JR, Oakley CG, McKay JK. Genetics of water use physiology in locally adapted Arabidopsis thaliana. Plant Science. 2016;251:12–22. doi: 10.1016/j.plantsci.2016.03.015. [DOI] [PubMed] [Google Scholar]
  56. Monroe JG, McGovern C, Lasky JR, Grogan K, Beck J, McKay JK. Adaptation to warmer climates by parallel functional evolution of CBF genes in Arabidopsis thaliana. Molecular Ecology. 2016;25:3632–3644. doi: 10.1111/mec.13711. [DOI] [PubMed] [Google Scholar]
  57. Mouchel CF, Briggs GC, Hardtke CS. Natural genetic variation in Arabidopsis identifies BREVIS RADIX, a novel regulator of cell proliferation and elongation in the root. Genes & Development. 2004;18:700–714. doi: 10.1101/gad.1187704. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Nam NH, Chauhan YS, Johansen C. Effect of timing of drought stress on growth and grain yield of extra-short-duration pigeonpea lines. The Journal of Agricultural Science. 2001;136:179–189. doi: 10.1017/S0021859601008607. [DOI] [Google Scholar]
  59. O'Malley RC, Ecker JR. Linking genotype to phenotype using the Arabidopsis unimutant collection. The Plant Journal. 2010;61:928–940. doi: 10.1111/j.1365-313X.2010.04119.x. [DOI] [PubMed] [Google Scholar]
  60. Olsen KM, Wendel JF. A bountiful harvest: genomic insights into crop domestication phenotypes. Annual Review of Plant Biology. 2013;64:47–70. doi: 10.1146/annurev-arplant-050312-120048. [DOI] [PubMed] [Google Scholar]
  61. Olson MV. When less is more: gene loss as an engine of evolutionary change. The American Journal of Human Genetics. 1999;64:18–23. doi: 10.1086/302219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Passioura JB. Drought and drought tolerance. Plant Growth Regulation. 1996;20:79–83. doi: 10.1007/BF00024003. [DOI] [Google Scholar]
  63. Passioura JB. Scaling up: the essence of effective agricultural research. Functional Plant Biology. 2010;37:585–591. doi: 10.1071/FP10106. [DOI] [Google Scholar]
  64. Pennings PS, Hermisson J. Soft sweeps III: the signature of positive selection from recurrent mutation. PLoS Genetics. 2006;2:e186. doi: 10.1371/journal.pgen.0020186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
  66. Qin L, Han P, Chen L, Walk TC, Li Y, Hu X, Xie L, Liao H, Liao X. Genome-Wide identification and expression analysis of NRAMP family genes in soybean (Glycine Max L.) Frontiers in Plant Science. 2017;8:1436. doi: 10.3389/fpls.2017.01436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. R Core Development Team . Vienna, Austria: R Foundation for Statistical Computing; 2017. [Google Scholar]
  68. Ralph P, Coop G. Parallel adaptation: one or many waves of advance of an advantageous allele? Genetics. 2010;186:647–668. doi: 10.1534/genetics.110.119594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Ralph PL, Coop G. Convergent evolution during local adaptation to patchy landscapes. PLOS Genetics. 2015;11:e1005630. doi: 10.1371/journal.pgen.1005630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Ratcliffe D. Adaptation to habitat in a group of annual plants. The Journal of Ecology. 1961;49:187–203. doi: 10.2307/2257433. [DOI] [Google Scholar]
  71. Rausher MD. Evolutionary transitions in floral Color. International Journal of Plant Sciences. 2008;169:7–21. doi: 10.1086/523358. [DOI] [Google Scholar]
  72. Remington DL. Alleles versus mutations: Understanding the evolution of genetic architecture requires a molecular perspective on allelic origins. Evolution. 2015;69:3025–3038. doi: 10.1111/evo.12775. [DOI] [PubMed] [Google Scholar]
  73. Rojas O, Vrieling A, Rembold F. Assessing drought probability for agricultural areas in Africa with coarse resolution remote sensing imagery. Remote Sensing of Environment. 2011;115:343–352. doi: 10.1016/j.rse.2010.09.006. [DOI] [Google Scholar]
  74. Rose L, Atwell S, Grant M, Holub EB. Parallel Loss-of-Function at the RPM1 bacterial resistance locus in arabidopsis thaliana. Frontiers in Plant Science. 2012;3:287. doi: 10.3389/fpls.2012.00287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Rutter MT, Wieckowski YM, Murren CJ, Strand AE. Fitness effects of mutation: testing genetic redundancy in Arabidopsis thaliana. Journal of Evolutionary Biology. 2017;30:1124–1135. doi: 10.1111/jeb.13081. [DOI] [PubMed] [Google Scholar]
  76. Shindo C, Aranzana MJ, Lister C, Baxter C, Nicholls C, Nordborg M, Dean C. Role of FRIGIDA and FLOWERING LOCUS C in determining variation in flowering time of Arabidopsis. Plant Physiology. 2005;138:1163–1173. doi: 10.1104/pp.105.061309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Shindo C, Bernasconi G, Hardtke CS. Intraspecific competition reveals conditional fitness effects of single gene polymorphism at the Arabidopsis root growth regulator BRX. New Phytologist. 2008;180:71–80. doi: 10.1111/j.1469-8137.2008.02553.x. [DOI] [PubMed] [Google Scholar]
  78. Smith JM. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0. [DOI] [PubMed] [Google Scholar]
  79. Spielmeyer W, Ellis MH, Chandler PM. Semidwarf (sd-1), "green revolution" rice, contains a defective gibberellin 20-oxidase gene. PNAS. 2002;99:9043–9048. doi: 10.1073/pnas.132266399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Stahl EA, Dwyer G, Mauricio R, Kreitman M, Bergelson J. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature. 1999;400:667–671. doi: 10.1038/23260. [DOI] [PubMed] [Google Scholar]
  81. Stinchcombe JR, Weinig C, Ungerer M, Olsen KM, Mays C, Halldorsdottir SS, Purugganan MD, Schmitt J. A latitudinal cline in flowering time in Arabidopsis thaliana modulated by the flowering time gene FRIGIDA. PNAS. 2004;101:4712–4717. doi: 10.1073/pnas.0306401101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Tardieu F. Any trait or trait-related allele can confer drought tolerance: just design the right drought scenario. Journal of Experimental Botany. 2012;63:25–31. doi: 10.1093/jxb/err269. [DOI] [PubMed] [Google Scholar]
  83. Thompson L. The spatiotemporal effects of nitrogen and litter on the population dynamics of arabidopsis thaliana. The Journal of Ecology. 1994;82:63–68. doi: 10.2307/2261386. [DOI] [Google Scholar]
  84. Tian D, Traw MB, Chen JQ, Kreitman M, Bergelson J. Fitness costs of R-gene-mediated resistance in Arabidopsis thaliana. Nature. 2003;423:74–77. doi: 10.1038/nature01588. [DOI] [PubMed] [Google Scholar]
  85. Torkamaneh D, Laroche J, Rajcan I, Belzile F. Identification of candidate domestication-related genes with a systematic survey of loss-of-function mutations. The Plant Journal. 2018;106 doi: 10.1111/tpj.14104. [DOI] [PubMed] [Google Scholar]
  86. Trenberth KE, Dai A, van der Schrier G, Jones PD, Barichivich J, Briffa KR, Sheffield J. Global warming and changes in drought. Nature Climate Change. 2014;4:17–22. doi: 10.1038/nclimate2067. [DOI] [Google Scholar]
  87. Weigel D, Nordborg M. Population genomics for understanding adaptation in wild plant species. Annual Review of Genetics. 2015a;49:315–338. doi: 10.1146/annurev-genet-120213-092110. [DOI] [PubMed] [Google Scholar]
  88. Weigel D, Nordborg M. Population genomics for understanding adaptation in wild plant species. Annual Review of Genetics. 2015b;49:315–338. doi: 10.1146/annurev-genet-120213-092110. [DOI] [PubMed] [Google Scholar]
  89. Werner JD, Borevitz JO, Warthmann N, Trainer GT, Ecker JR, Chory J, Weigel D. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. PNAS. 2005;102:2460–2465. doi: 10.1073/pnas.0409474102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Xiang Y, Nakabayashi K, Ding J, He F, Bentsink L, Soppe WJ. Reduced Dormancy5 encodes a protein phosphatase 2C that is required for seed dormancy in Arabidopsis. The Plant Cell Online. 2014;26:4362–4375. doi: 10.1105/tpc.114.132811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Xiang Y, Song B, Née G, Kramer K, Finkemeier I, Soppe WJ. Sequence polymorphisms at the reduced dormancy5 pseudophosphatase underlie natural variation in arabidopsis dormancy. Plant Physiology. 2016;171:2659–2670. doi: 10.1104/pp.16.00525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Zhen Y, Ungerer MC. Relaxed selection on the CBF/DREB1 regulatory genes and reduced freezing tolerance in the southern range of Arabidopsis thaliana. Molecular Biology and Evolution. 2008;25:2547–2555. doi: 10.1093/molbev/msn196. [DOI] [PubMed] [Google Scholar]
  93. Zhu W, Ausin I, Seleznev A, Méndez-Vigo B, Picó FX, Sureshkumar S, Sundaramoorthi V, Bulach D, Powell D, Seemann T, Alonso-Blanco C, Balasubramanian S. Natural variation identifies ICARUS1, a universal gene required for cell proliferation and growth at high temperatures in arabidopsis thaliana. PLOS Genetics. 2015;11:e1005085. doi: 10.1371/journal.pgen.1005085. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Daniel J Kliebenstein
Reviewed by: Arthur Korte

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Drought adaptation in Arabidopsis thaliana by extensive genetic loss-of-function" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Christian Hardtke as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This manuscript identifies a subset of existing natural knockouts and provides evidence that they are likely causal in natural variation within the species.

Essential revisions:

1) There was a concern that the inflated GWAS significance may be a result of some consistent error imparted by the functional allele assignment rather than all the genes being causal. Some analysis that can argue against this would be helpful especially to convince the generalist reviewer. We came up with two ideas but are willing to assess any other test you can develop. The ideas raised in our discussion were a) permute the phenotype to see that after this the glm will not lead to inflated results or b) permute the respective binary gene-wise score to see that an arbitrary assignment with the same allele frequency will agree with the null hypothesis.

2) The eco/evo aspects of the events need to be assessed more broadly given the breadth of Arabidopsis lifestyles and how positive in one may be negative in another.

3) There is a need to better discuss the calling of specific events as a number of known events were not found.

Reviewer #1:

The authors conduct a survey to find LOF mutations with regards to the Col-0 reference genome. They then work to show that there is an association to potential adaptation and drought. This is a highly interesting manuscript but there are some issues with false negative rates in the LOF lists and referencing of the primary literature.

One conflict in this manuscript that I had was the idea that the whole manuscript was about drought adaptation yet the validation was on flowering time. There was no real discussion on if these mutants may or may not alter drought responses and if so, are those effects as unidirectional as for flowering. This conflict would optimally be resolved with experimental data at best or alternatively with discussion reflecting this difficulty.

I find it odd that none of the citations for LOF mutations contributing to adaptation or fitness are prior to 2006 even though there are a large number of Arabidopsis and other mutations that had LOF natural variation found prior to that. This includes key genes controlling flowering and defense such as RPM1, RPS2, FLM, AOP2, etc. Some of these genes such as the work by Bergelson on R genes and Kliebenstein on glucosinolates have direct evidence of field fitness effects of these natural variants in LOF. I understand that the authors prefer to use review articles but they should really use primary research literature as that is the real work that should be given acknowledgement especially as there is no length limit in eLife. This lack of primary literature may have led to the next issue about the LOF gene list.

A cursory analysis of the list of genes in the supplementary information found that the list is missing a number of genes with published loss-of-function events, e.g. BRX, AOP2, MAM, etc. This indicates that there is a significant false negative issue within the compilation of genes. The authors need to go through the literature to identify a collection of genes with known loss-of-function events and then assess how many they did or did not find. This is essential to let future researchers know how complete the list of genes is or is not. Is it possible that this is biased by use of the Col-0 genome as the reference and potentially not looking for GOF alleles in the other accessions which would be LOF if you shift the reference genome?

Equally, it seems like the authors should discuss settings where the LOF are not multiple independent events as is the case for RPM1 and RPS2. The general text has a feeling that all LOF are multiple independent events which may come from the soft sweep citation but that is not the exclusive view for plant natural variation.

Flowering time analysis seems to have only been conducted in one environment. The authors should discuss the fact that the environment has a key role in determining flowering time and how doing a broader range of environments with the mutants may influence the results.

For Figure 3F, is a linear correlation the best fit to the data? It looks like a non-linear correlation would be a better fit. The authors should do a model comparison of linear and non-linear regressions to see which best fits the data as a non-linear fit could alter the interpretation as that would suggest a maximal effect.
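
The model comparison requested here could be carried out along the following lines. This is only a minimal sketch: it assumes that a quadratic term stands in for the "non-linear" alternative and that the two fits are compared by AIC, neither of which is specified in the letter. The helper names (`fit_polynomial`, `aic`) and the data are hypothetical and are not the values behind Figure 3F.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (fine for small toy data)."""
    m = degree + 1
    # Normal equations: (X^T X) coef = X^T y, with columns 1, x, x^2, ...
    ata = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    aty = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, m):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution
    coef = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(ata[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (aty[r] - tail) / ata[r][r]
    return coef

def aic(x, y, coef):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(x)
    rss = sum((yi - sum(b * xi ** p for p, b in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    k = len(coef) + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k

# Hypothetical saturating relationship (NOT the data behind Figure 3F)
x = [0.5 * i for i in range(1, 13)]
y = [10 * (1 - math.exp(-xi)) + 0.1 * math.sin(7 * i) for i, xi in enumerate(x)]

aic_linear = aic(x, y, fit_polynomial(x, y, 1))
aic_quadratic = aic(x, y, fit_polynomial(x, y, 2))
print("linear AIC:", round(aic_linear, 2), "quadratic AIC:", round(aic_quadratic, 2))
```

A lower AIC for the quadratic fit would favour the curvilinear interpretation, which in turn would suggest a maximal effect.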

Reviewer #2:

This comprehensive study integrates across diverse approaches to detect drought timing and evaluate the genetic basis of adaptation to drought in the context of loss of function in the model organism, Arabidopsis thaliana. The innovative use of the Vegetation Health Index (VHI) generated data on the timing of drought for numerous accessions of Arabidopsis. This approach could potentially be applied to other systems. The current study uses previously published genomic data to detect potential candidate genes associated with drought (as measured via the VHI) and flowering time (from a previously published growth chamber experiment). After evaluating statistical associations between drought, loss of function genes, and flowering time, the authors conducted gene knockout studies at several candidate genes showing relationships between loss of function and spring drought to evaluate the causal link with flowering time.

I wonder about the adaptive nature of these associations. For example, is delayed flowering adaptive under spring drought and earlier flowering adaptive under summer drought? That is, are loss of function alleles associated with adaptive changes in flowering phenology? In the third paragraph of the Results and Discussion, the authors point to two studies (Kooyers, 2015; and Dittberner, 2018) to support the assertion that these phenological changes are adaptive. Unfortunately, the Dittberner Endnote citation was inadvertently excluded from the references, so that I cannot look at it. Kooyers, 2015, discusses drought avoidance vs. escape as general plant strategies, with escape associated with rapid growth and avoidance associated with other morphological and physiological traits that confer higher water-use efficiency. The typical thought is that plants can escape from drought by flowering early. In the current study, the authors suggest that later flowering genotypes may avoid spring drought. When does germination occur in sites with spring drought? Late flowering genotypes would still experience the spring drought as juveniles, depending on when germination occurred. It does not seem clear that delayed flowering enables those plants to escape from the drought, given that early life history stages are very susceptible to drought. It seems problematic to refer to loss of function as generating adaptive shifts in flowering phenology without fitness data (ideally in the field) to test those hypotheses directly. That said, I appreciate that the Dn/Ds and Pn/Ps analyses point to positive selection for loss of function alleles in genes associated with drought or flowering time.

Figure 1A: What data are used to determine the regions of drought stress (in graded brown at the bottom of the top two panels)? The Materials and methods (subsection “Satellite-Detected Drought Histories of Arabidopsis”, first paragraph) set 40 as the threshold for drought (values of VHI <40 are indicative of drought). How was that value determined? How does it relate to drought stress as perceived or experienced by Arabidopsis in nature? The authors use the VHI to determine the timing of drought for Arabidopsis. Have they ground-truthed these drought metrics in any of the field sites? How reliable is the VHI for characterizing exposure of Arabidopsis to drought in its native range?
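
To make the role of the threshold concrete, the sketch below computes a seasonal drought frequency as the fraction of observations with VHI below 40. It is an illustration only: the function name `drought_frequency` and the VHI readings are hypothetical, and the authors' actual drought-timing index is not reproduced here.

```python
DROUGHT_THRESHOLD = 40  # VHI < 40 is treated as drought, as stated in the Materials and methods

def drought_frequency(vhi_values, threshold=DROUGHT_THRESHOLD):
    """Fraction of observations with VHI strictly below the threshold."""
    if not vhi_values:
        raise ValueError("no VHI observations")
    return sum(v < threshold for v in vhi_values) / len(vhi_values)

# Hypothetical VHI readings for one site, one value per year and season
vhi_by_season = {
    "spring": [35, 52, 38, 61, 44, 39, 70, 33],
    "summer": [48, 55, 41, 62, 37, 58, 66, 49],
}

freq = {season: drought_frequency(values) for season, values in vhi_by_season.items()}
print(freq)
```

Changing `DROUGHT_THRESHOLD` and recomputing the frequencies is one simple way to check how sensitive downstream results are to the choice of 40.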

Figure 1C focuses on spring vs. summer droughts. Have winter droughts (present in panel 1B, right side) affected the timing of flowering of Arabidopsis in any of these populations? It seems like winter drought could affect flowering time for both fall germinating and spring germinating ecotypes.

What other factors could drive population divergence between populations with spring vs. summer drought? The manuscript seems to assume that drought is the only factor affecting those differences. For example, the Materials and methods state "Summer drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing <0 & Pdrought timing <0.05) more negative drought-timing index […] Conversely, spring drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing > 0 & Pdrought timing <0.05) more positive drought-timing index…" Are there other environmental factors that covary with drought that could also influence evolution at these loci? How can the authors be sure that these are really "summer drought" vs. "spring drought" genes? Are these genes consistent with mapped regions for drought tolerance in Arabidopsis?
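
The classification rule quoted from the Materials and methods can be written out as a short sketch. The thresholds (sign of β, P < 0.05) come from the quoted text; the function name `classify_gene` and the gene labels are hypothetical. The rule only records the direction and significance of an association, so it cannot by itself exclude covarying environmental factors.

```python
def classify_gene(beta, p_value, alpha=0.05):
    """Label a gene by the sign and significance of its drought-timing association.

    beta < 0 and p < alpha  -> "summer drought"
    beta > 0 and p < alpha  -> "spring drought"
    otherwise               -> "unclassified"
    """
    if p_value < alpha and beta < 0:
        return "summer drought"
    if p_value < alpha and beta > 0:
        return "spring drought"
    return "unclassified"

# Hypothetical association results: gene label -> (beta, p-value)
results = {
    "geneA": (-0.8, 0.001),
    "geneB": (0.6, 0.020),
    "geneC": (0.4, 0.300),
}
labels = {gene: classify_gene(beta, p) for gene, (beta, p) in results.items()}
print(labels)
```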

Additional points:

I recommend deleting the first part of the sentence ("Plants have been adapting to drought for millennia.…"). For one, plants have been adapting to drought ever since they colonized land from the mid-Ordovician to the Devonian, over 400 million years ago. Secondly, this phrase does not provide information that advances the narrative.

Introduction, second paragraph: Please provide citations for the statement that most research has focused on late-season droughts. This statement does not resonate with my experience conducting studies and reviewing manuscripts. When possible to manipulate in the field, researchers impose drought in an ecologically-relevant fashion. In the lab, researchers generally time drought treatments for a developmentally-relevant stage.

Figure 1A: It might be useful to label the two locations with the names of the Arabidopsis accessions or provide the geographic region, in addition to the latitude and longitude.

Both panels of Figure 1A (especially the panel on the right) seem to imply that drought stress is occurring less frequently through time. The darker lines indicative of more recent years seem to occur in regions of higher VHI. Is that correct?

In the subsection “Experimental Testing of Predicted Phenotypes in Gene Knockout Lines”, it states that flowering time was assessed as days from planting to the emergence of the first flower. Is there variation in germination timing? Why not measure flowering time as days from germination to the first flower?

Figure 3F: Is this relationship linear or might a curvilinear model fit better?

Reviewer #3:

Monroe and colleagues describe the link of loss-of-function alleles with drought adaptation and flowering time in A. thaliana. The manuscript is well written and interesting conclusions are reported. Especially the high overlap of associations for summer drought and early flowering and spring drought and late flowering is intriguing. Additionally, the functional follow-up in T-DNA knockout lines is excellent.

Still, I have one major comment.

My major concern is the statistical framework used for GWAS.

The authors used logistic regression in a glm and added the first 3 principal components to correct for population structure. This differs from the standard GWAS procedure in A. thaliana, which uses a linear mixed model to correct for population structure confounding. The rationale for why the authors used this model is not well described in the manuscript. Additionally, the results differ markedly from the analysis with a classical LMM. (I ran a normal LMM with the provided data for comparison, happy to provide this if needed.) Next, the QQ plot is also highly inflated (which might be expected when collapsing LoF alleles to one score per gene), but is not if a normal LMM is used.

To summarize, I am not completely sure what to make of this, especially as the results and conclusion look really nice with the presented method (e.g. Figure 2 is really impressive).

Still, it would be good if the authors at least comment on why to use the proposed framework and the inflation observed in a QQ plot.
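
The inflation visible in a QQ plot is often summarized by the genomic inflation factor λ: the median observed test statistic divided by the median expected under the null. The sketch below is a generic illustration, assuming each p-value comes from a 1-degree-of-freedom test; the function name `genomic_inflation` and the p-values are hypothetical and are not the study's results.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, about 0.455
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Lambda = median observed chi-square (1 df) / median expected under the null.

    p-values must lie in (0, 1]; a value of exactly 0 cannot be converted.
    """
    chi2 = [_STD_NORMAL.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Hypothetical p-values: a uniform grid (null-like) and a skewed set (inflated)
n = 1001
null_like = [(i - 0.5) / n for i in range(1, n + 1)]
inflated = [p ** 2 for p in null_like]

lambda_null = genomic_inflation(null_like)
lambda_inflated = genomic_inflation(inflated)
print(round(lambda_null, 3), round(lambda_inflated, 3))
```

Values of λ near 1 indicate p-values close to the null expectation, whereas values well above 1 correspond to the upward departure seen in an inflated QQ plot.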

eLife. 2018 Dec 6;7:e41038. doi: 10.7554/eLife.41038.021

Author response


Essential revisions:

1) There was a concern that the inflated GWAS significance may be a result of some consistent error imparted by the functional allele assignment rather than all the genes being causal. Some analysis that can argue against this would be helpful especially to convince the generalist reviewer. We came up with two ideas but are willing to assess any other test you can develop. The ideas raised in our discussion were a) permute the phenotype to see that after this the glm will not lead to inflated results or b) permute the respective binary gene-wise score to see that an arbitrary assignment with the same allele frequency will agree with the null hypothesis.

Thank you for expressing these concerns. We too wanted to verify that the results were not an artifact of the functional allele assignment and greatly appreciate the ideas to address this.

The revised manuscript includes new analyses inspired by this suggestion. We chose to implement a permutation method based on idea (b) above, permuting the genotype matrix but keeping allele frequencies identical. We then repeated the genome wide association scan and compared the results to those generated by the natural genotype matrix. By permuting the genotype matrix we were able to address 1) the concerns about the allele assignments explaining the inflated significance as well as 2) the possibility that the overlap of gene associations with drought timing and flowering time is simply explained by the correlation between ecotype flowering times and drought timing at their home environments.

As expected, we found that for the permuted genotype matrix, p values were not inflated and fell within the confidence interval of the expected observed=expected line. Indeed, when we permuted the genotype matrix, we found no genes that were significantly associated with drought timing or flowering time after a Bonferroni correction. This result has been added to a new supplementary figure in the manuscript (Figure 2—figure supplement 1).

Additionally, we found that the correlation between p values for gene associations with drought timing and p values for gene associations to flowering time fell from 0.48 to 0.01 in the permuted genotypes. Because drought timing and flowering time vectors were unmodified, this indicates that the overlaps we observed between gene associations are not explained entirely by collinearity between drought timing and flowering time. This result has also been included in Figure 2—figure supplement 1.
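
The permutation logic described in this response can be sketched as follows. This is an assumption-laden illustration, not the authors' code: it shuffles each gene's LoF calls independently across ecotypes (one way of keeping allele frequencies identical) and replaces the logistic-regression scan with a simple difference in mean phenotype. All function names and data are hypothetical.

```python
import random

def permute_genotypes(genotype_matrix, rng):
    """Shuffle each gene's 0/1 LoF calls across ecotypes.

    Every row keeps the same number of LoF calls, so allele frequencies
    are unchanged while any link to the phenotype is broken.
    """
    permuted = []
    for calls in genotype_matrix:
        shuffled = list(calls)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted

def mean_difference(calls, phenotype):
    """Mean phenotype of LoF carriers minus mean of functional-allele carriers."""
    lof = [p for c, p in zip(calls, phenotype) if c == 1]
    functional = [p for c, p in zip(calls, phenotype) if c == 0]
    return sum(lof) / len(lof) - sum(functional) / len(functional)

def permutation_p_value(calls, phenotype, n_perm=2000, seed=0):
    """Two-sided empirical p-value for one gene."""
    rng = random.Random(seed)
    observed = abs(mean_difference(calls, phenotype))
    hits = 0
    for _ in range(n_perm):
        shuffled = permute_genotypes([calls], rng)[0]
        if abs(mean_difference(shuffled, phenotype)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 20 ecotypes, one gene whose LoF allele tracks the phenotype
phenotype = [float(i) for i in range(20)]
gene_calls = [0] * 10 + [1] * 10
p_value = permutation_p_value(gene_calls, phenotype)
print(p_value)
```

Because the phenotype vector is left untouched, any association that survives in the permuted data would point to an artifact of the testing procedure rather than to the genotypes themselves.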

Finally, we added a section to the Materials and methods with a more detailed explanation of the rationale behind our functional genome wide association scanning approach. Specifically, we discuss why the tests are more likely to point to phenotypically impactful genetic variation and why a functional genome wide association approach places greater priority on reducing false negative associations than traditional GWAS. We hope that, together, these new results and the more thorough explanation of the methods provide readers with a better understanding of the primary findings. Further details are provided in the responses to specific reviewer concerns below.

2) The eco/evo aspects of the events need to be assessed more broadly given the breadth of Arabidopsis lifestyles and how positive in one may be negative in another.

Thank you for the suggestion. We agree that the initial manuscript failed to adequately consider the results in the context of Arabidopsis ecology and life history. The revised manuscript contains numerous changes to better frame this work around the natural ecology of Arabidopsis. For example:

“Flowering time is only one component of phenology and other adaptive life history transitions such as germination timing (Donohue 2002) may also be influenced by drought timing and could change how drought timing affects the evolution of flowering time, a hypothesis that warrants further investigation.”

To the revised manuscript, we have also added several new analyses that we hope will address some reviewer concerns about the conclusions drawn from the findings presented in the original submission. Specifically, we present two new multiple linear regressions showing that spring and summer drought frequency are the most important predictors of flowering time, rather than drought frequency during other seasons, and that drought timing is a good predictor of allele associations to flowering time even while accounting for allele associations to other potentially important environmental variables, latitude and minimum temperature. These results are included in two supplemental tables (Supplementary file 3A, B). Further details are provided in the responses to specific reviewer concerns below.

3) There is a need to better discuss the calling of specific events as a number of known events were not found.

Thank you for bringing up this important point. The revised manuscript now contains a survey of LoF alleles in previously studied genes using our calling method, with the findings reported in a new table (Supplementary file 2D, E). We confirmed the presence of LoF alleles in all of these genes except aop2, which has no gene model in Col-0 because it is annotated as a pseudogene and could not be evaluated with our pipeline. Most of these previously known LoF-containing genes, however, were not included in the 2088 genes that we tested for associations to drought timing and flowering time because the LoF allele frequencies were below our filtering threshold. The revised manuscript now contains the following paragraph:

“It should be noted that the 2088 genes tested for associations to flowering time and drought timing are not a complete representation of LoF alleles in Arabidopsis. […] Thus, while the methods used here are designed to minimize false positives (alleles classified as LoF, but which are actually functional), the likely occurrence of false negatives (undetected LoF alleles) in available data motivates the need for more sophisticated species wide genome sequencing efforts including a greater diversity of de-novo quality genomes for comprehensive detection of functionally relevant genetic variation across the species.”

Further details are provided in the responses to specific reviewer concerns below.

Reviewer #1:

The authors conduct a survey to find LOF mutations with regards to the Col-0 reference genome. They then work to show that there is an association to potential adaptation and drought. This is a highly interesting manuscript but there are some issues with false negative rates in the LOF lists and referencing of the primary literature.

One conflict in this manuscript that I had was the idea that the whole manuscript was about drought adaptation yet the validation was on flowering time. There was no real discussion on if these mutants may or may not alter drought responses and if so, are those effects as unidirectional as for flowering. This conflict would optimally be resolved with experimental data at best or alternatively with discussion reflecting this difficulty.

Thank you for raising this point. We have added the following statement to express this need for further experimental data:

“Future experimental work will be valuable to identify other plant physiological traits affected by the LoF alleles associated with drought timing.”

We have also revised the manuscript to clarify the connection between drought timing and flowering time in several places.

In revised manuscript:

“Flowering time in Arabidopsis is correlated with other drought tolerance traits such as water use efficiency and can serve as a proxy for alternative drought tolerance strategies, with early flowering genotypes being associated with low water use efficiency (drought escape strategy) and late flowering genotypes with high water use efficiency (dehydration avoidance strategy) (McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014). […] This hypothesis motivated our investigation to identify alleles associated with drought timing and test the prediction that they contribute to adaptive flowering time evolution.”

In revised manuscript:

“These results further support the classical hypothesis that the relationship between phenology and drought timing is the most important feature of plant drought tolerance (Passioura, 1996), indicating the evolution of “drought escape” through earlier flowering in summer drought environments, and “dehydration avoidance” by later flowering genotypes in spring drought environments. […] This pattern is also consistent with hypotheses explaining the more water conservative water use and stomatal traits observed in late flowering genotypes (Kooyers, 2015; McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014) and those from spring drought environments (Dittberner et al., 2018).”

I find it odd that none of the citations for LOF mutations contributing to adaptation or fitness are prior to 2006 even though there are a large number of Arabidopsis and other mutations that had LOF natural variation found prior to that. This includes key genes controlling flowering and defense such as RPM1, RPS2, FLM, AOP2, etc. Some of these genes such as the work by Bergelson on R genes and Kliebenstein on glucosinolates have direct evidence of field fitness effects of these natural variants in LOF. I understand that the authors prefer to use review articles but they should really use primary research literature as that is the real work that should be given acknowledgement especially as there is no length limit in eLife. This lack of primary literature may have led to the next issue about the LOF gene list.

Thank you, this is an excellent point. These papers have been an inspiration and provide important background for this work. The revised manuscript now includes 32 citations for primary literature studies of adaptive LoF in Arabidopsis.

In revised manuscript:

“Indeed, a number of individual genes exhibiting evidence of locally adaptive loss-of-function have been documented in Arabidopsis (Grant et al., 1998; Johanson et al., 2000; Kliebenstein et al., 2001; Kroymann et al., 2003; Mouchel et al., 2004; Aukerman et al., 1997; Hauser et al., 2001; Mauricio et al., 2003; Alonso-Blanco et al., 2005; Werner et al., 2005; Barboza et al., 2013; Xiang et al., 2014).”

Additional citations are found in the new table (Supplementary file 3D) which contains a survey of previously identified LoF mutants in Arabidopsis.

A cursory analysis of the list of genes in the supplementary information found that the list is missing a number of genes with published loss of function events, i.e., BRX, AOP2, MAM, etc. This indicates that there is a significant false negative issue within the compilation of genes. The authors need to go through the literature to identify a collection of genes with known loss-of-function events and then assess how many they did or did not find.

Thank you, this is a great suggestion. The revised manuscript includes a survey of previously studied LoF mutants (Supplementary file 3D).

The revised manuscript also discusses the issue of false negatives more directly.

In revised manuscript:

“It should be noted that the 2088 genes tested for associations to flowering time and drought timing are not a complete representation of LoF alleles in Arabidopsis. […] Thus, while the methods used here are designed to minimize false positives (alleles classified as LoF, but which are actually functional), the likely occurrence of false negatives (undetected LoF alleles) in available data motivates the need for more sophisticated species wide genome sequencing efforts including a greater diversity of de-novo quality genomes for comprehensive detection of functionally relevant genetic variation across the species.”

This is essential to let future researchers know how complete the list of genes is or is not. Is it possible that this is biased by use of the Col-0 genome as the reference and potentially not looking for GOF alleles in the other accessions which would be LOF if you shift the reference genome?

We agree. See quote above for points about the potential sources of false negatives (including LOF in Col-0) and the need to develop multiple high quality reference genomes to better study functional genetic variation at genomic scales.

In revised manuscript:

“Because the reference genome and gene models are from an early flowering Arabidopsis line, Col-0, this is consistent with the hypothesis that LoF alleles are particularly important in the evolution of phenotypic divergence (Rausher 2008). This result also highlights the need to develop functional genomics resources informed by multiple de-novo quality reference genomes.”

In revised manuscript, Abstract

“These results also motivate improved species-wide sequencing efforts to better identify loss-of-function variants”

Equally, it seems like the authors should discuss settings where the LOF are not multiple independent events as is the case for RPM1 and RPS2. The general text has a feeling that all LOF are multiple independent events which may come from the soft sweep citation but that is not the exclusive view for plant natural variation.

We agree that it is important to also consider cases with a single LoF allele. See the excerpt from the revised manuscript below, which reflects on this idea.

In revised manuscript:

“In cases where adaptation proceeds through the fixation of a single adaptive allele, traditional genome scanning approaches may be sufficient to detect causal loci. However, when genetic variation consists of multiple independent alleles, as is often the case for the genes examined here (Figure 2—figure supplement 2), classifying alleles functionally before testing for associations is likely necessary.”

Flowering time analysis seems to have only been conducted in one environment. The authors should discuss the fact that the environment has a key role in determining flowering time and how doing a broader range of environments with the mutants may influence the results.

Agreed. See text below:

In revised manuscript:

“Furthermore, measuring flowering time in other environments, such as alternate light regimes, may yield a different set of candidate genes using similar approaches.”

For Figure 3F, is a linear correlation the best fit to the data? It looks like a non-linear correlation would be a better fit. The authors should do a model comparison of linear and non-linear regressions to see which best fits the data as a non-linear fit could alter the interpretation as that would suggest a maximal effect.

We agree that the scatterplot appears to have a non-linear trend. However, adding a non-linear predictor did not improve the model fit. We have added this result to the revised manuscript.

In revised manuscript:

“We found that flowering time is strongly predicted by the accumulation of LoF alleles across the 214 candidate genes associated to spring drought and/or later flowering time (Figure 3A-E), estimating a 1-day increase for every 3 additional LoF alleles across these candidate genes (Figure 3F). This relationship is best represented as a simple linear regression; the addition of a non-linear quadratic predictor variable did not significantly improve the fit of the model (F= 0.7005, P = 0.4028).”
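The model comparison described in this excerpt can be illustrated with a short sketch. It is not the authors' code: it assumes ordinary least squares, uses a plain-Python solver, and all function names are hypothetical. It fits a linear and a quadratic model to the same data and returns the F statistic for the added quadratic term; converting that statistic to a P value would additionally require the F distribution with 1 and n - 3 degrees of freedom, which is left out here.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    design = [[xi ** k for k in range(degree + 1)] for xi in x]
    size = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(size)]
    coef = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))


def quadratic_f_statistic(x, y):
    """F statistic for adding a quadratic term to a simple linear regression."""
    rss_linear = rss(x, y, 1)
    rss_quadratic = rss(x, y, 2)
    df_residual = len(x) - 3  # intercept, linear and quadratic coefficients
    return (rss_linear - rss_quadratic) / (rss_quadratic / df_residual)
```

Here `x` would be the number of LoF alleles per ecotype across the candidate genes and `y` the flowering time. A small F statistic means the quadratic term explains little beyond the linear fit, which is the pattern reported above.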

Reviewer #2:

This comprehensive study integrates across diverse approaches to detect drought timing and evaluate the genetic basis of adaptation to drought in the context of loss of function in the model organism, Arabidopsis thaliana. The innovative use of the Vegetative Health Index generated data on the timing of drought for numerous accessions of Arabidopsis. This approach could potentially be applied to other systems. The current study uses previously published genomic data to detect potential candidate genes associated with drought (as measured via the VHI) and flowering time (from a previously published growth chamber experiment). After evaluating statistical associations between drought, loss of function genes, and flowering time, the authors conducted gene knock out studies at several candidate genes showing relationships between loss of function and spring drought to evaluate causal link with flowering time.

I wonder about the adaptive nature of these associations. For example, is delayed flowering adaptive under spring drought and earlier flowering adaptive under summer drought? That is, are loss of function alleles associated with adaptive changes in flowering phenology? In the third paragraph of the Results and Discussion, the authors point to two studies (Kooyers, 2015; and Dittberner, 2018) to support the assertion that these phenological changes are adaptive. Unfortunately, the Dittberner Endnote citation was inadvertently excluded from the references, so I cannot look at it. Kooyers, 2015, discusses drought avoidance vs. escape as general plant strategies, with escape associated with rapid growth and avoidance associated with other morphological and physiological traits that confer higher water-use efficiency. The typical thought is that plants can escape from drought by flowering early. In the current study, the authors suggest that later flowering genotypes may avoid spring drought. When does germination occur in sites with spring drought? Late flowering genotypes would still experience the spring drought as juveniles, depending on when germination occurred. It does not seem clear that delayed flowering enables those plants to escape from the drought, given that early life history stages are very susceptible to drought. It seems problematic to refer to loss of function as generating adaptive shifts in flowering phenology without fitness data (ideally in the field) to test those hypotheses directly.

Thank you for raising these concerns. The revised manuscript includes several new sections inspired by the points brought up here. We agree that it is extremely challenging to demonstrate the adaptive value of genetic variation, but hope that the revised manuscript provides a clearer picture of the hypotheses about drought adaptation that we are aiming to address. Here are some examples:

In revised manuscript:

“Flowering time in Arabidopsis is correlated with other drought tolerance traits such as water use efficiency and can serve as a proxy for alternative drought tolerance strategies, with early flowering genotypes being associated with low water use efficiency (drought escape strategy) and late flowering genotypes with high water use efficiency (dehydration avoidance strategy) (McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014). […] This hypothesis motivated our investigation to identify alleles associated with drought timing and test the prediction that they contribute to adaptive flowering time evolution.”

In revised manuscript:

“These results further support the classical hypothesis that the relationship between phenology and drought timing is the most important feature of plant drought tolerance (Passioura, 1996), indicating the evolution of “drought escape” through earlier flowering in summer drought environments, and “dehydration avoidance” by later flowering genotypes in spring drought environments. Because most Arabidopsis populations appear to exhibit a winter annual life habit, germinating in the fall and overwintering as a rosette (Ratcliffe, 1961; Thompson 1994; Burghardt et al., 2015), late flowering genotypes in spring drought environments are expected to still encounter drought conditions. […] This pattern is also consistent with hypotheses explaining the more water conservative water use and stomatal traits observed in late flowering genotypes (Kooyers, 2015; McKay et al., 2003; Lovell et al., 2013; Kenney et al., 2014) and those from spring drought environments (Dittberner et al., 2018).”

That said, I appreciate that the Dn/Ds and Pn/Ps analyses point to positive selection for loss of function alleles in genes associated with drought or flowering time.

Figure 1A: What data are used to determine the regions of drought stress (in graded brown at the bottom of the top two panels)? The Materials and methods (subsection “Satellite-Detected Drought Histories of Arabidopsis”, first paragraph) set 40 as the threshold for drought (values of VHI <40 are indicative of drought). How was that value determined? How does it relate to drought stress as perceived or experienced by Arabidopsis in nature? The authors use the VHI to determine the timing of drought for Arabidopsis. Have they ground-truthed these drought metrics in any of the field sites? How reliable is the VHI for characterizing exposure of Arabidopsis to drought in its native range?

Great questions. The definition of drought as VHI below 40 was determined by models used by the developers of the VHI at NOAA. We have made several changes in the revised manuscript that we hope will provide some clarification.

In revised manuscript:

“(B) drought frequency (VHI<40, NOAA drought classification) by week (line) and season (bars).”
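As an illustration of the drought frequency described in this caption, the following minimal sketch counts the share of observations with VHI below 40 within each season. It is an assumption-laden example rather than the authors' pipeline: the function name, the input layout (pairs of a season label and a VHI value), and the assignment of weeks to seasons are all hypothetical and are not specified in this letter.

```python
from collections import defaultdict

# VHI values below 40 are classified as drought (NOAA classification cited above).
DROUGHT_THRESHOLD = 40


def drought_frequency(observations, threshold=DROUGHT_THRESHOLD):
    """Return the fraction of observations classified as drought for each season.

    observations: iterable of (season_label, vhi_value) pairs for one location.
    """
    counts = defaultdict(lambda: [0, 0])  # season -> [drought count, total count]
    for season, vhi in observations:
        counts[season][1] += 1
        if vhi < threshold:
            counts[season][0] += 1
    return {season: drought / total for season, (drought, total) in counts.items()}
```

The same function gives weekly frequencies if week numbers are passed in place of season labels.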

In revised manuscript:

“One such measurement, the Vegetative Health Index (VHI) has been used for decades to monitor drought, including in many places across the natural range of Arabidopsis (Kogan, 1997).”

Figure 1C focuses on spring vs. summer droughts. Have winter droughts (present in panel 1B, right side) affected the timing of flowering of Arabidopsis in any of these populations? It seems like winter drought could affect flowering time for both fall germinating and spring germinating ecotypes.

This is a good point. The revised manuscript has an additional analysis (multiple linear regression) to address this. We found that only spring and summer drought frequencies are significant predictors of flowering time (Supplementary file 2A).

What other factors could drive population divergence between populations with spring vs. summer drought? The manuscript seems to assume that drought is the only factor affecting those differences. For example, the Materials and methods state "Summer drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing <0 & Pdrought timing <0.05) more negative drought-timing index [..] Conversely, spring drought genes were identified as those in which LoF alleles are found in ecotypes that experience a significantly (βdrought timing > 0 & Pdrought timing <0.05) more positive drought-timing index" Are there other environmental factors that covary with drought that could also influence evolution at these loci? How can the authors be sure that these are really "summer drought" vs. "spring drought" genes? Are these genes consistent with mapped regions for drought tolerance in Arabidopsis?

Another good point. We added a multiple linear regression analysis to address this. Specifically, we tested whether flowering time allele associations were predicted by drought timing allele associations while controlling for the associations between alleles and both latitude and minimum temperature, two variables that could also drive flowering time evolution. Nevertheless, we recognize that other factors could, and likely do, explain some of the variance in the distribution of flowering time alleles. See Supplementary file 2B and the relevant excerpts from the revised manuscript below:

In revised manuscript:

“These results provide new insight into the ecology and genetics of Arabidopsis life history evolution, but the complex ecological reality of these processes is undoubtedly beyond the scope of this study. […] However, other unknown climatic variables or environmental interactions and non-linearities likely contribute to the flowering time adaptation as well.”
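The classification rule quoted by the reviewer above (sign of the drought-timing coefficient combined with P < 0.05) can be written out as a short sketch. The function and label names are hypothetical, and the sketch assumes that each gene already has a coefficient and P value from the association model; it does not reproduce that model.

```python
def classify_drought_gene(beta_drought_timing, p_drought_timing, alpha=0.05):
    """Label a gene by the direction of its LoF association with drought timing.

    A significantly negative coefficient marks a "summer drought" gene and a
    significantly positive coefficient a "spring drought" gene, following the
    rule quoted from the Materials and methods.
    """
    if p_drought_timing < alpha and beta_drought_timing < 0:
        return "summer drought"
    if p_drought_timing < alpha and beta_drought_timing > 0:
        return "spring drought"
    return "no significant association"
```

As the reviewer notes, a label assigned this way reflects a statistical association with the drought-timing index and does not by itself rule out covarying environmental factors.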

Additional points:

I recommend deleting the first part of the sentence ("Plants have been adapting to drought for millennia.…"). For one, plants have been adapting to drought ever since they colonized land from the mid-Ordovician to the Devonian, over 400 million years ago. Secondly, this phrase does not provide information that advances the narrative.

Thank you for the feedback. We have removed this line and the paragraph now begins with:

In revised manuscript:

“Drought stress can occur throughout the year and drought timing is forecast to change over the next century (Trenberth et al., 2014). While dramatic evolutionary responses to drought events have been documented, (e.g. Franks et al., 2007), little is known about the relationship between drought timing and adaptation.”

Introduction, second paragraph: Please provide citations for the statement that most research has focused on late-season droughts. This statement does not resonate with my experience conducting studies and reviewing manuscripts. When possible to manipulate in the field, researchers impose drought in an ecologically-relevant fashion. In the lab, researchers generally time drought treatments for a developmentally-relevant stage.

We have removed this statement.

Figure 1A: It might be useful to label the two locations with the names of the Arabidopsis accessions or provide the geographic region, in addition to the latitude and longitude.

Fixed.

Both panels of Figure 1A (especially the panel on the right) seem to imply that drought stress is occurring less frequently through time. The darker lines indicative of more recent years seem to occur in regions of higher VHI. Is that correct?

We analyzed the data and found no significant increase in the frequency of drought at these two locations. However, this is an interesting observation that might warrant a thorough investigation. Indeed, these data present many opportunities for future work.

In the subsection “Experimental Testing of Predicted Phenotypes in Gene Knockout Lines”, it states that flowering time was assessed as days from planting to the emergence of the first flower. Is there variation in germination timing? Why not measure flowering time as days from germination to the first flower?

This is a good point. As we now report in the Materials and methods section, we chose to measure flowering time based on the definition used by the 1,001 Genomes Consortium.

Figure 3F: Is this relationship linear or might a curvilinear model fit better?

We agree that the scatterplot appears to have a non-linear trend. However, adding a non-linear predictor did not improve the model fit. We have added this result to the revised manuscript.

In revised manuscript:

“We found that flowering time is strongly predicted by the accumulation of LoF alleles across the 214 candidate genes associated to spring drought and/or later flowering time (Figure 3A-E), estimating a 1-day increase for every 3 additional LoF alleles across these candidate genes (Figure 3F). This relationship is best represented as a simple linear regression; the addition of a non-linear quadratic predictor variable did not significantly improve the fit of the model (F= 0.7005, P = 0.4028).”

Reviewer #3:

Monroe and colleagues describe the link of loss-of-function alleles with drought adaptation and flowering time in A. thaliana. The manuscript is well written and interesting conclusions are reported. Especially the high overlap of associations for summer drought and early flowering and spring drought and late flowering is intriguing. Additionally, the functional follow-up in T-DNA knock out lines is excellent.

Still, I have one major comment.

My major concern is the statistical framework used for GWAS.

The authors used logistic regression in a glm and added the first 3 principal components to correct for population structure. This differs from the standard GWAS procedure in A. thaliana, which uses a linear mixed model to correct for population structure confounding. The rationale for why the authors used this model is not well described in the manuscript. Additionally, the results differ markedly from the analysis with a classical LMM. (I ran a normal LMM with the provided data for comparison, happy to provide this if needed.) Next, the qq_plot is also highly inflated (which might be expected when collapsing LoF alleles to one score per gene), but is not if a normal LMM is used.

To summarize, I am not completely sure what to make out of this, especially as the results and conclusion look really nice with the presented method (e.g. Figure 2 is really impressive).

Still, it would be good if the authors at least comment on why to use the proposed framework and the inflation observed in a qq_plot.

Thank you for sharing your thoughts. We have added a section to the Materials and methods of the revised manuscript that we hope will clarify the rationale behind our method. We have also added analyses relevant for interpreting the qqplot results. Specifically, we permuted the genotype matrix and repeated the association analyses to see if allele assignments explained the inflated results. Here are a few excerpts from the revised manuscript inspired by the concerns you raised.

In revised manuscript:

“This association study differs from traditional GWAS in several respects. […] For these reasons, we implemented analyses based on (Price et al., 2006) to balance false positives and false negatives.”

In revised manuscript:

“The preceding analyses revealed considerable overlap between genes associated with both drought timing and flowering time. […] Quantile-quantile plots of P values were visualized using qqPlot in the GWASTools package in R (Gogarten et al., 2012) (Figure 2—figure supplement 1A-D)”

In revised manuscript:

“To control for the possibility that allele frequencies or the relationship between drought timing and flowering time explained these observations, we also tested whether allele associations were correlated when generated from association analyses using a matrix of randomly permuted genotypes with the same allele frequencies (Figure 2—figure supplement 1F, r2 = 0.01).”
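One way to build a permuted genotype matrix with the same allele frequencies is to shuffle each gene's LoF calls independently across ecotypes. The sketch below shows that approach under stated assumptions: the matrix is a list of ecotype rows with one 0/1 call per gene, the function name is hypothetical, and the letter does not say whether the authors shuffled per gene or permuted whole ecotype rows.

```python
import random


def permute_genotypes(matrix, seed=0):
    """Shuffle each gene's 0/1 LoF calls across ecotypes.

    matrix: list of rows (one per ecotype), each a list of 0/1 calls (one per gene).
    Every gene keeps its original LoF allele frequency, but the link between
    genotype and each ecotype's environment and phenotype is broken.
    """
    rng = random.Random(seed)  # seeded so the permutation is reproducible
    n_genes = len(matrix[0])
    shuffled_columns = []
    for gene in range(n_genes):
        column = [row[gene] for row in matrix]
        rng.shuffle(column)
        shuffled_columns.append(column)
    # Transpose the shuffled columns back into ecotype rows.
    return [list(row) for row in zip(*shuffled_columns)]
```

Rerunning the association tests on the output of such a function gives a null reference against which the observed correlation between drought-timing and flowering-time associations can be compared.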

In revised manuscript:

“After filtering to reduce the likelihood of false positives (see Materials and methods), we thus tested 2088 genes for LoF allele associations with drought timing (Figure 2A) and flowering time (Figure 2B). […] In contrast, when we performed these analyses on a permuted LoF genotype matrix, we found no genes that were significantly associated with drought timing or flowering time (Figure 2—figure supplement 1B, D).”

In revised manuscript:

“The strengths of the associations between LoF alleles and drought timing (P values) was also strongly correlated with the strengths of the associations to flowering time (r2 = 0.48 Figure 2—figure supplement 1E, Figure 2C, D). […] In contrast, these associations were weakly correlated when genotypes were permuted (r2 = 0.01, Figure 2—figure supplement 1F), indicating that the result is not simply explained as an artifact of allele frequencies or by the relationship between drought timing and flowering time.”

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. The 1001 Genomes Consortium. 2016. GMI-MPI Arabidopsis thaliana genomes. 1001 Genomes Data Center. GMI-MPI
    2. Kogan F. 1995. Vegetative Health Index. National Oceanic and Atmospheric Administration. VHI

    Supplementary Materials

    Source data 1. Raw flowering time measurements of wild-type genomic background and T-DNA knockout lines.
    elife-41038-data1.csv (25.2KB, csv)
    DOI: 10.7554/eLife.41038.010
    Supplementary file 1. Arabidopsis ecotypes examined.

    Includes ecotype identifiers as well as latitude and longitude of origin, seasonal drought frequencies (winter, spring, summer, fall), drought timing index (drought_timing), flowering time (FT10), and minimum temperature (BIO6).

    elife-41038-supp1.xlsx (158.2KB, xlsx)
    DOI: 10.7554/eLife.41038.011
    Supplementary file 2. Multiple linear regression model summaries.

    (A) Flowering time predicted by seasonal drought frequencies. Arabidopsis common garden flowering times were predicted by historic drought frequencies (DF) during different seasons at ecotypes’ location of origin using multiple linear regression. (B) The strength of association between LoF alleles and flowering time (-log10 transformed P values) predicted by the strength of LoF alleles with drought timing, latitude, and minimum temperature.

    elife-41038-supp2.xlsx (10.2KB, xlsx)
    DOI: 10.7554/eLife.41038.012
    Supplementary file 3. Genes.

    (A) Matrix of functional allele calls for 2088 genes among 1135 Arabidopsis ecotypes. LoF alleles are those with less than 90% predicted protein product and are classified with a ‘1’. Functional alleles are classified with a ‘0’. (B) Associations between functional allele state and drought timing and flowering time for 2088 genes. Includes gene, estimate for logistic regression model testing the association between functional allele state and drought timing (Drought_timing_B) and flowering time (flw_10_B) after accounting for population structure, and the P-value of these estimates before Bonferroni correction for multiple testing (Drought_timing_p and flw_10_p). These values are also reported for LoF associations with latitude (lat_B, lat_p) and minimum temperature (temp_B, temp_p). (C) Selection statistics for 2088 genes. Includes PN/PS (pnps), DN/DS (dnds), frequency, number of LoF alleles, and average frequency per LoF allele. (D) Survey of sample genes with previously identified LoF alleles. (E) LoF alleles identified in previously studied genes (those surveyed in Table D). (F) Flowering time in T-DNA knockout lines. Flowering time (lsmean and standard error) of wild-type genomic background and T-DNA knockout lines of a sample of candidate genes in which LoF alleles are associated with spring drought environments or later flowering time phenotypes in Arabidopsis ecotypes.

    elife-41038-supp3.xlsx (7.5MB, xlsx)
    DOI: 10.7554/eLife.41038.013
    Transparent reporting form
    DOI: 10.7554/eLife.41038.014

    Data Availability Statement

    All data generated or analyzed during this study are included in the manuscript and supporting files.

    The following previously published datasets were used:

    The 1001 Genomes Consortium. 2016. GMI-MPI Arabidopsis thaliana genomes. 1001 Genomes Data Center. GMI-MPI

    Kogan F. 1995. Vegetative Health Index. National Oceanic and Atmospheric Administration. VHI


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
