PLOS Genetics. 2021 Jan 25;17(1):e1008748. doi: 10.1371/journal.pgen.1008748

Polygenic adaptation of rosette growth in Arabidopsis thaliana

Benedict Wieters 1, Kim A Steige 1, Fei He 1, Evan M Koch 2,3, Sebastián E Ramos-Onsins 4, Hongya Gu 5, Ya-Long Guo 6, Shamil Sunyaev 2,3, Juliette de Meaux 1,*
Editor: Magnus Nordborg 7
PMCID: PMC7861555  PMID: 33493157

Abstract

The rate at which plants grow is a major functional trait in plant ecology. However, little is known about its evolution in natural populations. Here, we investigate evolutionary and environmental factors shaping variation in the growth rate of Arabidopsis thaliana. We used plant diameter as a proxy to monitor plant growth over time in environments that mimicked latitudinal differences in the intensity of natural light radiation, across a set of 278 genotypes sampled within four broad regions, including an outgroup set of genotypes from China. A field experiment conducted under natural conditions confirmed the ecological relevance of the observed variation. All genotypes markedly expanded their rosette diameter when the light supply was decreased, demonstrating that environmental plasticity is a predominant source of variation to adapt plant size to prevailing light conditions. Yet, we detected significant levels of genetic variation both in growth rate and growth plasticity. Genome-wide association studies revealed that only 2 single nucleotide polymorphisms associate with genetic variation for growth above Bonferroni confidence levels. However, marginally associated variants were significantly enriched among genes with an annotated role in growth and stress reactions. Polygenic scores computed from marginally associated variants confirmed the polygenic basis of growth variation. For both light regimes, phenotypic divergence between the most distantly related population (China) and the various regions in Europe is smaller than the variation observed within Europe, indicating that the evolution of growth rate is likely to be constrained by stabilizing selection. We observed that Spanish genotypes, however, reach a significantly larger size than Northern European genotypes. 
Tests of adaptive divergence and analysis of the individual burden of deleterious mutations reveal that adaptive processes have played a more important role in shaping regional differences in rosette growth than maladaptive evolution.

Author summary

The rate at which plants grow is a major functional trait in plant ecology. However, little is known about its genetic variation in natural populations. Here, we investigate genetic and environmental factors shaping variation in the growth rate of Arabidopsis thaliana and ask whether genetic variation in plant growth contributes to adaptation to local environmental conditions. We grew plants under two light regimes that mimic latitudinal differences in the intensity of natural light radiation, and measured plant diameter as it grew over time. When the light supply was decreased, plant diameter grew more slowly but reached a markedly larger final size, confirming that plants can adjust their growth to prevailing light conditions. Yet, we also detected significant levels of genetic variation both in growth rate and in how the growth dynamics is adjusted to the light conditions. We show that this variation is encoded by many loci of small effect that are hard to locate in the genome but overall significantly enriched among genes associated with growth and stress reactions. We further observe that Spanish genotypes tended to reach, on average, a significantly larger rosette size than Northern European genotypes. Tests of adaptive divergence indicate that these differences may reflect adaptation to local environmental conditions.

Introduction

Growth rate is a crucial component of individual fitness, as it reflects the capacity of the organism to acquire resources and conditions reproductive output [1,2]. In experimental evolutionary studies, relative growth rate provides a measure of microbial adaptation in response to selection [3]. In plants, however, little is known about the evolutionary processes that influence variation in plant growth rate, despite its cornerstone importance in plant ecology [4–6].

Four processes may explain variation in growth rate: random evolution due to drift, plasticity, adaptation or maladaptation. Plasticity describes the immediate adjustment of plant growth rate in response to environmental modifications [7]. Such change may occur as a passive consequence of resource limitations. Plant growth, for example, is slower in drought conditions or at lower temperatures [8,9]. Plastic adjustments of plant growth, however, can also actively contribute to maintaining fitness under challenging conditions. For example, shade avoidance allows plants to outgrow neighbors competing for light [10]. Such reactions may allow the organism to maintain high fitness when the environment becomes challenging, without having to evolve genetically [11].

As the distribution range of a species expands, plastic modifications may become insufficient to adjust fitness, and genetic variation may be required for local adaptation [7,12]. There is clear evidence that genetic variation in plastic life history traits such as flowering time or seed dormancy contributes to the evolution of life-history decisions that are tailored to the local optimal growth season [13–16]. Surprisingly, the extent to which genetic variation in plant growth rate itself contributes to local adaptation is not known. Answering this question requires that the effect of natural selection on phenotypic divergence be disentangled from the effect of drift [17].

Genetic variation in growth rate may also arise in the absence of a compelling environmental change, as a consequence of population genetics processes. In bottlenecked populations, or in the aftermath of rapid range expansion, increased drift hampers the efficient removal of deleterious mutations, and individuals may become less fit [18–22]. Because plant growth is a component of fitness, genotypes carrying a larger burden of deleterious mutations may show decreased growth. Genetic variation in growth rates may thus also reflect maladaptation resulting from decreased population size.

The annual species A. thaliana has become a model system for both molecular and evolutionary biology, and it is well suited for determining the ecological and evolutionary significance of plant growth rates [23,24]. A. thaliana individuals can adjust their growth rate plastically to maintain their fitness. Plant rosettes grow to a larger diameter when light becomes limited [10,25]. Ample genetic variation in plant growth rates has also been documented in this species [26–28]. In addition, there is evidence that the resources allocated to growth are not identical throughout the species’ range, because trade-offs between growth rate and development change with latitude (reviewed in [12,29,30]). Furthermore, traits related to how resources are allocated to growth, such as growth inhibition upon the activation of plant defense, or plant dwarfism, have been associated with adaptation [31–35]. In summary, adaptive variation in the rate of plant growth may have evolved in A. thaliana. At the same time, the maladaptive or neutral evolution of a decreased growth rate cannot be excluded a priori. Indeed, A. thaliana has experienced recent severe bottlenecks in parts of its range, such as in Northern European or Chinese populations, which locally increased the rate of genetic drift and led to an accumulation of deleterious genetic variants [36–38]. Neutral evolutionary forces could therefore also have modified growth rate in these populations.

To determine the roles of deleterious variation, adaptive evolution and/or plasticity in the genetic variation among plant growth rates, we analyzed variation among rosette growth rates across genotypes sampled from four broad regions (China, Spain, Northern and Western Europe). To assess the relative roles of genetic and plastic variation, we grew plants under two light regimes that mimicked constitutive latitudinal differences in natural light intensity and characterized genetic variation in growth plasticity. This analysis reveals significant regional differences in growth dynamics, most of which have a polygenic basis. Population genetics analyses indicate that local selective pressures have helped shape this variation.

Materials and methods

Phenotypic analysis and estimation of growth rate parameters

We chose 278 genotypes of Arabidopsis thaliana originating from 220 locations distributed throughout 4 regions for phenotypic analyses of growth rate variation (Northern Europe, Western Europe, Spain and Central-Eastern China, S1 Table and S1 Fig). A PCA confirmed that genotypes within these regions formed distinct phylogeographic clusters (S2 Fig), whose specific evolutionary history has been previously documented [13,37–39].

Seeds were stratified for 3 days at 4°C in the dark on wet paper, and six replicate seedlings per genotype were replanted, each in one 6x6 cm round pot containing soil (“Classic” from Einheitserde) mixed with perlite. Growth was measured in a split-plot design, under two light regimes, high light (HL) and low light (LL), in the same chamber but in successive independent trials. Plants were grown in a temperature-controlled walk-in growth chamber (Dixell, Germany) set at 20°C day and 18°C night, and watered once a week. For each light regime, pots were randomized within three blocks of 8 trays with 7x5 pots, with one replicate of each genotype in each block. Trays were randomized and the rows in the trays were rotated every two to three days to account for variability within the chamber. The plants were exposed to light for 12 h with LEDs (LED Modul III DR-B-W-FR lights by dhlicht) set to 100% intensity of blue (440nm), red (660nm) and white (HL conditions) or 30% of red and blue plus 100% of white light (LL conditions), followed by a 10 min far-red light pulse to simulate sunset (40% intensity at 735nm). The total measured light intensity was 224 +/- 10 μmol m⁻² s⁻¹ in HL and 95 +/- 7 μmol m⁻² s⁻¹ in LL. These two light regimes mimic latitudinal differences in natural light intensity (S3 Fig).

Individual plants were photographed approximately bi-weekly with a Canon EOS 5D Mark III digital camera until days 46 (8 weeks) and 89 (13 weeks), for those grown under the HL and LL regimes, respectively (image data are available on Dryad, doi:10.5061/dryad.s1rn8pk5m) [40]. We only measured diameter for one time point per week, but included additional measurements if it was necessary to fit the logistic curves. Flowering time was measured as days to first flower opening. For genotypes without a flowering individual by the end of the experiment, a flowering time value of 59 or 90 days (last date that flowering was scored) was assigned to HL and LL plants, respectively. Since only 37% of the plants in the experiment flowered, we also used flowering time data from the 1001 Genomes project, according to which flowering was scored at 10 and 16°C for 177 genotypes [39]. A measure of the diameter of each plant (defined as the longest distance between two leaves) was extracted at least once a week with ImageJ (v.1.50b, [41]). In a preliminary experiment conducted on a subset of 17 genotypes, we used Rosette Tracker, an ImageJ tool [42], to show that diameter correlated positively with rosette area under both light regimes (HL r = 0.83, p<3.2e-5, LL r = 0.56, p<0.0186). We confirmed that plant diameter accurately predicts rosette area on this larger data set (r = 0.929, p<2.2e-16 in HL). Rosette diameter was therefore used to determine the increase in rosette area over time.

We conducted two additional experiments to test the ecological relevance of rosette growth variation measured under controlled conditions. First, all genotypes were grown under HL conditions, in 5 replicates. In this experiment, instead of rosette diameter, we measured hypocotyl length after 15 days, to quantify variation in seedling growth. We further weighed 3-week-old plants with a precision balance (Sartorius AC 210 P with accuracy of 0.1 mg) to quantify variation in plant biomass. Second, we followed a similar experimental design to measure the diameter of plants grown outdoors in 2 replicates in the field of the Cologne Institute of Plant Sciences. Sand was used instead of soil in 9 cm diameter pots. Seeds were sown in September 2016, which corresponds to the native season in the area, and the pots were put outside after a week.

Statistical analysis of genetic variance

All following analyses were conducted using R (version 3.6.3) [43], and function names refer to those in the R package mentioned unless otherwise noted. We provide an Rmarkdown script detailing the statistical analysis of phenotypic variation (S1 File).

The split-plot design of our study allowed us to conduct the analysis in successive steps. First, we extracted three parameters that together provided a comprehensive description of individual rosette growth. For this, rosette diameter measurements over time (our input phenotype) were modeled as a three-parameter logistic growth using the drm function from the drc package in R [44]. The three following growth parameters were extracted: final size (FS, largest estimated rosette diameter in cm), slope (factor of magnification in the linear phase) and t50 (inflection point; time at which growth is maximum and half of FS has been reached, which quantifies the duration of the rosette area growth phase in number of days). We show examples for the estimation of growth rate in S4 Fig. For each parameter, a genotypic mean correcting for block, tray and position effect was computed with a generalized linear model with a Gaussian error distribution and the following model: parameter~accession+block+tray/(row+col)+error. Genotypic means in HL and LL were extracted separately, because light treatments had to be performed in separate trials. To quantify the plasticity of growth to the light regime of each genotype, we correlated genotypic means in LL against HL and extracted the residuals. GxE estimates thus quantify the deviation of the response of a given genotype from the mean response of all genotypes for the respective parameter. The estimate increases with the magnitude of growth plasticity induced by a decrease in light intensity (S5 Fig). Broad-sense heritability (H2) was determined for each trait in each environment as previously reported [45]. Briefly, genetic and environmental variances were estimated using the lme function from the nlme package [46], with the block as fixed and genotype as random effect, and heritability was determined as the ratio of genetic variance over the total variance.
The heritability of GxE was not estimated, because we quantified plasticity on the basis of changes in genotypic means in the two light conditions. For this reason, we also computed trait pseudo-heritability, which is based on the genome-wide association study (GWAS) mixed model (see below) and allowed us to estimate the proportion of the observed phenotypic variance that is explained by genotypic relatedness for all traits [47].
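To make the three growth parameters concrete, the following minimal sketch evaluates a three-parameter logistic curve. The parametrization used here (upper asymptote FS, inflection point t50 and a steepness coefficient b) is an assumption made for illustration; the exact form fitted by drm, and how the reported slope maps onto it, should be checked against the drc documentation. All numeric values are hypothetical.

```python
import math

def logistic_diameter(t, final_size, t50, b):
    """Three-parameter logistic curve (assumed parametrization).

    final_size: upper asymptote (cm); t50: inflection point (days);
    b: steepness, negative for a curve that increases with time.
    """
    return final_size / (1.0 + math.exp(b * (t - t50)))

# Hypothetical parameter values, not estimates from the study.
FS, T50, B = 6.0, 15.0, -0.3

half = logistic_diameter(T50, FS, T50, B)    # diameter at the inflection point
late = logistic_diameter(200.0, FS, T50, B)  # diameter long after t50
max_rate = -B * FS / 4.0                     # derivative of the curve at t = t50
```

Under this parametrization the curve passes through half of FS at t50, which matches the definition of t50 given above, and growth is fastest at that point.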

To assess the correlation of phenotypic traits with climatic variables, we investigated solar radiation estimates, temperature, precipitation, humidity and wind speed with 2.5-min grid resolution (WorldClim2 database, [48], accessed on March 20, 2018) and soil water content [49]. Following [45], we estimated the mean over the putative growth season for each genotype in addition to the annual averages.

Because of the strong correlations between climatic variables, we conducted principal component analyses (PCAs) to combine the data. We analyzed annual average radiation separately and combined the other variables into the PCAs: growing season data, variables related to precipitation and to temperature. Raw climatic data and the principal components (PCs) are in S1 Table and the loadings of the PCA are in S11 Table.

Regional differences in mean growth parameters were tested with a multivariate analysis, using the manova function, the matrix of growth parameters (genetic means) or plasticity and the following model: growth~ population*light regime. Significance levels were determined by the Pillai test.

For univariate analysis, we used GLMs to test the effect of population of origin on the genotypic means. A Gaussian distribution was taken for error distribution, and the dispersion parameter was estimated by the glm function. Group means were compared with the glht function (which performs general linear hypothesis testing) and plotted on boxplots using the cld function, both of the multcomp package [50].

Pairwise trait correlations within and across populations were calculated with the cor.test function (Pearson’s product-moment correlation), and p-values were established using the lmekin function in R, which includes a kinship matrix of individuals (see below) and thus corrects for population structure (after [29]). We used the corrplot function from the corrplot package to plot correlations [51]. Plots were modified using inkscape version 0.92.3 (inkscape.org, [52]). Significance levels were adjusted for false-discovery rates with the function p.adjust.
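The text does not state which method was passed to p.adjust. As an illustration of a false-discovery-rate adjustment, the sketch below implements the Benjamini-Hochberg procedure, which is one of the methods p.adjust offers; treating it as the method used here is an assumption, and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest to the smallest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values from four pairwise correlations.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

The running minimum keeps the adjusted values monotone, so a test never receives a smaller adjusted p-value than a test with a smaller raw p-value.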

Genome-wide association studies

Genomic data were available for 231 of the 278 genotypes included in the phenotypic analysis, i.e. for 84 genotypes from Northern Europe (NE, predominantly Sweden), 3 from Western Europe (WE), 119 from Spain (SP) and 15 from China (CH) [38,39]. Chinese genotypes were excluded from the GWAS because of their limited number (15 genotypes) and their strong genetic divergence ([38], S2 Fig). In total, the growth parameters of 203 and 201 genotypes grown under HL and LL, respectively, were used for the GWAS. Genome-wide association studies were conducted using the method from [53]. The corresponding GWAS package was downloaded from https://github.com/arthurkorte/GWAS. Single Nucleotide Polymorphisms (SNPs) with minor allele frequency below 5% or with more than 5% missing data were removed from the genotype matrix, resulting in a matrix of 1,448,192 SNPs, produced with vcftools (--012 recode option) (version 0.1.15, [54]). A kinship matrix was computed with the emma.kinship function. For each growth parameter, genotypic means were used as phenotype measurements. We performed GWAS across the European sample of genotypes but also within region (Spain and Northern Europe). For each SNP, the script output delivered p-values, which were Bonferroni corrected for multiple testing, and effect sizes. For each trait, we estimated pseudo-heritability, the proportion of the observed phenotypic variance that is explained by the estimated relatedness (e.g. kinship matrix, [47]). To identify candidate genes underpinning significant GWAS associations, we calculated the linkage disequilibrium (LD) around the SNP of interest and selected all genes that were in a genomic window with LD above 0.5 within a 250 kb window around the SNP. The LD was calculated as the Pearson correlation between the frequencies of allele pairs.
Additionally, we downloaded an annotation of loss-of-function (LOF) variants [55] and performed a GWAS association following the procedure described above except that the SNP data set was replaced with the LOF data set, which assigned one of two states (functional or LOF), for each genotype and each of the 2500 genes with known LOF alleles.
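The SNP filters and the Bonferroni correction described above can be sketched as follows. This is a simplified illustration, not the vcftools or GWAS-package implementation: the 0/1/None genotype encoding for inbred lines and the significance level alpha = 0.05 are assumptions, since the text does not state them.

```python
def keep_snp(genotypes, maf_min=0.05, missing_max=0.05):
    """Return True if a SNP passes the frequency and missingness filters.

    genotypes: one entry per inbred line, 0 or 1 for the two alleles,
    None for a missing call (a simplified, hypothetical encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1.0 - len(called) / len(genotypes)
    freq = sum(called) / len(called)
    maf = min(freq, 1.0 - freq)  # frequency of the rarer allele
    return maf >= maf_min and missing_rate <= missing_max

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold; alpha = 0.05 is an assumed level."""
    return alpha / n_tests

# Number of SNPs retained in the genotype matrix, as reported in the text.
threshold = bonferroni_threshold(1_448_192)
```

With roughly 1.4 million tests, the per-SNP threshold under this assumed alpha falls in the range of a few times 1e-8, which illustrates why few associations pass it for a polygenic trait.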

Validation of the polygenic signal

In humans, where population structure and environmental variation are correlated, insufficient correction of the genetic associations caused by shared ancestry has been shown to create spurious associations [56–58]. Even though environmental variance is much better controlled in common garden experiments including kinship as a covariate, association tests can still be confounded by genetic relatedness [57]. This is of particular concern when many trait/SNP associations are below the Bonferroni significance threshold. The rate of false positives was not excessively inflated by genomic differentiation between regions, because GWAS performed within regions (Northern Europe or Spain) had p-value distributions similar to those of GWAS performed on the complete phenotypic dataset (S6 and S7 Figs). We nevertheless used two additional approaches to confirm the polygenic basis of traits. First, we examined whether phenotypic variation could be predicted by polygenic scores derived from sub-significant GWAS hits, with p<10⁻⁴. To this end and for each trait, we calculated genotypic means, performed GWAS, as described above, and computed polygenic scores following [53]. SNPs with a significant association with the phenotype were pruned to remove SNPs standing in strong linkage disequilibrium with plink version 1.90 [59], following [56]. The plink --clump function was set to select SNPs below a (GWAS) P-value threshold of 0.0001, start clumps around these index SNPs in windows of 1 Mb, and remove all SNPs with P < 0.01 that are in LD with the index SNPs. The SNP with the lowest p-value in a clump was retained for further analysis. Briefly, input files, including allele frequencies for all SNPs, all SNPs with GWAS p-values lower than 10⁻⁴ and their effect size estimates were created with a custom R script. We defined each genotype as its own population (genotypes from Spain and Northern Europe were grouped in regions).
Scripts were downloaded from https://github.com/jjberg2/PolygenicAdaptationCode. The pipeline was run with default parameters, and polygenic scores (Z-scores) were estimated. We used three approaches to validate the relevance of GWAS associations for predicting the phenotype. First, we used 80% of the genotypes and used the resulting GWAS association to compute a polygenic score for the remaining genotypes. Second, we took two replicates to compute the polygenic scores, and tested whether they predicted the phenotype of the third replicate. Third, we correlated the phenotypic values predicted by polygenic scores (calculated this time on the basis of all three replicates) with the observed phenotypic value. We repeated this replacing the set of SNPs associated at p<10⁻⁴ with 1000 sets of an equal number of randomly chosen SNPs. We then compared the correlation of polygenic scores to the input phenotype for SNPs associated at sub-significant level (p<10⁻⁴) to the correlation expected for random sets of SNPs. Correlations were calculated with the R function cor.test (Pearson’s product-moment correlation).
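The comparison between associated and random SNP sets can be illustrated with a simplified additive score. This sketch does not reproduce the PolygenicAdaptationCode pipeline: the score definition (sum of effect size times allele state), the toy data and the (hits + 1) / (n + 1) convention for the empirical p-value are all assumptions made for illustration.

```python
import math

def polygenic_score(alleles, effects):
    """Additive score: sum of effect size times allele state (0 or 1)."""
    return sum(a * e for a, e in zip(alleles, effects))

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def empirical_p(observed_r, null_rs):
    """Share of random SNP sets whose correlation reaches the observed one."""
    hits = sum(1 for r in null_rs if r >= observed_r)
    return (hits + 1) / (len(null_rs) + 1)

# Hypothetical toy data: 4 genotypes x 3 SNPs, with made-up effect sizes.
genotype_matrix = [[0, 0, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
effects = [0.5, 0.2, -0.1]
phenotype = [4.9, 5.4, 5.8, 5.6]

scores = [polygenic_score(row, effects) for row in genotype_matrix]
r_observed = pearson(scores, phenotype)
```

In the analysis described above, null_rs would hold the correlations obtained for the 1000 random SNP sets, each scored in the same way as the associated set.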

In addition, we investigated functional enrichment among genes within 10kb of GWAS associated SNPs. To assign a single GWAS p-value for each gene, either we assigned for each trait the lowest p-value of SNPs within the gene, or, if no SNP was within the gene, we assigned the p-value from the physically closest SNP [47]. When there were GWAS hits in the vicinity of duplicated genes, we removed tandem duplicated genes within a 10-gene sliding window. For this, we first aligned all TAIR10 genes against each other by using BLAST (version 2.9.0, available at https://blast.ncbi.nlm.nih.gov). Then the duplicated genes were selected as genes with an e-value <1e-30. Finally, tandem duplicated genes identified with gene distance <10-genes were filtered out to avoid inflated functional enrichments. If the polygenic signal were only due to insufficient correction for population structure, we would expect similar functions to be enriched among population structure outliers and among genes with low GWAS p-values. We thus computed Fst-values for each gene with the F_ST.stats function of the PopGenome library [60] between Spain and Northern Europe. Negative Fst values were set to zero.
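The rule for assigning one GWAS p-value per gene can be sketched as below. The function name, the coordinates and the p-values are hypothetical, and the sketch handles a single chromosome without applying the 10 kb limit mentioned above.

```python
def gene_pvalue(gene_start, gene_end, snps):
    """Assign one GWAS p-value to a gene.

    snps: list of (position, p_value) on the same chromosome as the gene.
    Uses the lowest p-value among SNPs inside the gene; if none lies
    inside, falls back to the physically closest SNP.
    """
    inside = [p for pos, p in snps if gene_start <= pos <= gene_end]
    if inside:
        return min(inside)

    def distance(pos):
        # Distance from a SNP outside the gene to the nearest gene boundary.
        return gene_start - pos if pos < gene_start else pos - gene_end

    closest = min(snps, key=lambda s: distance(s[0]))
    return closest[1]

# Hypothetical coordinates and p-values.
snps = [(100, 0.20), (450, 0.03), (480, 0.001), (900, 0.0004)]
p_inside = gene_pvalue(400, 500, snps)   # two SNPs fall inside the gene
p_nearest = gene_pvalue(750, 800, snps)  # no SNP inside; nearest is at 900
```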

Enrichments were tested as previously described [61]. To call GO enrichment significant, we determined the conservative threshold p = 0.008. This threshold was determined as the 0.01% quantile of the p-value distribution when GO enrichments were tested for 1000 random sets of the same number of SNPs. To assess similarity between traits in Gene Ontology (GO) enrichments, we calculated graph-based similarity with the GOSemSim package [62]. A distance matrix was estimated with average connectivity between the GO terms. The clustered GO categories were then plotted as a dendrogram with the plot.phylo function from the ape package (version 5.3, [63]). GO categories enriched at p-value below 0.001 were highlighted. The distribution of enriched GO categories was evaluated by visual inspection.

Testing for adaptation or maladaptation

For population genetics analyses, we sampled one genotype at random whenever plants were sampled in the same location, acquiring a total of 220 genotypes. As a proxy for the genomic load imposed by deleterious mutations, the number of derived non-synonymous mutations per haploid genome has been proposed [64]. This approach was not possible here because the genomes of individuals from China and Europe were sequenced in different labs, and the depth and quality of sequencing varied too much to make a fair comparison. Instead, we used two data sets that together catalogued LOF alleles after controlling as much as possible for heterogeneity in sequencing quality: one that included European genotypes [55] and a more recent data set that included Chinese genotypes [65]. As an estimate of the individual burden of deleterious mutations, we counted the number of LOF alleles for each individual and tested whether individuals with a larger number tended to have a lower growth rate using the Spearman rank correlation.
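A Spearman rank correlation between LOF burden and growth can be computed as the Pearson correlation of ranks, with ties sharing their average rank. The sketch below is a minimal illustration with hypothetical counts and sizes; it does not compute a p-value.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: LOF allele counts and final rosette size (cm).
lof_counts = [310, 325, 325, 340, 360]
final_size = [6.1, 5.9, 6.0, 5.2, 5.5]
rho = spearman_rho(lof_counts, final_size)
```

A negative rho in such data would be the pattern expected if a larger burden of LOF alleles were associated with reduced growth.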

To search for footprints of adaptive evolution, we computed an Fst value between Spain and Northern Europe for each SNP in the GWAS analysis using the R-package hierfstat and the basic.stats function [66]. Negative Fst values were set to zero, and the quantile function was used to calculate the 95th percentile. The Fst distribution of SNPs associated with any GWAS (p<10⁻⁴) was compared to the genome-wide distribution with a Kolmogorov-Smirnov test. We also computed the likelihood that its 95th percentile was greater than the 95th percentile of 10 000 random samples of an equally large set of SNPs. To compare the phenotypic differentiation of traits, Qst values for the phenotypic traits were estimated as previously described [45]. Briefly, Qst was estimated as VarB / (VarW + VarB), where VarW is the genotypic variance within and VarB between regions. These variances were estimated with the lme function of the nlme package [46], with the block as fixed and population/genotype as random effect. We extracted the intercept variance for VarB and the residual variance for VarW. Since replicates were taken from the selfed progeny of each genotype, VarB and VarW are broad-sense genetic variance components. To reveal signatures of local adaptation, the Qst of each trait was compared to the 95th percentile of the Fst distribution (between Spain and Northern Europe) [67,68]. We verified that outlier Qst values were unlikely to arise randomly. For this, we permuted phenotypic data by randomizing genotype labels and verified that the difference between observed Qst and 95th percentile of Fst was significantly greater than for randomized Qst, following [45]. In a second approach, we used a multivariate normal distribution to simulate phenotypic divergence based on the kinship matrix and generate an expected Qst distribution [69].
Finally, we applied the over-dispersion test (Qx test), which compares polygenic scores computed for associated versus random SNPs (null model), in a process similar to a Qst/Fst comparison, but assuming that each population is composed of the selfing progeny of one genotype [53]. A Qx significantly larger than the Qx computed for the null model indicates that polygenic trait prediction is more differentiated than expected from the kinship matrix and can be taken as an indication that the trait has evolved under divergent selection, either within or between regions [53].
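The Qst definition and its comparison with the upper tail of the Fst distribution can be sketched as follows. The variance components and Fst values are hypothetical, and the sketch omits the permutation and simulation steps described above.

```python
def qst(var_between, var_within):
    """Qst as defined in the text: VarB / (VarW + VarB)."""
    return var_between / (var_within + var_between)

def fst_percentile_95(fst_values):
    """95th percentile of Fst after setting negative estimates to zero.

    Uses linear interpolation between order statistics, which matches the
    default of R's quantile function (type 7).
    """
    clipped = sorted(max(0.0, f) for f in fst_values)
    position = 0.95 * (len(clipped) - 1)
    lower = int(position)
    upper = min(lower + 1, len(clipped) - 1)
    weight = position - lower
    return clipped[lower] * (1.0 - weight) + clipped[upper] * weight

# Hypothetical variance components and Fst values.
trait_qst = qst(var_between=0.4, var_within=0.6)
fst_cutoff = fst_percentile_95([-0.02, 0.0, 0.05, 0.10, 0.30])
exceeds_neutral = trait_qst > fst_cutoff
```

A trait whose Qst exceeds this cutoff is more differentiated between regions than most SNPs, which is the pattern the comparison treats as a candidate signature of local adaptation.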

Results & discussion

Ecological relevance of rosette growth variation

On the basis of the more than 15,000 rosette images we collected, we used rosette diameter as a proxy to describe rosette growth variation with three parameters; each refers to the ways in which growth can differ among genotypes: i) the time until the exponential growth phase is reached (t50), ii) the speed of growth during the linear growth phase (slope) and iii) the final size (FS) at which rosette diameter plateaus at the end of the rosette growth phase (Fig 1 and S2 and S3 Tables). Of the parameters, FS displayed the highest broad-sense heritability, in plants grown under both regimes: high light (HL, H2 = 0.636) and low light (LL, H2 = 0.794, S4 Table). Trait variation measured in controlled settings sometimes fails to reflect variation expressed in natural conditions [70,71]. This is not the case for rosette growth variation in A. thaliana. FS in HL conditions correlated positively with plant biomass (r = 0.267, p-value = 4.6e-5) and seedling growth (r = 0.372, p-value = 9.1e-7) in the growth chamber. FS measured in HL also correlated positively with plant diameter measured under natural light in the field (r = 0.263, p-value = 0.0009). This indicates that a significant part of the variation we report is ecologically relevant.

Fig 1. Regional growth rate estimates in HL and LL.


Predicted growth curves averaged over region (from drm function). The growth curves were estimated from diameter measurements at different time points. Diameter measurements for HL are from day 11 to 46 and for LL from day 24 to 89. An illustration of the parameters that are estimated from these growth curves is included in the plot (Final Size is a diameter, t50 a time point and Slope the fold increase in the linear phase). HL (dashed line), LL (solid line), China (orange), Northern Europe (green), Spain (purple) and Western Europe (red).

Environmental plasticity has the strongest impact on plant growth variation

Light regimes revealed that plasticity has the strongest impact on rosette growth (Fig 1, MANOVA HL vs LL: F = 2275.37, df = 1, p-value < 2.2e-16, Table 1). In plants grown under LL, the maximum growth rate was delayed and rosette growth plateaued at a larger size (Table 1 and Fig 2). This observation was in agreement with the reduced relative growth rate reported in many plant species when light supply decreases, whereas the larger FS reflected the expected shade avoidance reaction [72]. We observed that plants reached a larger diameter (and rosette area) by elongating their petiole and minimizing leaf blade overlap in LL, a reaction known as the shade avoidance response. This strong modification of leaf shape may explain the predominant impact of environmental variation we report here (Table 1). Nevertheless, we detected significant levels of genetic variation in growth plasticity to light (F = 2.0, df = 270, p<2.2e-16). We quantified growth plasticity as the individual deviation of the genotypic mean of each genotype in HL and LL from the average reaction of the population to the change in light regime (S8 Fig).
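The plasticity estimate can be illustrated as the residual of each genotype from the population-wide relationship between LL and HL genotypic means. The sketch assumes that this relationship is an ordinary least-squares regression of LL on HL, which is one reading of the procedure described in the methods; the genotypic means are hypothetical.

```python
def regression_residuals(x, y):
    """Residuals of an ordinary least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical genotypic means of final size (cm) for five genotypes.
fs_high_light = [5.0, 5.5, 6.0, 6.5, 7.0]
fs_low_light = [8.1, 8.4, 9.6, 9.4, 10.0]

# One residual per genotype; positive values mean a stronger size increase
# under low light than the average response predicts.
gxe = regression_residuals(fs_high_light, fs_low_light)
```

Because the residuals are centered on the average response, they sum to zero across genotypes, so the estimate captures only how a genotype departs from the common plastic reaction.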

Table 1. Multi- and uni-variate analyses of growth variation in response to light regime, genotype and their interaction.

The multivariate analysis was conducted on the estimates of FS, t50 and slope for all 270 genotypes in three replicates and accounting for block effects nested within light treatment.

Response        df   MANOVA F  MANOVA p-value  Final Size F  Final Size p-value  t50 F    t50 p-value  Slope F  Slope p-value
Block           4    27.5      < 2.2E-16       30.15         < 2.2E-16           12.24    9.8e-10      30.50    < 2.2E-16
Light regime    1    7388.7    < 2.2E-16       15687.23      < 2.2E-16           9289.44  < 2.2E-16    34.01    7.2e-9
Genotype        279  3.6       < 2.2E-16       8.43          < 2.2E-16           2.51     < 2.2E-16    2.036    < 2.2E-16
Light*Genotype  270  2.0       < 2.2E-16       2.97          < 2.2E-16           1.59     2.3e-07      1.61     4.4e-08

Fig 2. Significant regional differentiation of Final Size and t50 in HL and LL.

Fig 2

A. thaliana genotypes are grouped based on geographical origin. Box plots show regional variation in Final Size (upper row) and t50 (lower row) for HL (left) and LL (right). Groups that do not share a letter are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 20), Northern Europe (NE, n = 58), Spain (SP, n = 119) & Western Europe (WE, n = 29).

Spanish genotypes show the most vigorous rosette growth

We found evidence for rosette growth variation across regions (MANOVA in Table 1 and Figs 1 and 2). Within Europe, Spanish genotypes reached the largest FS in both HL and LL plants (Tables 1 and S5, MANOVA: F = 16.37, df = 3, p-value = 5.35e-10). Although the growth slope did not differ significantly across regions, we observed that, under HL conditions, Spanish genotypes reached 50% of their FS (= t50) significantly later than the genotypes originating from Northern Europe (t50 = 15.17 vs 13.79, respectively, GLHT z = 3.061, p-value = 0.011, Fig 2C). This effect was also observed for plants grown under LL conditions but we detected no regional difference in GxE (Fig 2D). Since Spain and Northern Europe do not differ in their average flowering time (S9 and S10 Figs), the larger rosette size observed in Spain is not due to an extension of the duration of vegetative growth in this population.

Chinese genotypes show that growth rate variation is constrained in evolution

Despite a long history of population isolation that was magnified by a strong bottleneck after the last glacial period [38,73], the growth rate of Chinese genotypes was comparable to that shown by most European genotypes (S5 Table and Fig 2A and 2B). Under LL conditions, Chinese genotypes showed lower t50 and FS values only when compared to Spanish genotypes (S5 Table and Fig 2A–2D). Under HL, genotypes from China did not differ significantly from those from any other region (Figs 2 and S11). The analysis of Chinese genotypes indicates that the phenotypic evolution of rosette growth does not scale with the extent of genetic divergence (Fst between Europe and China is 0.057 on average, with a standard deviation of 0.147, and is much greater than Fst between Spain and Northern Europe, KS test, D = 0.39, p<2.2e-16). A parsimonious explanation for the fact that growth rate has not significantly changed despite extensive population divergence is that the evolution of growth rate is likely to be constrained by stabilizing selection around a growth optimum [1].

The Chinese population was also the only one to show a difference in GxE (S8 Fig). Compared to Spanish genotypes, Chinese genotypes displayed a GxE that was lower for t50 and higher for slope (t50: GLHT z = 2.748, p-value = 0.028; slope: GLHT z = -3.224, p-value = 0.006; S8 Fig). When grown under the LL regime, these genotypes displayed a lower FS than genotypes from Spain. In contrast, within Europe, we observed no significant difference in the growth plasticity of plants in relation to light regime, despite the fact that Northern populations are exposed to lower average light intensity (S3 Fig).

GWAS reveal only two SNPs significantly associated with rosette growth variation

We used GWAS to determine the genetic basis of variation in growth rate within Europe (Figs 3 and S12–S17). The sample size (15) and strong population structure of Chinese genotypes precluded their inclusion in this analysis (S2 Fig). Henceforth, we focused on the analysis of genetic variation within and among European populations. Overall, we found few significant genetic associations, indicating that genetic variance for growth rate is generally polygenic. One SNP (chromosome 1, position 24783843) was associated with t50 variation in LL plants (effect size = -2.475, p-value = 2.6E-9, Fig 3 and S6 Table). A second SNP (chromosome 3, position 951043) was significantly associated with the slope of rosette diameter growth in HL plants within Spain (effect size = 1.229, p-value = 8.4E-7, Fig 3 and S6 Table) and was polymorphic only in the Spanish set of genotypes. This SNP was within a 1Mb DNA fragment showing strong local LD and enclosing 21 genes. Two additional SNPs were associated with GxE for FS and t50 in HL plants in Northern Europe, respectively, with p-values just below the Bonferroni threshold (S6 Table). Yet, we found no SNP significantly associated with FS above the Bonferroni threshold, although FS is the most heritable trait (S4 Table). Diverse genetic setups can result in such polygenic architecture: large effect size variants that are too rare to be detected, many variants with effect sizes too small to be individually significant, or the presence of multiple alleles at causal loci that will blur the genetic association signal [47,74]. Local genetic variation in slope and t50, growth parameters which display moderate but significant genetic variance, appears to be controlled by low-frequency variants of comparatively larger effect, since some of them were associated above the Bonferroni threshold (S6 Table). This genetic architecture resembles that reported in the same species for flowering time [75,76]. In contrast to slope and t50, variation in FS appeared more polygenic since it has the highest heritability and no SNP association above Bonferroni confidence levels.
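The Bonferroni threshold referred to throughout this section divides the nominal significance level by the number of tests. A minimal Python sketch follows; the number of SNPs used here is a hypothetical placeholder, since the number of variants tested in each GWAS is not given in this section.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def passes_bonferroni(p_value, n_tests, alpha=0.05):
    """True if a single association is significant after Bonferroni correction."""
    return p_value < bonferroni_threshold(n_tests, alpha)

# Hypothetical number of tested SNPs (assumption, not taken from the analysis).
n_snps = 1_000_000
threshold = bonferroni_threshold(n_snps)
# Height of the threshold line on a Manhattan plot drawn on the -log10(p) scale.
threshold_line = -math.log10(threshold)

strong_hit = passes_bonferroni(2.6e-9, n_snps)    # p-value of the order reported for t50 in LL
marginal_hit = passes_bonferroni(1e-4, n_snps)    # sub-significant association
```

Because the threshold depends on the number of SNPs tested, a GWAS restricted to one region, where fewer variants are polymorphic, has a less stringent threshold than a GWAS across all European genotypes.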

Fig 3. GWAS-results for 4 phenotypes.

Fig 3

Manhattan plots of GWAS of t50 in LL with all European genotypes (a), with a peak on Chromosome 1, GxE of Final Size with all European genotypes (b) with a peak on Chromosome 2, Slope in HL within Spain (c) with a peak on Chromosome 3 and t50 in HL in Northern Europe (d) with a peak on Chromosome 1. The dotted line shows the corresponding Bonferroni threshold adjusted for a p-value of 0.05.

Polygenic scores and functional enrichments confirm the polygenic basis of growth variation

Traits with polygenic architecture are controlled by variation in many loci of low frequency and/or low effect sizes and dissecting their evolution is arguably a major challenge today in evolutionary biology [77–80]. Specifically, random SNPs with outlier frequency are not always sufficiently corrected for with the kinship matrix and these may give rise to spurious associations. Studies of polygenic traits such as human height have shown that residual effects of population structure can give signals of genetic association [56,81]. Similar effects were also encountered in studies of phenotypic variation in plant systems [57]. They are expected whenever environmental variance co-varies with population structure, as is likely the case in human studies, but can also persist in common garden studies if populations are geographically differentiated in the genetic component of the trait. To confirm the polygenic basis of growth variation, we evaluated the biological relevance of marginally significant genetic associations. The associated sets were composed of 22 to 37 unlinked SNPs. We used their effect sizes to compute polygenic scores for each parameter [53]. We first used 80% of the data to identify SNPs associating with rosette growth and tested whether they could correctly predict the phenotype of the remaining 20% of the data. This approach, however, did not yield significant predictions (rho = 0.07979094, p = 0.6189), which is not surprising because it usually does not perform well in structured populations [82]. We took a second approach to measure polygenic score accuracy. We used two of the three replicates to compute polygenic scores and tested whether they correlated significantly with the phenotype measured independently in the third replicate (S7A Table). The correlation was highest for FS measured in LL plants (Rho = 0.567, p = <2.2e-16). In fact, FS, the most heritable trait, could be predicted with the highest accuracy in plants grown under both light regimes (S7A Table). When we used random sets of SNPs as input, the computed polygenic scores were significantly correlated with the observed phenotype, indicating that population structure contributes to a significant but small fraction of the variance in polygenic scores. Nevertheless, with this third approach, we showed that polygenic scores computed on the effect sizes of SNPs associated at sub-significant level were markedly more correlated with the observed phenotype than those computed with random SNP sets (S7B Table). This confirms that sub-significant genetic associations, despite their marginal significance, effectively recapitulate some of the traits’ heritability.
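The replicate-based validation described above can be sketched in Python as follows. The sketch assumes that a polygenic score is the sum of GWAS effect sizes weighted by allele state (coded 0/1 for inbred lines) and that accuracy is measured with a rank correlation. The effect sizes, allele states and phenotypes are hypothetical toy values, and the helper names are ours.

```python
def polygenic_score(allele_states, effect_sizes):
    """Sum of effect sizes weighted by allele state (0 or 1 for inbred lines)."""
    return sum(a * b for a, b in zip(allele_states, effect_sizes))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical effect sizes, assumed to be estimated from two of the three replicates.
effects = [0.5, -0.2, 0.3]
# Hypothetical allele states of four genotypes at the three associated SNPs.
genotypes = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
scores = [polygenic_score(g, effects) for g in genotypes]
# Hypothetical phenotype measured independently in the third replicate.
held_out_phenotype = [70.0, 58.0, 61.0, 52.0]
rho = spearman(scores, held_out_phenotype)
```

Repeating the same calculation with randomly drawn SNP sets of equal size gives the baseline correlation attributable to population structure alone, against which the scores from associated SNPs can be compared.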

We further asked whether sub-significant associations could collectively reveal the specific molecular basis of each trait. We selected SNPs showing a sub-significant association (p<0.0001) and investigated functional enrichment among genes that mapped within 10kb of the SNP. To consolidate our confidence in the functional enrichment, we also pruned tandem duplicates from the annotated set, and determined a p-value threshold that was below the level of significance that can be obtained with GWAS on a permuted data set (see Materials and Methods). While the results reveal many categories without an easily interpretable link to growth, many traits showed functional enrichment within gene ontology (GO) categories, whose link to growth has been documented (S9 Table). For example, genes associated with variation in FS, the most polygenic trait, were enriched among genes involved in the growth-related functions “cotyledon development”, “auxin polar transport” and “response to mechanical stimulus” (p-value = 0.0053 or lower). Interestingly, mechanical stimuli have been shown to strongly influence seedling growth, and we observed that FS correlated with hypocotyl length and biomass in 3-week-old plants (S18 Fig, [82]). Additionally, several categories related to defense and stress reactions, such as “response to salt stress”, “response to chitin”, “regulation of defense response to fungus” and “negative regulation of defense response”, were enriched. Variation in stress-related functions is known to have an impact on plant growth in A. thaliana [33]. Furthermore, we also found that SNPs associated with FS plasticity to light are enriched among genes involved in the shade avoidance response (p = 0.0023), by which plants exposed to limited light conditions increase stem elongation [10,83]. Associated genomic regions included, for example, PHY RAPIDLY REGULATED 2 (PAR2, AT3G58850), a negative regulator of shade avoidance [84] or LONG HYPOCOTYL UNDER SHADE (BBX21, AT1G75540), a regulator of de-etiolation and shade avoidance [85]. Altogether, functional enrichments among genes located in the vicinity of GWAS hits indicated that a biological signal is detectable among sub-significant genetic associations.

As shown above, population structure impacts the results of GWAS and population structure outliers may drive this signal of association. Indeed, genes with elevated Fst reflecting population structure or even regional adaptation of (other) traits could create spurious associations with traits that have a distinct genetic basis but are also differentiated between regions. We thus verified that functional enrichment among genes with SNP associations was different from that observed among genes with elevated Fst. We determined enriched GO categories among genes in the vicinity of GWAS associated loci (p<1e-4) and among genes ranked by Fst between Spain and Northern Europe. We visualized overlaps in functional enrichment by clustering GO terms on the basis of the genes they shared (S19 Fig). The enrichment based on Fst revealed three strongly enriched GO terms: “organ morphogenesis”, “circadian rhythm” and “virus-induced gene silencing” (p = 0.0009 or lower, S10 Table). The enrichment in GO category “circadian rhythm” may reflect the local adaptation to Northern variations in day length [86,87]. Genes close to SNPs associated with the different growth parameters, however, had clearly distinctive patterns of functional enrichment (S19 Fig and S10 Table). We therefore argue that even though population structure outliers may create some false-positive associations, the polygenic pattern of association that we observe at sub-significant level cannot be explained by the history of population divergence alone.

No association between per-individual burden and growth

In areas located at the edge of the distribution range of A. thaliana, populations may have accumulated an excess of deleterious mutations in the aftermath of their genetic isolation [39,88]. This could have resulted in a mutational load that would have decreased fitness components such as plant growth, because it influences the resources available for the production of progeny [1,20,22]. We thus hypothesized that the lower FS observed in Northern Europe may result from maladaptive forces associated with the demographic history of the region.

This hypothesis could not be supported. No significant difference was detected in total number of LOF mutations per genome in Northern Europe compared to Spain (GLHT: z-value = 0.634, p-value = 0.526, S20 Fig). This observation has been previously reported [55]. In addition, we detected no significant correlation between the number of LOF alleles per genome and the average final size in HL or LL plants within Europe (r in HL = 0.079, p = 0.262, r in LL = 0.029, p = 0.684). Furthermore, we observed no significant difference in growth between Northern European and Chinese populations, despite their significantly higher burden of LOF alleles per genome (GLHT China versus Northern Europe: z-value = -20.259, p-value = <1e-4, S21 Fig, [65]). Therefore, we conclude that the individual burden of LOF mutations is unrelated to rosette growth variation.

We reasoned that lower growth rate might also be associated with a small subset of LOF mutations. To test this hypothesis, we investigated genetic associations between LOF alleles and the three growth parameters (see Materials and Methods). This analysis is similar to a GWAS, but utilises information on approximately 2500 genes that have at least one loss-of-function allele in any of the 1001 Genomes lines [39,55]. We detected no association between LOF alleles and FS, yet there was a significant association of LOF variation at gene AT2G17750 with variation in both t50 in LL plants and t50 plasticity (Fig 4, effect size = -3.542, p-value = 7.49e-6, gene-Fst = 0.113, and effect size = -4.470, p-value = 4.99e-6 for t50 and t50 plasticity, respectively). AT2G17750 encodes the NEP-interacting protein (NIP1) active in chloroplasts, which was reported to mediate intra-plastidial trafficking of an RNA polymerase encoded in the nucleus [89]. NIP1 controls the transcription of the rrn operon in protoplasts or amyloplasts during seed germination and in chloroplasts during later developmental stages [89]. The LOF variant is present primarily in Northern Europe (MAF = 16 and 0.8% in Northern Europe and Spain, respectively) but is unlikely to be deleterious: it correlates with a decrease of t50, which is a faster entry into the exponential growth phase indicative of increased growth vigor (Fig 4). Taken together, this result does not support the hypothesis that decreased FS in Northern Europe or China is controlled by deleterious variation.

Fig 4. Loss-of-function association and phenotype of t50 in LL/GxE.

Fig 4

Manhattan plot of a GWAS with loss-of-function alleles and t50LL (a) and t50GxE (c) as input phenotypes with the same association (AT2G17750) above the Bonferroni threshold (dashed line). Boxplot of the phenotype of t50LL (b) and t50GxE (d) versus the allele state at AT2G17750 (0 means functional, 1 is a loss-of-function). The colors separate the populations into Spain (purple) and Northern Europe (green).

FS variation might reflect local adaptation at the regional scale

During the growth season, Northern European A. thaliana populations are exposed to lower average temperatures (S3 Fig). Smaller rosettes are more compact, and increased compactness is often observed in populations adapted to cold temperatures [32,90–92]. Freezing tolerance, which was indeed reported to be higher in Northern Europe, is associated with functions affecting rosette size [93]. We thus hypothesized that the decreased FS and t50 observed for Northern European genotypes grown under both light regimes is the result of polygenic adaptation to lower average temperatures. We used the sets of 14 to 47 LD-pruned SNPs associating in GWAS at a sub-significant level (p<1e-4) to compute polygenic scores for each genotype and each trait, and used Qx, a summary statistic that quantifies their variance across locations of origin. A Qx value outside of neutral expectations inferred from the kinship variance in the population indicates excess differentiation of polygenic scores, as expected if individual populations evolved under divergent selection [53]. We observed that all traits displayed a strongly significant Qx (S8 Table). The differentiation of polygenic scores between the individual populations of origin suggests that divergent selection may be acting locally. Local adaptation has indeed been reported at this scale in this species [94]. This result should however be taken with caution, because, like the GWAS hits it is using, the Qx statistic is sensitive to population structure outliers. Clearly, population structure might underpin more of the GWAS signal detected for slope or t50, which are markedly less heritable than FS.

Interestingly, we observed that FS measured in HL and t50 measured in LL displayed polygenic scores that differed significantly between regions (p-value = 0.0162 and 0.0309, respectively, S22 Fig). We thus further tested whether, at the phenotypic level, regional differentiation in growth rate departed from neutral expectations. We first investigated whether variants associated with phenotypic variation in rosette diameter showed increased genetic differentiation. Compared to the Fst distribution of 10 000 random sets of SNPs, the 95th percentile of 1360 SNPs associating with all three parameters was always higher (p<10−4). Thus, associated SNPs are collectively more likely to be differentiated than the rest of the genome. This pattern is not caused by the confounding effect of population structure, because the functional enrichments are mostly specific to the phenotypes (S19 Fig). We note, however, that a few spurious genetic associations could contribute to both higher Fst and over-dispersion of polygenic scores [56–58]. Additional evidence based on approaches independent of GWAS is therefore required to support the adaptive significance of regional differences in growth rate in Europe. To this end, we used the population kinship matrix to parameterize a multivariate normal distribution and predict the amount of additive phenotypic divergence expected if the trait evolves neutrally [69]. We observed that FS measured in HL plants was marginally more differentiated than predicted under neutral conditions (Qst = 0.325, p-value = 0.085, Fig 5). The other parameters did not depart from neutrality (Qst ranging from 0.029 to 0.27, min p = 0.11, Fig 5). Since the divergent Chinese population indicates that the unconstrained evolution of growth rate variation is unlikely, this test might be overly conservative. In addition, it predicts the divergence in additive genetic variance, but in the selfing species A. thaliana, the whole genetic variance, i.e. broad sense heritability, can contribute to adaptation. We also compared the distribution of phenotypic variation within and between regions to the SNP Fst-distribution [68]. We used the Fst between Northern Europe and Spain as an estimate for nucleotide differentiation and compared it to the differentiation of these populations at the phenotypic level (Qst) [13,45,67]. For FS and t50, the Qst was significantly greater than genetic differentiation at 95% of single nucleotide polymorphisms (Table 2). This suggests that selective forces have contributed to the regional adaptation of FS in Europe. Other climatic components like temperature could also have strong effects on growth differences between populations. Nevertheless, we detected only weak correlations between growth variation and temperature at the location of origin (S23 Fig and S11 Table), suggesting that growth rate could be locally adapted to the conditions prevailing in each region. The environmental factors contributing to adaptive divergence in plant growth thus remain to be determined in this species.
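The comparison between associated SNPs and random SNP sets described above can be sketched as a resampling test in Python. This is a simplified illustration under stated assumptions: the function names are ours, the percentile uses the nearest-rank definition, the Fst values are synthetic, and random sets are drawn uniformly without matching for allele frequency or linkage.

```python
import random

def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceiling of q * n / 100
    return ordered[rank - 1]

def compare_to_random_sets(associated_fst, genome_fst, n_sets=10000, q=95, seed=1):
    """Empirical p-value for the q-th Fst percentile of associated SNPs.

    Counts how often a random SNP set of the same size reaches a q-th
    percentile at least as high as the one observed for associated SNPs.
    """
    rng = random.Random(seed)
    observed = percentile(associated_fst, q)
    set_size = len(associated_fst)
    exceed = sum(
        percentile(rng.sample(genome_fst, set_size), q) >= observed
        for _ in range(n_sets)
    )
    return observed, (exceed + 1) / (n_sets + 1)

# Synthetic genome-wide Fst values and two hypothetical SNP sets of 20 SNPs each.
genome = [i / 1000 for i in range(1000)]
highly_differentiated = genome[-20:]   # the 20 highest Fst values
weakly_differentiated = genome[:20]    # the 20 lowest Fst values

obs_high, p_high = compare_to_random_sets(highly_differentiated, genome, n_sets=1000)
obs_low, p_low = compare_to_random_sets(weakly_differentiated, genome, n_sets=1000)
```

With 10 000 random sets, the smallest attainable empirical p-value under this convention is 1/10 001, which corresponds to the p<10−4 reported when no random set exceeds the observed percentile.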

Fig 5. Expected distribution for quantitative trait differentiation between the Spanish and Northern European population.

Fig 5

Qst. The expectation is based on a multivariate normal distribution assuming a neutral trait with polygenic basis. Vertical lines indicate observed Qst for the individual growth parameters, FS (solid line), t50 (dashed line), Slope (dotted line), in HL (orange) and LL (cyan). The red arrows show the 90th, 95th and 99th percentiles of the distribution.

Table 2. FS and t50 quantitative differentiation (Qst) exceed differentiation given by single SNPs.

Trait Qst Percentile of Fst
FSHL 0.379 96.57
FSLL 0.282 95.80
t50HL 0.300 95.95
t50LL 0.189 94.79
SLHL 0.081 93.23
SLLL 0.010 91.62

Qst for each trait measured in HL and LL plants. Linear mixed models were used to quantify the ratio of genetic variation between versus within Spain and Northern Europe (Qst). The 95th percentile of the distribution for single SNP Fst between these two regions was 0.205. Permutations confirmed that this test is conservative (see Materials and Methods). HL: plants grown under high light regime, LL: plants grown under low light regime, FS: Final Size, t50: time to maximum growth and SL: slope.
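The comparison in Table 2 can be retraced with a short Python sketch. The Qst values and the 95th Fst percentile (0.205) are taken from Table 2 and its legend; the Qst function itself is an assumption based on the conventional definitions (between-region variance over between-region plus within-region variance for inbred lines, with the within-region component doubled for outcrossers) and is not necessarily the exact estimator obtained from the linear mixed models.

```python
def qst(v_between, v_within, selfing=True):
    """Conventional Qst from variance components (assumed definition).

    selfing=True:  V_between / (V_between + V_within), for inbred lines
    selfing=False: V_between / (V_between + 2 * V_within), for outcrossers
    """
    within_term = v_within if selfing else 2.0 * v_within
    return v_between / (v_between + within_term)

# Qst values reported in Table 2 (trait_lightregime).
TABLE2_QST = {
    "FS_HL": 0.379,
    "FS_LL": 0.282,
    "t50_HL": 0.300,
    "t50_LL": 0.189,
    "SL_HL": 0.081,
    "SL_LL": 0.010,
}
# 95th percentile of single-SNP Fst between Spain and Northern Europe (Table 2 legend).
FST_95TH_PERCENTILE = 0.205

# Traits whose Qst exceeds the 95th percentile of the Fst distribution.
exceeding = [trait for trait, value in TABLE2_QST.items() if value > FST_95TH_PERCENTILE]

# Hypothetical variance components, for illustration of the function only.
example_selfing = qst(1.0, 3.0)
example_outcrossing = qst(1.0, 3.0, selfing=False)
```

For the same variance components, the selfing definition yields a higher Qst than the outcrossing one, so the choice of definition matters when Qst is compared to a fixed Fst percentile.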

Conclusion

Our comprehensive analysis of genetic diversity in rosette growth rate, within and between three broad regions of the distribution area of A. thaliana, reveals the environmental and evolutionary factors that control this complex trait, which is of central importance for plant ecology. We show that plastic reactions to light intensity have the strongest impact on variation in rosette growth rates. Yet, we also provide evidence for significant genetic variation within and between regions. We observed that Spanish genotypes show more vigorous rosette growth and reach a larger size, regardless of light conditions. Although GWAS reveal very few associations that pass Bonferroni correction, analyses of functional enrichments and polygenic scores demonstrate that the polygenic basis of trait variation can also be explored in the presence of moderately significant genetic associations. The greater phenotypic differentiation observed within Europe compared to between Europe and China, a pattern opposite to measures of genetic divergence, provides a strong indication that stabilizing selective forces constrain the evolution of growth rate over time. The analysis of polygenic scores and patterns of differentiation suggests that much of the variation observed within Europe has been shaped by natural selection, rather than by the burden imposed by deleterious mutations. Leveraging polygenic associations in local adaptation studies remains challenging [78]. Methodological developments that improve the use of polygenic associations for the study of local adaptation are needed to consolidate these conclusions. Understanding the potential of polygenic trait architectures will help better integrate complex traits in our understanding of the genetic processes underpinning ecological specialization [6,95].

Supporting information

S1 Fig. Genotype origin map.

Each dot represents the sampling point of a genotype. The genotypes were assigned to Northern Europe (green), Western Europe (red), Spain (purple) and China (orange).

(RAR)

S2 Fig. Principal component analysis of 227 genotypes.

The PCA is based on 1.5 million SNPs with a minor allele frequency larger than 0.05. The first two principal components explain about 16% of the variance between the genotypes. Regions: China (orange), Northern Europe (green), Spain (purple), Western Europe (red).

(TIFF)

S3 Fig. Climatic variation between regions.

A) Annual average of the monthly radiation (left), monthly average temperature (center) and monthly precipitation (right) estimates for the sampling location of each genotype from Worldclim2 data (estimate per ~1 km2). Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, 20 unique locations), Northern Europe (NE, 46 unique locations), Spain (SP, 120 unique locations) & Western Europe (WE, 15 unique locations). B) Experimental set-up in the growth chamber with the light spectrum and intensity in HL (left) and LL (right). The bottom bar represents the timing of the light.

(TIF)

S4 Fig. Projected growth rates and diameter measurements of individual genotypes in HL and LL.

Predicted growth curves averaged per genotype (from drm function). To represent the regions 5 genotypes per region were chosen randomly. The growth curves were estimated from diameter measurements at different time points (points for three input replicates). Diameter measurements for HL are from day 11 to 46 and for LL from day 24 to 89. Legend: Title: Region and ID; HL (dashed line, red points), LL (solid line, blue points), China (orange), Northern Europe (green), Spain (purple) and Western Europe (red).

(TIF)

S5 Fig. GxE for Final Size.

GxE was estimated based on a glm(Final Size ~ genotype * environment) and is indicated by the color. Each dot corresponds to a genotype with its phenotype in HL (x-axis) and LL (y-axis) (269 genotypes in total). The black dot shows the average over all genotypes with standard deviation. The line shows a linear model for Final Size in LL ~ Final Size in HL.

(TIF)

S6 Fig. Regional comparison of the GWAS results within Spain to across Europe.

qq-plots comparing the GWAS data in S13 Fig (Spain, 117 genotypes, x-axis) to the data from S12 Fig (Europe, 201 genotypes, y-axis). The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey dotted line indicates the neutral expectation.

(TIF)

S7 Fig. Regional comparison of the GWAS results within Northern Europe to across Europe.

qq-plots comparing the GWAS data in S14 Fig (Northern Europe, 83 genotypes, x-axis) to the data from S12 Fig (Europe, 201 genotypes, y-axis). The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey dotted line indicates the neutral expectation.

(TIF)

S8 Fig. Regional differences for GxE for each trait.

GxE for FS (left), t50 (center) and Slope (SL, right). The phenotypic values are based on 217 genotypes of Arabidopsis thaliana. Groups that do not share a letter are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 14), Northern Europe (NE, n = 58), Spain (SP, n = 117) & Western Europe (WE, n = 28).

(TIF)

S9 Fig. Flowering time from 1001 Genomes.

Flowering time in 16°C conditions of each genotype plotted for Northern Europe (green, n =) and Spain (purple, n =, based on data from 1001Genomes, 2016). The regions showed no phenotypic difference, as indicated by the same letter (pairwise GLHT, p-value> 0.05).

(TIF)

S10 Fig. Flowering time in the experiment.

Flowering time of each genotype in HL (left) and LL conditions (right). Missing values were replaced with 59 (HL) or 90 (LL) days after sowing. Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Population information: China (CH, n = 22), Northern Europe (NE, n = 84), Spain (SP, n = 121) & Western Europe (WE, n = 53).

(TIF)

S11 Fig. Regional differences in Slope.

The phenotypic values are based on 220 genotypes of Arabidopsis thaliana in HL (left) and LL conditions (right). Groups that do not share a letter are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 15), Northern Europe (NE, n = 58), Spain (SP, n = 119) & Western Europe (WE, n = 28).

(TIF)

S12 Fig. GWAS results for all phenotypes across Europe.

Manhattan plots using 201 (or more) genotypes from Europe (Spain and Northern Europe) as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

(TIF)

S13 Fig. GWAS results for all phenotypes within Spain.

Manhattan plots using 117 (or more) genotypes from Spain as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

(TIF)

S14 Fig. GWAS results for all phenotypes within Northern Europe.

Manhattan plots using 83 (or more) genotypes from Northern Europe as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

(TIF)

S15 Fig. QQ-plots for GWAS results for all phenotypes across Europe.

QQ-plots of GWAS using 201 genotypes from Europe (Spain and Northern Europe) as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

(TIF)

S16 Fig. QQ-plots for GWAS results for all phenotypes within Spain.

QQ-plots of GWAS using 117 genotypes from Spain as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

(TIF)

S17 Fig. QQ-plots for GWAS results for all phenotypes within Northern Europe.

QQ-plots of GWAS using 83 genotypes from Northern Europe as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

(TIF)

S18 Fig. Correlation of phenotypic traits.

Pearson correlations for each pair of traits. Colored boxes show significant correlations (p<0.05 after multiple testing correction (FDR correction) and correction for population structure (lmekin)) for 193 genotypes across experiments. The significance is illustrated by box size (larger box represents lower p-values) and the color shows the direction and strength of correlation. Abbreviations are: HL = high light, GxE = Genome x Environment interaction, LL = low light, SL = Slope, FT = Flowering time, FS = Final Size, DiamFieldM2 = Diameter in Field conditions after 2 Months, Biomass21d = Biomass in controlled (HL) conditions after 21 days.

(TIF)

S19 Fig. Functional enrichment dendrogram for GO enrichment.

The enrichment is either based on ranking genes by p-value of the nearest SNP in GWAS (columns 1–9) or Fst of the gene (column 10). The GO terms are arranged into 9 clusters of similar function on the right side of the plot. Depicted are only enrichments with a p-value < 0.001.

(TIF)

S20 Fig. Loss-of-function alleles per population.

Based on data from Monroe et al. (2018). The sum of LOF alleles per genotype for Northern Europe (green, n =) and Spain (purple, n =). The regions were not different from each other (GLHT: z-value = 0.634, p-value = 0.526, negative binomial distribution).

(TIF)

S21 Fig. Loss-of-function alleles per population.

Based on data from Xu et al. (2019). Boxplot of the sum of LOF alleles per genotype for each region. Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 21), Northern Europe (NE, n = 84) & Spain (SP, n = 121).

(TIF)

S22 Fig. Polygenic Scores and regional differentiation for each trait.

Summary results from the analysis after Berg and Coop (2014). Each boxplot depicts the polygenic scores of a trait for genotypes from Northern Europe (green) & Spain (purple). Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Furthermore, the plot contains information about the number of SNPs used as input, the Qx score for excess variance in SNPs associated with the trait and the p-value of the Qx-analysis. Traits: FS = Final Size, t50, SL = Slope, HL = High Light treatment, LL = Low Light treatment.

(TIF)

S23 Fig. Correlation of phenotypic traits and climate.

Pearson correlations for each pair of traits/climatic variables. Colored boxes show significant correlations (p<0.05 after FDR correction for multiple testing and correction for population structure (lmekin)) for 195 genotypes across experiments. Significance is illustrated by box size (larger boxes represent lower p-values), and the color shows the direction and strength of the correlation. Abbreviations: HL = high light, GxE = Genotype x Environment interaction, LL = low light, SL = Slope, FT = Flowering time, FS = Final Size, DiamFieldM2 = Diameter in Field conditions after 2 Months, Biomass21d = Biomass in controlled (HL) conditions after 21 days, Radiation in kJ/m2/day, PC1/2_growS = Principal component 1 and 2 of all climatic data in the estimated growing Season (explaining 88.7 and 10.7% of the variance), PC1/2_T = Principal component 1 and 2 for climatic variables related to Temperature (explaining 98.1 and 1.3% of the variance), PC1/2_P = Principal component 1 and 2 for climatic variables related to Precipitation (explaining 89.8 and 8.22% of the variance).

(TIF)

S1 Table. Information on the genotypes used in this study, with their country of origin, assigned region, Genotype name and ID in 1001 Genomes, info on the sampling location and position (latitude and longitude) and the Collector.

In the second part of the table, the climatic information for the respective location is summarized: the number of growing months and, for the growing season, average Temperature [°C], Soil water content [%], Water vapor pressure [kPa], Wind speed [m s-1], Radiation [kJ m-2 day-1] and Rain [mm]. These are followed by the Bioclim variables 1 to 19 from the Worldclim database (http://worldclim.org/version2), and then by the first 2 PCs of the PCAs on growing season data, on temperature variables from the Bioclim data and on precipitation variables from the Bioclim data.

(XLSX)

S2 Table. Raw phenotypic measurements for each plant in the experiment.

Replicate is the block the plant was growing in with the corresponding tray number and row and column for position on the tray (5 rows and 7 columns per tray). The “diam” measurements are diameter measurements where the number corresponds to days after sowing.

(XLSX)

S3 Table. Genotypic mean of each genotype after correction for positional effects.

Information on the usage of genotypes: Phenotype_analysis is 1 if the genotype was used for phenotype-related analyses (regional differentiation, Qst), and GWAS is 1 if the genotype was used in GWAS and subsequent analyses (including GO enrichment & polygenic scores). Additionally, data from other experiments that were used for correlations are given: DiamFieldM2: diameter in mm in the field in Cologne after 2 months; Hypocotyllength: length of hypocotyls in mm in HL conditions, 15 days after sowing; Biomass21d: plant dry weight in g in HL conditions, 21 days after sowing; FT_10/FT_16: flowering time at 10/16°C from 1001 Genomes, 2016.

(XLSX)

S4 Table. Estimated heritabilities and pseudo-heritability from EMMAX.

Rows contain the input sample size (N), heritability (H2) and pseudo-heritability for each trait, treatment and population. The p-value of a heritability estimate is that of the genotype effect in the mixed linear model.

(XLSX)

S5 Table. Pairwise comparisons of phenotypes for each trait and treatment.

The mean difference between traits is given with Z- and p-value from a GLHT of a glm(parameter~population).

(XLSX)

S6 Table. Associated SNPs for the different datasets, traits and environment.

For each associated SNP, the Chromosome, Base, minor allele frequency (MAF), -log10(P) and effect size are given. The LD for the focal SNP was estimated, along with the number of SNPs and genes within the LD range. The p-values of the two SNPs that exceeded the Bonferroni threshold are marked in bold; the others were just below the threshold.

(XLSX)
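
For orientation, a Bonferroni threshold on the -log10(P) scale used in S6 Table depends only on the significance level and the number of tests. The sketch below is illustrative: the function name, the default alpha of 0.05 and the SNP count in the usage note are assumptions, not values taken from the study.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the Bonferroni significance threshold as -log10(p).

    Assumption: a family-wise alpha of 0.05; n_tests stands for the
    number of SNPs tested in the GWAS.
    """
    return -math.log10(alpha / n_tests)
```

With a hypothetical one million tested SNPs, `bonferroni_threshold(1_000_000)` gives a threshold of about 7.3, so only SNPs with -log10(P) above that value would be marked in bold.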

S7 Table. Testing the accuracy of polygenic trait predictions.

A. Polygenic scores were computed based on the phenotypic measurements for two replicates and correlated with the phenotype observed for the third replicate. Correlation was tested with a Spearman rank correlation test (Rho). Nr_SNPs: number of SNPs associated with each trait at p<10^-4. B. SNPs associated with the phenotype at a sub-significant level significantly improve the phenotype prediction, but random SNPs show that population structure plays an important role. Rho_associated shows the correlation between polygenic score and the genotypic values. Based on 1000 random samples of an equal number of SNPs, a distribution of random Z-scores was computed and compared to the Spearman correlation of the prediction of associated variants to the input phenotypes (Rho_associated). The distribution of Spearman correlations of the 1000 random sets is described with the median (Rho_random_median), the 95th quantile (Rho_random_95quantile) and the maximal Rho (Rho_random_max). The correlation obtained with random SNP sets is also often significant at p<0.05 (Percentage_significant), but the maximum correlation coefficient (Rho_random_max) is always markedly lower than the one obtained with sub-significant SNPs (Rho_associated).

(XLSX)
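
The comparison against random SNP sets described in part B of S7 Table can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: genotypes are coded as hypothetical allele counts per SNP, effect sizes are taken as given, and all function names are invented for this example.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Both inputs must contain at least two distinct values.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def polygenic_scores(genotypes, effects, snps):
    """Sum of effect size times allele count over the chosen SNPs."""
    return [sum(effects[s] * g[s] for s in snps) for g in genotypes]

def random_set_rhos(genotypes, effects, phenotypes, n_snps,
                    n_draws=1000, seed=0):
    """Rho between phenotype and scores built from random SNP sets."""
    rng = random.Random(seed)
    all_snps = range(len(effects))
    rhos = []
    for _ in range(n_draws):
        snps = rng.sample(all_snps, n_snps)  # equal-sized random set
        scores = polygenic_scores(genotypes, effects, snps)
        rhos.append(spearman_rho(scores, phenotypes))
    return rhos
```

The median, 95th quantile and maximum of the returned list would correspond to Rho_random_median, Rho_random_95quantile and Rho_random_max, to be compared with the Rho obtained from the associated SNPs.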

S8 Table. Results from Polygenic adaptation test after Berg & Coop (2014).

The trait column contains the respective traits that were used as input; the last row contains a random set of equal size, which was used to predict FSHL. Qx is the test statistic for a signal of polygenic adaptation using all phenotypic data. Rho is the result of a Spearman correlation of the predicted Z-scores versus the input phenotypes. The regional Z-values for Northern Europe and Spain are the region-specific effects on the trait. P-values from each test are in parentheses. The SNPs column contains the number of input SNPs for the estimation of polygenic adaptation (after pruning).

(XLSX)

S9 Table. GO-enrichment of genes in LD (within 10kb) to SNPs with p < 0.008 (based on permutation) in a GWAS for the respective trait.

Shown are terms with an enrichment p-value < 0.001. GO.ID and term give information on the enriched GO term. Annotated gives the number of genes in the term, Significant is the number of those genes that are associated in the input data set, and Expected is the number of genes expected to be associated by chance. The resultFisher gives the Fisher score for enrichment. We only report GO terms with >5 genes in them.

(XLSX)
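
The relationship between the Annotated, Significant and Expected columns can be illustrated with a one-sided Fisher's exact test, computed here as a hypergeometric tail probability. This is a sketch under assumptions: the function and parameter names are hypothetical, and the enrichment software used in the study may apply additional steps that are not reproduced here.

```python
from math import comb

def go_enrichment(n_genes, n_selected, annotated, significant):
    """Return (expected, p_value) for one GO term.

    n_genes: size of the gene universe (assumed parameter)
    n_selected: genes near associated SNPs (assumed parameter)
    annotated: genes in the universe carrying the GO term
    significant: selected genes carrying the GO term
    """
    # Number of selected genes expected in the term by chance.
    expected = annotated * n_selected / n_genes
    # One-sided p-value: probability of drawing at least
    # `significant` annotated genes in a sample of n_selected.
    upper = min(annotated, n_selected)
    tail = sum(
        comb(annotated, k) * comb(n_genes - annotated, n_selected - k)
        for k in range(significant, upper + 1)
    )
    return expected, tail / comb(n_genes, n_selected)
```

A term is enriched when Significant clearly exceeds Expected, which is reflected in a small p-value.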

S10 Table. GO-enrichment of all genes ranked by their Fst or p-value of the closest SNP in a GWAS of the respective trait.

Shown are terms with an enrichment p-value < 0.001. GO.ID and term give information on the enriched GO term. Nr_Genes is the number of genes in the respective term. The resultKS gives the Kolmogorov-Smirnov score for enrichment.

(XLSX)

S11 Table. Loadings of the climate PCAs for S23 Fig.

The input variables for the respective PCA are in the column Climatic_variable and the loading for PC1 and PC2 are in the following columns. The PCAs were performed with data within the projected growing season (PCA_growing_season, 185 unique locations), for bioclimatic variables related to temperature (PCA_temperature, 180 unique locations) and bioclimatic variables related to precipitation (PCA_precipitation, 180 unique locations).

(XLSX)

S1 File. R Markdown detailing the statistical analysis of rosette diameter variation.

(HTML)

Acknowledgments

We thank Prof. Andreas Beyer and Prof. Arthur Korte for advice regarding GWAS analyses and Emily Wheeler, Boston, for editorial assistance.

Data Availability

All relevant data are within the manuscript and its Supporting Information files. Raw image data and image analysis scripts are stored in the DRYAD repository (doi:10.5061/dryad.s1rn8pk5m).

Funding Statement

This research was funded by the European Research Council (ERC) through the “AdaptoSCOPE” grant 648617 to JdM, by the German American Fulbright Commission to JdM, and by grant AGL2016-78709-R (MEC, Spain) to S.E.R.-O. S.E.R.-O. also acknowledges the financial support of the Spanish Ministry of Economy and Competitivity for the Center of Excellence Severo Ochoa 2016–2019 (SEV-2015-0533) and of the CERCA Programme/Generalitat de Catalunya. We further acknowledge grants R35 GM127131 and RO1 MH101244 from the National Institutes of Health (NIH) for S.S. and E.M.K. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Mitchell-Olds T. Genetic constraints on life-history. Evolution (N Y). 1996;50: 140–145.
  • 2. Wilson AJ, Pemberton JM, Pilkington JG, Clutton-Brock TH, Coltman DW, Kruuk LEB. Quantitative genetics of growth and cryptic evolution of body size in an island population. Evol Ecol. 2007;21: 337–356. doi:10.1007/s10682-006-9106-z
  • 3. Elena SF, Lenski RE. Evolution experiments with microorganisms: The dynamics and genetic bases of adaptation. Nat Rev Genet. 2003;4: 457–469. doi:10.1038/nrg1088
  • 4. Grime JP. Vegetation classification by reference to strategies. Nature. 1974;250: 26–31.
  • 5. Grime JP. Evidence for the Existence of Three Primary Strategies in Plants and Its Relevance to Ecological and Evolutionary Theory. Am Nat. 1977;111: 1169–1194. doi:10.1086/283244
  • 6. Byers KJRP, Xu S, Schlüter PM. Molecular mechanisms of adaptation and speciation: why do we need an integrative approach? Mol Ecol. 2017;26: 277–290. doi:10.1111/mec.13678
  • 7. Chevin LM, Lande R. When do adaptive plasticity and genetic evolution prevent extinction of a density-regulated population? Evolution (N Y). 2010;64: 1143–1150. doi:10.1111/j.1558-5646.2009.00875.x
  • 8. Bac-Molenaar JA, Granier C, Keurentjes JJB, Vreugdenhil D. Genome wide association mapping of time-dependent growth responses to moderate drought stress in Arabidopsis. Plant Cell Environ. 2016;39: 88–102. doi:10.1111/pce.12595
  • 9. Körner C. Paradigm shift in plant growth control. Curr Opin Plant Biol. 2015;25: 107–114. doi:10.1016/j.pbi.2015.05.003
  • 10. Ballaré CL, Pierik R. The shade-avoidance syndrome: Multiple signals and ecological consequences. Plant Cell Environ. 2017;40: 2530–2543. doi:10.1111/pce.12914
  • 11. Pigliucci M. Evolution of phenotypic plasticity: Where are we going now? Trends Ecol Evol. 2005;20: 481–486. doi:10.1016/j.tree.2005.06.001
  • 12. Takou M, Wieters B, Kopriva S, Coupland G, Linstädter A. Linking genes with ecological strategies in Arabidopsis thaliana. J Exp Bot. 2019;70: 1141–1151. doi:10.1093/jxb/ery447
  • 13. Kronholm I, Picó F, Alonso-Blanco C, Goudet J, de Meaux J. Genetic basis of adaptation in Arabidopsis thaliana: Local adaptation at the seed dormancy QTL DOG1. Evolution (N Y). 2012;66: 2287–2302. doi:10.1111/j.1558-5646.2012.01590.x
  • 14. Kerdaffrec E, Filiault DL, Korte A, Sasaki E, Nizhynska V, Seren Ü, et al. Multiple alleles at a single locus control seed dormancy in Swedish Arabidopsis. Elife. 2016;5: e22502. doi:10.7554/eLife.22502
  • 15. Navarro JAR, Wilcox M, Burgueño J, Romay C, Swarts K, Trachsel S, et al. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat Genet. 2017;49: 476–480. doi:10.1038/ng.3784
  • 16. Hughes PW, Soppe WJJ, Albani MC. Seed traits are pleiotropically regulated by the flowering time gene PERPETUAL FLOWERING 1 (PEP1) in the perennial Arabis alpina. Mol Ecol. 2019;28: 1183–1201. doi:10.1111/mec.15034
  • 17. Lewontin RC, Krakauer J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 1973;74: 175–195.
  • 18. Hallatschek O, Hersen P, Ramanathan S, Nelson DR. Genetic drift at expanding frontiers promotes gene segregation. Proc Natl Acad Sci U S A. 2007;104: 19926–19930. doi:10.1073/pnas.0710150104
  • 19. Excoffier L, Foll M, Petit RJ. Genetic Consequences of Range Expansions. Annu Rev Ecol Evol Syst. 2009;40: 481–501. doi:10.1146/annurev.ecolsys.39.110707.173414
  • 20. Willi Y, Fracassetti M, Zoller S, Van Buskirk J. Accumulation of Mutational Load at the Edges of a Species Range. Mol Biol Evol. 2018;35: 781–791. doi:10.1093/molbev/msy003
  • 21. Klopfstein S, Currat M, Excoffier L. The Fate of Mutations Surfing on the Wave of a Range Expansion. Mol Biol Evol. 2006;23: 482–490. doi:10.1093/molbev/msj057
  • 22. Takou M, Hämälä T, Koch E, Steige KA, Dittberner H, Yant L, et al. Maintenance of adaptive dynamics and no detectable load in a range-edge out-crossing plant population. bioRxiv. 2020. doi:10.1101/709873
  • 23. Mitchell-Olds T, Schmitt J. Genetic mechanisms and evolutionary significance of natural variation in Arabidopsis. Nature. 2006;441: 947–52. doi:10.1038/nature04878
  • 24. Bergelson J, Roux F. Towards identifying genes underlying ecologically relevant traits in Arabidopsis thaliana. Nat Rev Genet. 2010;11: 867–879. doi:10.1038/nrg2896
  • 25. Hornitschek P, Kohnen MV, Lorrain S, Rougemont J, Ljung K, López-Vidriero I, et al. Phytochrome interacting factors 4 and 5 control seedling growth in changing light conditions by directly controlling auxin signaling. Plant J. 2012;71: 699–711. doi:10.1111/j.1365-313X.2012.05033.x
  • 26. Vasseur F, Exposito-Alonso M, Ayala-Garay OJ, Wang G, Enquist BJ, Vile D, et al. Adaptive diversification of growth allometry in the plant Arabidopsis thaliana. Proc Natl Acad Sci. 2018;115: 3416–3421. doi:10.1073/pnas.1709141115
  • 27. Bac-Molenaar JA, Vreugdenhil D, Granier C, Keurentjes JJB. Genome-wide association mapping of growth dynamics detects time-specific and general quantitative trait loci. J Exp Bot. 2015;66: 5567–5580. doi:10.1093/jxb/erv176
  • 28. Hanemian M, Vasseur F, Marchadier E, Gilbault E, Bresson J, Gy I, et al. Transcriptional natural variation at FLM induces synergistic pleiotropy in Arabidopsis thaliana. bioRxiv. 2019; 658013. doi:10.1101/658013
  • 29. Glander S, He F, Schmitz G, Witten A, Telschow A, de Meaux J. Assortment of Flowering Time and Immunity Alleles in Natural Arabidopsis thaliana Populations Suggests Immunity and Vegetative Lifespan Strategies Coevolve. Genome Biol Evol. 2018;10: 2278–2291. doi:10.1093/gbe/evy124
  • 30. Debieu M, Tang C, Stich B, Sikosek T, Effgen S, Josephs E, et al. Co-Variation between Seed Dormancy, Growth Rate and Flowering Time Changes with Latitude in Arabidopsis thaliana. PLoS One. 2013;8: 1–12. doi:10.1371/journal.pone.0061075
  • 31. Vetter MM, Kronholm I, He F, Häweker H, Reymond M, Bergelson J, et al. Flagellin perception varies quantitatively in Arabidopsis thaliana and its relatives. Mol Biol Evol. 2012;29: 1655–1667. doi:10.1093/molbev/mss011
  • 32. Luo Y, Widmer A, Karrenberg S. The roles of genetic drift and natural selection in quantitative trait divergence along an altitudinal gradient in Arabidopsis thaliana. Heredity (Edinb). 2015;114: 220–8. doi:10.1038/hdy.2014.89
  • 33. Todesco M, Balasubramanian S, Hu TT, Traw MB, Horton M, Epple P, et al. Natural allelic variation underlying a major fitness trade-off in Arabidopsis thaliana. Nature. 2010;465: 632–636. doi:10.1038/nature09083
  • 34. Züst T, Agrawal AA. Trade-Offs Between Plant Growth and Defense Against Insect Herbivory: An Emerging Mechanistic Synthesis. Annu Rev Plant Biol. 2017;68: 513–534. doi:10.1146/annurev-arplant-042916-040856
  • 35. Davila Olivas NH, Frago E, Thoen MPM, Kloth KJ, Becker FFM, van Loon JJA, et al. Natural variation in life history strategy of Arabidopsis thaliana determines stress responses to drought and insects of different feeding guilds. Mol Ecol. 2017; 2959–2977. doi:10.1111/mec.14100
  • 36. Kronholm I, Loudet O, de Meaux J. Influence of mutation rate on estimators of genetic differentiation—lessons from Arabidopsis thaliana. BMC Genet. 2010;11: 1–14. doi:10.1186/1471-2156-11-1
  • 37. Fulgione A, Koornneef M, Roux F, Hermisson J, Hancock AM. Madeiran Arabidopsis thaliana Reveals Ancient Long-Range Colonization and Clarifies Demography in Eurasia. Mol Biol Evol. 2017. doi:10.1093/molbev/msx300
  • 38. Zou Y, Hou X, Wu Q, Chen J, Li Z, Han T, et al. Adaptation of Arabidopsis thaliana to the Yangtze River basin. Genome Biol. 2017; 1–11. doi:10.1186/s13059-016-1139-1
  • 39. 1001 Genomes Consortium T. 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana—suppl. Material. Cell. 2016;166.
  • 40. Wieters B, de Meaux J. Phenotype pictures of Arabidopsis thaliana in high light and low light conditions [Data set]. 2020. doi:10.5061/dryad.s1rn8pk5m
  • 41. Rasband W. ImageJ. U.S. National Institutes of Health; 2012.
  • 42. De Vylder J, Vandenbussche F, Hu Y, Philips W, Van Der Straeten D. Rosette Tracker: an open source image analysis tool for automatic quantification of genotype effects. 2012. doi:10.1104/pp.112.202762
  • 43. R-Development-Core-Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. Available: http://www.r-project.org
  • 44. Ritz C, Baty F, Streibig JC, Gerhard D. Dose-response analysis using R. PLoS One. 2015;10: 1–13. doi:10.1371/journal.pone.0146021
  • 45. Dittberner H, Korte A, Mettler-Altmann T, Weber APM, Monroe G, de Meaux J. Natural variation in stomata size contributes to the local adaptation of water-use efficiency in Arabidopsis thaliana. Mol Ecol. 2018; 4052–4065. doi:10.1111/mec.14838
  • 46. Pinheiro J, Bates D, DebRoy S, Sarkar D, Team RC. nlme: linear and nonlinear mixed effects models. 2018.
  • 47. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9: 29–38. doi:10.1186/1746-4811-9-29
  • 48. Fick SE, Hijmans RJ. Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology. 2017.
  • 49. Trabucco A, Zomer R. Global soil water balance geospatial database. CGIAR Consortium for Spatial Information. Publ Online. 2010. Available: http://www.cgiar-csi.org
  • 50. Hothorn T, Bretz F, Westfall P, Heiberger RM, Schuetzenmeister A, Scheibe S. Package ‘multcomp’—Simultaneous Inference in General Parametric Models. 2017.
  • 51. Wei T, Simko V. R package “corrplot”: Visualisation of a Correlation Matrix (Version 0.84). 2017.
  • 52. The Inkscape Team. Inkscape. 2007. Available: inkscape.org
  • 53. Berg JJ, Coop G. A Population Genetic Signal of Polygenic Adaptation. PLoS Genet. 2014;10. doi:10.1371/journal.pgen.1004412
  • 54. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27: 2156–2158. doi:10.1093/bioinformatics/btr330
  • 55. Monroe J, Powell T, Price N, Mullen J, Howard A, Evans K, et al. Drought adaptation in Arabidopsis thaliana by extensive genetic loss-of-function. Elife. 2018;7: 2–21. doi:10.7554/eLife.41038
  • 56. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019;8: 1–17. doi:10.7554/eLife.39702
  • 57. Fustier MA, Martínez-Ainsworth NE, Aguirre-Liguori JA, Venon A, Corti H, Rousselet A, et al. Common gardens in teosintes reveal the establishment of a syndrome of adaptation to altitude. PLoS Genetics. 2019. doi:10.1371/journal.pgen.1008512
  • 58. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK Biobank. Elife. 2019;8: 1–47. doi:10.7554/eLife.39725
  • 59. Purcell S, Chang C. PLINK 1.9. 2019. Available: www.cog-genomics.org/plink/1.9/
  • 60. Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ. PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R. Mol Biol Evol. 2014;31: 1929–1936. doi:10.1093/molbev/msu136
  • 61. He F, Arce AL, Schmitz G, Koornneef M, Novikova P, Beyer A, et al. The Footprint of Polygenic Adaptation on Stress-Responsive Cis-Regulatory Divergence in the Arabidopsis Genus. Mol Biol Evol. 2016;33: 2088–2101. doi:10.1093/molbev/msw096
  • 62. Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. GOSemSim: An R package for measuring semantic similarity among GO terms and gene products. Bioinformatics. 2010;26: 976–978. doi:10.1093/bioinformatics/btq064
  • 63. Paradis E, Blomberg S, Bolker B, Brown J, Claude J, Cuong HS, et al. Package ‘ape’: Analyses of Phylogenetics and Evolution. 2019.
  • 64. Simons YB, Sella G. The impact of recent population history on the deleterious mutation load in humans and close evolutionary relatives. Curr Opin Genet Dev. 2016;41: 150–158. doi:10.1016/j.gde.2016.09.006
  • 65. Xu YC, Niu XM, Li XX, He W, Chen JF, Zou YP, et al. Adaptation and phenotypic diversification in Arabidopsis through loss-of-function mutations in protein-coding genes. Plant Cell. 2019;31: 1012–1025. doi:10.1105/tpc.18.00791
  • 66. Goudet J. HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Mol Ecol Notes. 2005;5: 184–186. doi:10.1111/j.1471-8278
  • 67. Leinonen T, McCairns RJS, O’Hara RB, Merilä J. Q(ST)-F(ST) comparisons: evolutionary and ecological insights from genomic heterogeneity. Nat Rev Genet. 2013;14: 179–90. doi:10.1038/nrg3395
  • 68. Whitlock MC, Guillaume F. Testing for spatially divergent selection: Comparing QST to FST. Genetics. 2009;183: 1055–1063. doi:10.1534/genetics.108.099812
  • 69. Koch EM. The effects of demography and genetics on the neutral distribution of quantitative traits. Genetics. 2019;211: 1371–1394. doi:10.1534/genetics.118.301839
  • 70. Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M, et al. Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet. 2010;6: 40. doi:10.1371/journal.pgen.1000940
  • 71. Külheim C, Ågren J, Jansson S. Rapid regulation of light harvesting and plant fitness in the field. Science (80-). 2002;297: 91–93. doi:10.1126/science.1072359
  • 72. Franklin KA. Shade avoidance. New Phytol. 2008;179: 930–944. doi:10.1111/j.1469-8137.2008.02507.x
  • 73. Durvasula A, Fulgione A, Gutaker RM, Alacakaptan SI, Flood PJ, Neto C, et al. African genomes illuminate the early history and transition to selfing in Arabidopsis thaliana. Proc Natl Acad Sci. 2017;114: 201616736. doi:10.1073/pnas.1616736114
  • 74. Sasaki E, Zhang P, Atwell S, Meng D, Nordborg M. "Missing" G x E Variation Controls Flowering Time in Arabidopsis thaliana. PLoS Genet. 2015; 1–18. doi:10.1371/journal.pgen.1005597
  • 75. Méndez-Vigo B, Gomaa NH, Alonso-Blanco C, Xavier Picó F. Among- and within-population variation in flowering time of Iberian Arabidopsis thaliana estimated in field and glasshouse conditions. New Phytol. 2013;197: 1332–1343. doi:10.1111/nph.12082
  • 76. Shindo C, Aranzana MJ, Lister C, Baxter C, Nicholls C, Nordborg M, et al. Role of FRIGIDA and FLOWERING LOCUS C in determining variation in flowering time of Arabidopsis. Plant Physiol. 2005;138: 1163–1173. doi:10.1104/pp.105.061309
  • 77. Zan Y, Carlborg Ö. A Polygenic Genetic Architecture of Flowering Time in the Worldwide Arabidopsis thaliana Population. de Meaux J, editor. Mol Biol Evol. 2019;36: 141–154. doi:10.1093/molbev/msy203
  • 78. Price N, Lopez L, Platts AE, Lasky JR. In the presence of population structure: From genomics to candidate genes underlying local adaptation. Ecol Evol. 2020; 1889–1904. doi:10.1002/ece3.6002
  • 79. Wellenreuther M, Hansson B. Detecting Polygenic Evolution: Problems, Pitfalls, and Promises. Trends Genet. 2016;32: 155–164. doi:10.1016/j.tig.2015.12.004
  • 80. Csilléry K, Rodríguez-Verdugo A, Rellstab C, Guillaume F. Detecting the genomic signal of polygenic adaptation and the role of epistasis in evolution. Mol Ecol. 2018;27: 606–612. doi:10.1111/mec.14499
  • 81. Pritchard JK, Di Rienzo A. Adaptation—not by sweeps alone. Nat Rev Genet. 2010;11: 665–667. doi:10.1038/nrg2880
  • 82. Braam J, Davis RW. Rain-, wind-, and touch-induced expression of calmodulin and calmodulin-related genes in Arabidopsis. Cell. 1990. doi:10.1016/0092-8674(90)90587-5
  • 83. Filiault DL, Maloof JN. A genome-wide association study identifies variants underlying the Arabidopsis thaliana shade avoidance response. PLoS Genet. 2012;8: 1–12. doi:10.1371/journal.pgen.1002589
  • 84. Bou-Torrent J, Roig-Villanova I, Galstyan A, Martínez-García JF. PAR1 and PAR2 integrate shade and hormone transcriptional networks. Plant Signal Behav. 2008;3: 453–454. doi:10.4161/psb.3.7.5599
  • 85. Holtan HE, Bandong S, Marion CM, Adam L, Tiwari S, Shen Y, et al. BBX32, an Arabidopsis B-box protein, functions in light signaling by suppressing HY5-regulated gene expression and interacting with STH2/BBX21. Plant Physiol. 2011;156: 2109–2123. doi:10.1104/pp.111.177139
  • 86. De Montaigu A, Giakountis A, Rubin M, Tóth R, Cremer F, Sokolova V, et al. Natural diversity in daily rhythms of gene expression contributes to phenotypic variation. Proc Natl Acad Sci U S A. 2015;112: 905–910. doi:10.1073/pnas.1422242112
  • 87. Salmela MJ, Weinig C. The fitness benefits of genetic variation in circadian clock regulation. Curr Opin Plant Biol. 2019;49: 86–93. doi:10.1016/j.pbi.2019.06.003
  • 88. Peischl S, Dupanloup I, Kirkpatrick M, Excoffier L. On the accumulation of deleterious mutations during range expansions. Mol Ecol. 2013;22: 5972–5982. doi:10.1111/mec.12524
  • 89. Azevedo J, Courtois F, Hakimi M-A, Demarsy E, Lagrange T, Lerbs-Mache S, et al. Intraplastidial trafficking of a phage-type RNA polymerase is mediated by a thylakoid RING-H2 protein. PNAS. 2008; 2–7. doi:10.1073/pnas.0800909105
  • 90. Jonas T, Rixen C, Sturm M, Stoeckli V. How alpine plant growth is linked to snow cover and climate variability. J Geophys Res Biogeosciences. 2008;113: 1–10. doi:10.1029/2007JG000680
  • 91. Byars SG, Papst W, Hoffmann AA. Local adaptation and cogradient selection in the alpine plant, Poa hiemata, along a narrow altitudinal gradient. Evolution (N Y). 2007;61: 2925–2941. doi:10.1111/j.1558-5646.2007.00248.x
  • 92. Li B, Suzuki J-I, Hara T. Latitudinal variation in plant size and relative growth rate in Arabidopsis thaliana. Oecologia. 1998;115: 293–301. doi:10.1007/s004420050519
  • 93. Horton MW, Willems G, Sasaki E, Koornneef M, Nordborg M. The genetic architecture of freezing tolerance varies across the range of Arabidopsis thaliana. Plant Cell Environ. 2016;39: 2570–2579. doi:10.1111/pce.12812
  • 94. Frachon L, Bartolli C, Carrère S, Bouchez O, Chaubet A, Gautier M, Roby D, Roux F. A Genomic Map of Climate Adaptation in Arabidopsis thaliana at a Micro-Geographic Scale. Frontiers in Plant Science. 2018. doi:10.3389/fpls.2018.00967
  • 95. Barghi N, Hermisson J, Schlötterer C. Polygenic adaptation: a unifying framework to understand positive selection. Nat Rev Genet. 2020. doi:10.1038/s41576-020-0250-z

Decision Letter 0

Kirsten Bomblies

3 Jul 2020

Dear Dr de Meaux,

Thank you very much for submitting your Research Article entitled 'Polygenic adaptation of rosette growth variation in Arabidopsis thaliana populations' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

The reviewers clearly appreciated that this manuscript covers an interesting and important topic, and I agree and would like to see it get to the level that we can accept it. As is, however, there were substantial and I think largely very valid concerns raised. I was similarly puzzled in some sections. Of course this is complex data and a complex study, but I think the reviewers raise good points that will help hone and clarify the story.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: uploaded as attachment

Reviewer #2: The manuscript from Wieters and co-authors sets out to understand the genetic basis of variation for rosette growth in the global population of Arabidopsis. To do so, the authors followed for several weeks the growth, approximated as rosette diameter, of a set of 278 accessions representing four broad geographic regions, grown in two light conditions (HL = high light, and LL = low light). They extracted three parameters from those measurements (FS = final rosette size; SL = slope of the exponential phase of the growth curve; and t50 = time at which rosette diameter is ½ FS) and used them to compare the different regional groups and run GWAS. Finally, the authors used a population genomics approach to look for signals of polygenic adaptation for growth in their dataset (Qx, Qst/Fst).
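To illustrate how the three parameters summarized above relate to a single growth curve, here is a minimal sketch that assumes a three-parameter logistic curve. The functional form, the function name `rosette_diameter`, and all numeric values are assumptions chosen for illustration; the fitting procedure actually used in the manuscript is not described in this review.

```python
import math

def rosette_diameter(t, fs, sl, t50):
    """Hypothetical logistic growth curve (an assumed form, not the authors' model).

    fs  : final rosette size (upper asymptote of the curve)
    sl  : slope parameter controlling how steep the growth phase is
    t50 : time at which the diameter equals fs / 2
    """
    return fs / (1.0 + math.exp(-sl * (t - t50)))

# Illustrative values only (not taken from the study).
fs, sl, t50 = 80.0, 0.25, 20.0
half = rosette_diameter(t50, fs, sl, t50)    # fs / 2 by construction
late = rosette_diameter(200.0, fs, sl, t50)  # approaches fs for large t
```

Under this assumed form, FS and t50 can be read directly from the curve, while SL only controls how quickly the curve moves between the two plateaus.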

This manuscript addresses a very important topic in adaptation; the ability to match growth to environmental conditions and available resources is a quintessentially adaptive trait, especially for plants, but still very little is known about how variation for growth is genetically determined. The authors have collected a remarkable amount of phenotypic data, and have conducted an extensive set of analyses. I have, however, a few concerns with some of the analyses or their interpretation, and more generally with how the manuscript is framed. I am listing my main comments below following the organization of the manuscript, although that is not necessarily the order of importance.

- Given the apparent focus of the paper on identifying differences in growth patterns (and local adaptation) between different Arabidopsis populations, I was a bit surprised by the authors’ choice of light intensity to explore GXE interactions. While light intensity clearly has a major effect on rosette size, this seems a better setup to study variation in shade avoidance (as the GWAS results confirm) than to compare growth habits in different macro-climatic areas. The authors should motivate this choice (over, say, different temperatures, daylengths or water availability), and clarify whether the light intensities used are ecologically relevant (i.e. are consistent with differences between the regions examined in this study).

- The authors use rosette diameter as a proxy for growth. I understand, of course, that this was dictated by technical limitations; total biomass measurements are destructive, and more comprehensive 3D shape measurements would likely be unfeasible at this scale. However, this means that they are exploring only one particular aspect of growth. Depending on their flowering habit, Arabidopsis rosettes will continue to produce new leaves (and therefore “grow”) well past the point where they reach maximum diameter, since new leaves will overlap with older ones but not grow longer than them. Other ways of measuring growth would likely give different results, as hinted by the significant, but overall quite limited, correlation between FS and biomass or hypocotyl length. In particular, the effect of light intensity on growth would probably be considerably smaller if aerial biomass were measured instead of rosette size; it seems likely that the increase of the latter under low light conditions is largely due to the increased petiole length that is known to be associated with shade avoidance syndrome (see for example Sasidharan et al. Plant Phys. 2010).

This does not, of course, invalidate the results presented in the manuscript. However, the authors should make the scope of their analyses clearer early in the manuscript, and especially in the abstract.

- The authors use a polygenic score analysis and GO enrichment analysis to show that, while only two SNPs pass the very conservative Bonferroni-corrected significance threshold, the GWAS results nonetheless describe the polygenic basis of variation for the examined growth parameters. I am particularly skeptical of GO enrichment analyses; unless they show clear-cut patterns, they are open to different interpretations. The authors’ interpretation of the GO enrichment results is that “All traits showed functional enrichment within gene ontology (GO) categories related to growth, confirming that these genetic associations were biologically relevant”. However, many of the enriched categories seem unlikely to be directly associated with variation for growth (e.g. “pollen exine formation”, “nuclear chromosome segregation”, “protein modification by small protein removal”, to name a few). A notable exception is GXE FS, for which the four most significantly enriched categories include “shade avoidance” and three other categories that can be plausibly linked to aerial growth/rosette size. Given how noisy the results are, I would suggest the authors remove this analysis, or limit its discussion to GXE FS.

These considerations are also valid for the comparison between genes with elevated Fst and SNP associations. While this analysis uses a different approach, I would have still expected similar results to the previous analysis for SNP associations; however, there is almost no overlap in enriched GO categories between the two analyses (or, for that matter, between the same parameter measured in different light conditions), which appears to further undermine their value.

As for the polygenic scores, I readily admit to not having extensive experience on the topic, so I might be missing or misunderstanding something – in which case I apologize beforehand. It seems to me, however, that using polygenic scores from two replicates to predict the phenotype of the third replicate is functionally identical to predicting the input data – with the accuracy of the prediction being dependent on how consistent between replicates (i.e. heritable) a trait is. Minimally, the authors should provide evidence that the predictions are significantly different from what you would get for random SNPs (possibly generating a null distribution for each parameter), and clarify the interpretation of the results. While the authors do provide one example using random SNPs for FSHL (Table S11), it is not clear how these SNPs have been selected, and it is not a direct comparison (it appears that the polygenic scores for the random SNPs have been calculated using all three replicates, and not to predict the third replicate as is the case for the results in Table S8).

An even more convincing experiment would be to remove a few accessions from the dataset, perform GWAS and calculate polygenic scores on the rest of the accessions, and then use those polygenic scores to predict the phenotype of the remaining accessions. Given the relatively small sample size, such analysis might however be under-powered.

These observations do not mean that sub-significant associations are not biologically relevant; nevertheless, unless the authors can provide more convincing proof, they should acknowledge the possibility that, especially for the low heritability parameters SL and t50 (0.07-0.02, Table S4), GWAS results are extremely noisy and might not be informative. This, however, could have significant consequences for the interpretation of further analyses that rely on GWAS results (Qx). The authors could consider focusing only on the more heritable FS parameters for most of the analyses.
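The random-SNP comparison suggested above could be set up along the following lines. This is a minimal sketch on simulated data, not the authors' pipeline: the function names, the toy genotypes, and the effect sizes are all hypothetical. In a real analysis the effects for both the associated set and the random sets would presumably be estimated from the two training replicates and scored against the held-out replicate, and the random SNPs might additionally be matched for allele frequency and linkage.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def polygenic_scores(genotypes, effects, snp_set):
    """Sum of effect * allele over the chosen SNPs, one score per accession."""
    return [sum(effects[j] * row[j] for j in snp_set) for row in genotypes]

def random_set_p_value(genotypes, effects, phenotype, snp_set, n_draws=1000, seed=0):
    """Compare the observed score-phenotype correlation with a null built
    from random SNP sets of the same size (one-sided empirical p-value)."""
    rng = random.Random(seed)
    n_snps = len(effects)
    observed = pearson(polygenic_scores(genotypes, effects, snp_set), phenotype)
    null = []
    for _ in range(n_draws):
        draw = rng.sample(range(n_snps), len(snp_set))
        null.append(pearson(polygenic_scores(genotypes, effects, draw), phenotype))
    exceed = sum(1 for r in null if r >= observed)
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (exceed + 1) / (n_draws + 1)

# --- Toy data (entirely simulated, not from the study) ---
rng = random.Random(1)
n_acc, n_snps = 60, 200
genotypes = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_acc)]
causal = list(range(10))  # hypothetical "associated" SNP set
effects = [1.0 if j in causal else rng.gauss(0.0, 1.0) for j in range(n_snps)]
phenotype = [sum(row[j] for j in causal) + rng.gauss(0.0, 0.1) for row in genotypes]

observed, p_value = random_set_p_value(genotypes, effects, phenotype, causal, n_draws=200)
```

The point of the design is that the associated set and the random sets go through exactly the same scoring step, so the null distribution is directly comparable to the observed value.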

Other points:

- The authors should give more information on how the accessions were chosen, and how the different European regions were defined. Was it based on previous genetic characterization (although West European accessions seem to cluster with either Spanish or North European ones in the PCA in Fig. S3), or on climatic differences? The Western European group of accessions in particular does not seem clearly defined; in Fig. S1, one German accession is classified as West European, but all German accessions are classified as North European in Table S3. At different points of the manuscript it is not always clear which regional groups are included in which analyses. Since Western European and Chinese accessions are not included in most analyses after the GWAS, it might be helpful to specify at that point that the remaining analyses will focus on the Spain-Northern Europe comparison, and specifically mention the other groups whenever they are included.

- Data for biomass, hypocotyl length and FS in field experiments are used in comparisons with the rosette growth parameters used throughout the manuscript, but they could be integrated further in the analyses (GWAS), since they would provide additional information on growth variation. Either way, since they are used in the manuscript, data for those experiments should be added to the supplementary material (I don’t think they currently are).

- I am not sure that using the flowering time data from the 2016 1001 Genome Consortium paper to test correlation between flowering time and the growth parameters described in this paper is appropriate here. Those plants were not only grown at different temperatures, but also with a different light cycle (16 h of light and 8 h of dark vs. the 12 h of light at 20 °C and 12 h of dark at 18 °C used for this manuscript). While the general trends would be similar, as the relatively strong correlation between the different sets of flowering time data shows (Figure S19), the differences could be large enough to confound the analysis. While FT16 and FT10 are the next best thing, since flowering time was not measured for all accessions in this study, these limitations should be acknowledged.

- Related to the previous point, the description and presentation of Figure 3 should be improved. Are the thickness and length of the lines proportional to the significance and/or strength of the correlation? If so, the information should be included in the figure (even if, as the figure legend mentions, numeric values are reported in Table S6). The striping is also difficult to see on the red lines.

It is also not clear why the “VER” parameter is included, since it is not mentioned elsewhere and it does not seem particularly germane to the analyses in the manuscript. If it is kept in the figure, it should be addressed in the manuscript, and a reference supporting the correlation between FT16-FT10 and vernalization requirements should be added (I could not find mention of that in the 2016 1001 Genomes Consortium paper, but it is a very expansive paper).

- While I have little doubt that growth rate is constrained in Arabidopsis, I am not sure selection is necessarily the only explanation for the observation that growth rates in Chinese populations are within the range of variation within European populations. There could be physiological constraints, or European populations, not having experienced (as a whole) the same bottleneck as Chinese populations, could be genetically more diverse (which I believe is the case) and cover a broader range of the phenotypic space for growth. The authors should provide a more detailed explanation of why they think selection plays a major role in this pattern.

- It would be helpful to compare the GWAS results for GXE FS to those for the shade avoidance GWAS (measured as hypocotyl elongation) from Filiault and Maloof, PLoS Genetics 2012.

- I found the LOF GWAS to be an interesting approach, especially since it greatly simplifies candidate gene validation – one could quite safely assume that the LOF, and not some more complex regulatory change, is causal for the phenotypic differences. While I do not expect the authors to start a whole experiment just to humor me, it would be neat and relatively straightforward to knock out NIP1 in a few accessions carrying a functional copy (or, even easier, getting a T-DNA mutant line, since I am guessing the Col-0 allele is functional) and check if t50 is indeed affected.

Minor points:

- Line 49-52: This sentence is a bit confusing, the same concept is expressed more clearly elsewhere in the manuscript.

- Line 224-226: numbers sum up to 231, not 235.

- Line 386-397: Figure S7 only shows that flowering time (as FT16 from the 2016 1001 Genome Consortium paper) is not significantly different between North European and Spanish accessions. Directly showing correlations between FT16/FT10 and the parameters analyzed in the manuscript would be more informative (this regardless of my previous comments on whether using those data is appropriate).

- Line 460: should be 22-37 (in Table S8, there are 37 SNPs in the analysis for t50LL).

- Line 537: should be NIP1, not INIP1.

- Line 553: should be 14-47 (in Table S11, there are 14 SNPs in the analysis for GXEt50).

- Line 563-565: this sentence is likely mis-placed (it is repeated almost verbatim immediately below).

- Line 611-612: While this sentence is technically correct (three Western European accessions were included in the GWAS, and Chinese accessions were included in the LOF analyses), most (population) genetic analyses focus almost exclusively on the Spain/Northern Europe comparison.

- Line 614-615: this should probably be re-phrased to specify that light intensity has the strongest effect among the factors tested in this study (other environmental factors not tested here might have even stronger effects).

- Lines 626-628 and lines 629-630: the two sentences are almost identical.

- Figure 6. “Horizontal lines” should be “vertical lines”.

Reviewer #3: attached

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: No: Phenotypic data for biomass and hypocotyl length, and for rosette growth in field experiments are used to validate the main dataset used in the manuscript (lines 356-358), but are not reported.

Information on which accessions carry a functional or LOF NIP1 allele is missing.

The numeric values for Qst shown in Figure 6 are not reported (although I am not sure that is necessary).

Reviewer #3: No: raw data (photographs) were not provided

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Hugo Tavares

Decision Letter 1

Magnus Nordborg, Kirsten Bomblies

12 Oct 2020

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr de Meaux,

Thank you very much for submitting your Research Article entitled 'Polygenic adaptation of rosette growth in Arabidopsis thaliana' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Magnus Nordborg

Guest Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

Thanks for your patience with this very long review process, which was mostly a Covid casualty. For what it's worth, the manuscript has clearly improved greatly as a result of revision!

As you will see from the reviews below, one reviewer is still unhappy with the manuscript, and does not think you have addressed his/her concerns. The points raised are valid, but the discussion is getting rather philosophical, and I think it would be wrong to hold up the paper over something like this. The paper is clearly written, and readers can judge for themselves. However, as you address the other minor comments raised below, please consider changing what you say slightly to accommodate the fact that there are (non-crazy) people who are not convinced by your interpretation of polygenic scores and GO enrichment.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The revised manuscript by Wieters et al, contains considerable changes which present the conducted research in a clear and understandable way. I also want to thank the authors for clearly answering each of the reviewers' comments.

I agree with all the answers to the comments and with the changes that were made to the manuscript.

Congratulations to the authors with this very nice work.

Reviewer #2: The authors did a great job of streamlining and clarifying the manuscript; both the analyses and the narrative of the manuscript are much easier to follow in this revised version, and this clarified some of the points I found confusing in the original manuscript. Polygenic traits are messy and I really appreciate how thorough the authors are in their approach to this study.

However, some of the concerns I raised in the first round of reviews still stand. As a consequence, I am not convinced that the data, as they are presented, fully support all of the main conclusions put forward by the authors. Below I am reiterating the points that I think were not fully addressed by the authors.

- In my original comments I had discussed the issues I had with the way polygenic scores results were validated and the interpretation of the GO analyses. I appreciate that the authors have been more cautious in reporting the results of the GO analyses. However, the analyses are largely unchanged, and so is their interpretation (the corresponding section of the manuscript is still titled “Polygenic scores and functional enrichments confirm the polygenic basis of growth variation”, as is, by and large, the way they are reported in the abstract).

As it stands, the polygenic scores results look promising but the way they are being validated (by using two replicates to predict phenotypic values for the third) does not look convincing to me. Both I and reviewer 3 suggested that comparing those results (Table S7) to a distribution based on random SNP sets could solve the problem. As mentioned in my original comments, the one polygenic score calculation for a set of 27 random SNPs presented in Table S8 does not seem very convincing (and I still do not think it can be fully compared to the results in Table S7). Unfortunately, but as the authors themselves point out perhaps not surprisingly, the attempt to predict phenotypic values for a subset of accessions based on polygenic scores was not successful.

I concur with the authors that the definition of GO categories is limited by the availability of experimental evidence for gene functions. While in my mind this makes interpreting results from GO analyses even harder (if we cannot trust the categories, why look at them in the first place), looking at it from this perspective could be interesting if there was independent evidence that the GWAS signal really describes the polygenic basis of those traits (i.e., “I am sure that those associations are meaningful, let’s see if I can learn something about them by looking at GO categories”). However, in this case GO categories are used to validate the goodness of the GWAS results, which I find less convincing (i.e., “since there are significantly enriched GO categories, and some of them could be linked to the traits we are studying, it means that sub-significant SNPs accurately describe the polygenic basis of those traits”). As stated before, the fact that there is no overlap between enriched GO categories for GWAS associations with p < 10⁻⁴ (Table S9) and ranked GWAS associations (Table S10) makes me even less confident about the interpretation of these results.

Neither this analysis nor the polygenic score results seem to definitively confirm that the sub-significant GWAS associations describe the polygenic basis of growth variation.

- Likewise, I am still unconvinced that stabilizing selection is necessarily the explanation for the lack of phenotypic differentiation between Chinese and European populations (one of the other main results reported in the abstract), instead of, for example, physiological limitations or convergent adaptation. While there is no harm in proposing stabilizing selection as a possible reason for the observed pattern, this particular explanation is reported several times throughout the manuscript.

- In response to one of my comments, the authors added to the abstract and elsewhere that the light conditions they used mimicked latitudinal differences in light intensity, which is useful in better framing the scope of the study. However, it is difficult to compare the light intensity reported for the growth chambers (in µmol m⁻² s⁻¹, probably of PAR) to the total radiation reported in Figure S3 (kJ m⁻² day⁻¹, which I guess is total radiation), especially not knowing the exact wavelength distribution in the growth chambers. It would be helpful to express those values in a format that is more comparable, and to explain in more detail how HL and LL are comparable to latitudinal differences; light intensity in HL is more than double that in LL, while, according to Figure S3, average radiation in Spain is only ~1.5 times that of Northern Europe.
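One way the two kinds of units might be brought onto a common scale is sketched below. Everything numeric here is an assumption: the factor of 4.57 µmol per joule is a commonly quoted approximation for PAR under a sunlight spectrum and would differ for growth-chamber lamps, the PAR share of total radiation (0.45) is a rough placeholder, and the photon flux values are hypothetical rather than the ones used in the study. Only the 12 h photoperiod is taken from the growth conditions mentioned in the first-round comments.

```python
def ppfd_to_daily_par_kj(ppfd_umol_m2_s, photoperiod_h=12.0, umol_per_joule=4.57):
    """Convert a photon flux (umol m^-2 s^-1) into daily PAR energy (kJ m^-2 day^-1).

    umol_per_joule is an assumed spectrum-dependent factor (sunlight approximation).
    """
    seconds_of_light = photoperiod_h * 3600.0
    joules_per_m2 = ppfd_umol_m2_s / umol_per_joule * seconds_of_light
    return joules_per_m2 / 1000.0

def par_to_total_kj(par_kj, par_fraction=0.45):
    """Scale PAR energy up to total radiation using an assumed PAR fraction."""
    return par_kj / par_fraction

# Hypothetical chamber settings, for illustration only.
low_light = ppfd_to_daily_par_kj(100.0)
high_light = ppfd_to_daily_par_kj(200.0)
ratio = high_light / low_light
```

Because the conversion is linear, the HL/LL ratio is unchanged by it; the conversion only helps when comparing absolute chamber values with the field radiation values.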

- As other reviewers noticed as well, the division into four groups is not always well justified. While I am convinced about the groupings for Spain, Northern Europe and China (based on the PCA in Fig S2), Western Europe does not seem to form a group of its own in that analysis.

As an aside, I realize there was a typo in my previous comments; in point 5), I meant to write that heritability for SL and t50 ranges between 0.07 and 0.2, not 0.02 – sorry if that generated any confusion.

Reviewer #3: In their revised manuscript, the authors amend and clarify several points raised by myself and the other reviewers. Although there haven't been major changes in the content, structure and conclusions of the manuscript, the many changes made throughout seemed to me to clarify the diverse and rich set of analyses.

The authors' replies and counter-arguments to some of my comments did increase my appreciation for their approach of looking at the results of this garden experiment seen from an evolutionary lens (rather than a purely quantitative genetics perspective). Apart from a few remaining comments below, I think the manuscript as is provides enough clarity to allow readers to draw their own conclusions from the interesting correlations found between rosette growth variation and patterns of genomic variation seen across different subsets of these accessions.

I thank the authors for substantially clarifying the statistical treatment of their trait data (and providing the analysis code). This has really helped me understand their approach and better interpret their figures. I was sorry that my suggestions for alternative analyses didn't prove fruitful, and I apologize for the time you might have spent on this without good returns. My main intention was to gain an understanding of the uncertainty associated with the growth curves. The authors provide a new supplementary figure S4 with individual curves, which goes some way towards communicating this.

I have a few remaining questions, which the authors may wish to address.

L148: very few points for Western Europe are visible in Fig S2, and the few that are visible seem to occur mixed in with Spanish and Northern European accessions (reviewer #2 also highlighted this). Perhaps adding transparency to the points would help visualise them? Also, would it be possible to add number of accessions in each group to the figure legend? Am I correct in understanding that these groupings are based on the "admixture group" provided by the 1001genomes project: http://1001genomes.org/accessions.html? If so, it may be worth explicitly stating so, as this would have clarified my previous concerns about the discretisation of these data (also mentioned by reviewer #1 and #2). I think this is implicitly mentioned in Table S1 legend, but might be worth having it more visibly here in the methods.

L152/L197: The description of the experiment as following a "nested design" might be confusing as some of the factors are crossed and others nested. If I understood the design correctly, I think it is technically a split-plot design (https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.2041-210x.2012.00251.x). Within each treatment the experiment was blocked three times (blocks nested within treatment) and all accessions occurred in each of the 6 blocks (accessions and blocks are crossed). Accession and light treatment are therefore indirectly crossed (i.e. all accession-treatment combinations were done - this is what allowed looking at GxE interaction).
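The layout described in this comment can be written out explicitly. The sketch below is only an illustration of the structure as I understand it (two light treatments, three blocks within each, every accession in every block); the function name and the accession labels are hypothetical.

```python
def split_plot_layout(accessions, treatments=("HL", "LL"), blocks_per_treatment=3):
    """Enumerate (treatment, block, accession) units of a split-plot design.

    Blocks are nested within treatment: each block label belongs to exactly
    one treatment. Accessions are crossed with blocks: every accession
    appears once in every block.
    """
    layout = []
    for treatment in treatments:
        for b in range(1, blocks_per_treatment + 1):
            block_id = f"{treatment}-{b}"  # label is unique to its treatment
            for accession in accessions:
                layout.append((treatment, block_id, accession))
    return layout

# Hypothetical accession labels, for illustration only.
toy_accessions = ["acc1", "acc2", "acc3", "acc4"]
layout = split_plot_layout(toy_accessions)
block_ids = sorted({block for _, block, _ in layout})
```

With this structure every accession-treatment combination is present, which is what makes a GxE term estimable.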

L169/L428: "For genotypes without a flowering individual by the end of the experiment, a flowering time value of 59 or 90 days was assigned to HL and LL plants". As I mentioned before, this is not a correct treatment for censored data. The average (and Tukey test) as calculated by the authors is problematic as it is biased by the number of non-flowering individuals in each group. For displaying these data in Fig S10 the authors could consider plotting those individuals that did not flower as points with a different shape at the upper edge of the y-axis (to clearly indicate they are censored data) and not include them in the boxplot calculation. The authors could report the percentage of individuals that flowered for each population. As the claim is that there are no substantial differences between NE and SP, it would be useful to see that there are no substantial differences in the proportion of individuals that flowered in this experiment.
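The reporting suggested here could look roughly like the following sketch, which keeps censored plants separate instead of imputing them. The function name and the toy values are hypothetical; only the censoring value of 59 days for HL comes from the quoted sentence. The imputed mean is computed solely to show how it depends on the number of non-flowering plants.

```python
from statistics import mean, median

def summarize_flowering(days, censor_day):
    """Summarize flowering times where None marks a plant that never flowered.

    Reports the proportion that flowered and the median among flowering
    plants, plus the naive mean obtained by imputing censor_day.
    """
    if not days:
        raise ValueError("days must contain at least one plant")
    flowered = [d for d in days if d is not None]
    imputed = [d if d is not None else censor_day for d in days]
    return {
        "n": len(days),
        "n_censored": len(days) - len(flowered),
        "prop_flowered": len(flowered) / len(days),
        "median_flowered": median(flowered) if flowered else None,
        "naive_mean_imputed": mean(imputed),
    }

# Toy values for illustration only (not data from the study).
toy_hl = [30, 35, None, None]
summary = summarize_flowering(toy_hl, censor_day=59)
```

In this toy case the imputed mean (45.75 days) is driven as much by the two censored plants as by the two that flowered, which is the bias the comment refers to.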

L218: the sentence ending "as the ratio of." is incomplete.

L218: "The heritability of GxE could not be estimated..." As I mentioned, I believe GxE variance could be estimated by fitting a random slopes model, if the authors wish to do so, something like: parameter ~ treatment + (treatment|accession) + (1|treatment:block)

Otherwise, it may be best to omit this statement from the methods.
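Written out, the random-slopes formula suggested above could correspond to a model of the following form. The symbols are assumptions introduced here for illustration and do not appear in the manuscript: $i$ indexes accessions, $j$ the light treatment, $k$ the block within a treatment, and $T_j$ is a 0/1 indicator for the treatment.

```latex
\begin{aligned}
y_{ijk} &= \mu + \beta\,T_j + a_i + b_i\,T_j + c_{jk} + \varepsilon_{ijk},\\
(a_i,\, b_i)^{\top} &\sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad
c_{jk} \sim \mathcal{N}(0,\, \sigma_c^{2}), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Under this reading, $a_i$ is the accession-specific intercept, $b_i$ the accession-specific response to the treatment, and the variance of $b_i$ (one diagonal element of $\Sigma$) is the quantity that would capture genotype-by-environment variance.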

L432: I still could not follow the argument in favour of stabilising selection (also raised by reviewer #2). The high Fst is a consequence of the separation of these lineages (with the bottleneck further increasing their relative divergence - as within-population diversity must have dropped in Chinese populations). But why would that be expected to result in a change in rosette growth? What level of phenotypic divergence (in a polygenic trait) would be expected to accompany a change in relative genetic divergence? Are other traits in Chinese accessions significantly different from European accessions such that a lack of difference in rosette growth is surprising? Are the data also compatible with stabilising selection across the whole range of Arabidopsis (not just Chinese populations) or evolution around a trait optimum, e.g. due to physiological constraints as reviewer #2 mentioned? Without data on fitness disadvantage of extreme phenotypes, it seems hard to infer whether stabilising selection is at play in this system.

L496: "The associated sets were composed of 22 to 37 unlinked SNPs". Is there any overlap in the SNPs for each trait?

L576: It might be worth mentioning that a common association for T50-LL and T50-GxE is not surprising given the high correlation between these traits (Fig S18). In fact, they are not independent traits, as T50-GxE is calculated from T50-LL and T50-HL: possibly there's lower variation in T50-HL, and therefore variation in T50-GxE is mostly due to variation in T50-LL.
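The dependence described in this comment can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the GxE value is a simple difference LL − HL; the manuscript may define it differently, and the function name and toy variances are hypothetical. Under that assumption, Cov(LL − HL, LL) = Var(LL) − Cov(LL, HL), so the correlation between the GxE value and the LL value follows directly from the variances.

```python
from math import sqrt


def corr_difference_with_ll(var_ll, var_hl, cov_ll_hl=0.0):
    """Correlation between (LL - HL) and LL, given variances and covariance.

    Assumes GxE is the plain difference LL - HL (an assumption of this
    sketch, not a statement about the authors' method).
    """
    var_diff = var_ll + var_hl - 2.0 * cov_ll_hl
    if var_ll <= 0 or var_diff <= 0:
        raise ValueError("variances must be positive")
    return (var_ll - cov_ll_hl) / sqrt(var_ll * var_diff)


# Toy variances, invented for illustration only.
high_ll_variance = corr_difference_with_ll(var_ll=9.0, var_hl=1.0)
equal_variances = corr_difference_with_ll(var_ll=1.0, var_hl=1.0)
```

With toy variances of 9 (LL) and 1 (HL) and no covariance, the correlation is about 0.95; with equal variances it drops to about 0.71. This illustrates why a shared association signal is expected when most of the variation in the difference comes from LL.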

Fig S22: have the authors corrected these tests for multiple testing?
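For reference, the two corrections most often applied to a family of tests like this one can be sketched in a few lines. This is a generic illustration with invented p-values; the function names are hypothetical and nothing here reproduces the authors' analysis.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, invented for illustration only.
toy_p = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(toy_p)
fdr = benjamini_hochberg(toy_p)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.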

L669: the authors nicely caution that Fst and polygenic scores might be confounded by structure. I was wondering if there is a relationship between Fst and SNP effects (at all SNPs, not just p < 1e-4)? As the authors discuss (L489), it is challenging to disentangle variation that is confounded by structure from variation that is not (despite the statistical adjustment with kinship matrix). I was wondering if looking at this relationship would help in any way to investigate the extent of this issue in these data?

L714: "(blue line)" might be meant as "dashed line".

L756/L762: check numbers of referenced figures

L771/L829/L835: missing sample sizes

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Hugo Tavares

Decision Letter 2

Magnus Nordborg, Kirsten Bomblies

10 Dec 2020

Dear Dr de Meaux,

We are pleased to inform you that your manuscript entitled "Polygenic adaptation of rosette growth in Arabidopsis thaliana" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Magnus Nordborg

Guest Editor

PLOS Genetics

Kirsten Bomblies

Section Editor: Evolution

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-20-00468R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Magnus Nordborg, Kirsten Bomblies

20 Jan 2021

PGENETICS-D-20-00468R2

Polygenic adaptation of rosette growth in Arabidopsis thaliana

Dear Dr de Meaux,

We are pleased to inform you that your manuscript entitled "Polygenic adaptation of rosette growth in Arabidopsis thaliana" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Melanie Wincott

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Genotype origin map.

    Each dot represents the sampling point of a genotype. The genotypes were assigned to Northern Europe (green), Western Europe (red), Spain (purple) and China (orange).

    (RAR)

    S2 Fig. Principal component analysis of 227 genotypes.

    The PCA is based on 1.5 million SNPs with a minor allele frequency larger than 0.05. The first two principal components explain about 16% of the variance between the genotypes. Regions: China (orange), Northern Europe (green), Spain (purple), Western Europe (red).

    (TIFF)

    S3 Fig. Climatic variation between regions.

    A) Annual average of the monthly radiation (left), monthly average temperature (center) and monthly precipitation (right) estimates for the sampling location of each genotype from Worldclim2 data (estimate per ~1 km²). Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, 20 unique locations), Northern Europe (NE, 46 unique locations), Spain (SP, 120 unique locations) & Western Europe (WE, 15 unique locations). B) Experimental set-up in the growth chamber with the light spectrum and intensity in HL (left) and LL (right). The bottom bar represents the timing of the light.

    (TIF)

    S4 Fig. Projected growth rates and diameter measurements of individual genotypes in HL and LL.

    Predicted growth curves averaged per genotype (from drm function). To represent the regions 5 genotypes per region were chosen randomly. The growth curves were estimated from diameter measurements at different time points (points for three input replicates). Diameter measurements for HL are from day 11 to 46 and for LL from day 24 to 89. Legend: Title: Region and ID; HL (dashed line, red points), LL (solid line, blue points), China (orange), Northern Europe (green), Spain (purple) and Western Europe (red).

    (TIF)
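The three growth parameters referred to throughout these legends (Final Size, t50 and Slope) can be read as the parameters of a sigmoid growth curve. The sketch below assumes a three-parameter logistic form; the exact model fitted with the drm function is not stated in this legend, so the functional form, the function name and the toy parameter values are all assumptions made for illustration.

```python
from math import exp


def logistic_diameter(day, final_size, t50, slope):
    """Rosette diameter predicted by a three-parameter logistic curve.

    final_size: upper asymptote of the diameter
    t50:        day at which half of final_size is reached
    slope:      steepness of the curve around t50
    (Assumed parameterisation; the fitted model may differ.)
    """
    return final_size / (1.0 + exp(-slope * (day - t50)))


# Toy parameters, invented for illustration only.
half_size = logistic_diameter(day=30, final_size=80.0, t50=30.0, slope=0.2)
curve = [logistic_diameter(d, 80.0, 30.0, 0.2) for d in range(11, 47, 7)]
```

At `day == t50` the predicted diameter is exactly half of `final_size`, which is what makes t50 a convenient summary of growth timing.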

    S5 Fig. GxE for Final Size.

    GxE was estimated based on a glm(Final Size ~ genotype * environment) and is indicated by the color. Each dot corresponds to a genotype with its phenotype in HL (x-axis) and LL (y-axis) (269 genotypes in total). The black dot shows the average over all genotypes with standard deviation. The line shows a linear model for Final Size in LL ~ Final Size in HL.

    (TIF)

    S6 Fig. Regional comparison of the GWAS results within Spain to across Europe.

    qq-plots comparing the GWAS data in S13 Fig (Spain, 117 genotypes, x-axis) to the data from S12 Fig (Europe, 201 genotypes, y-axis). The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey dotted line indicates the neutral expectation.

    (TIF)

    S7 Fig. Regional comparison of the GWAS results within Northern Europe to across Europe.

    qq-plots comparing the GWAS data in S14 Fig (Northern Europe, 83 genotypes, x-axis) to the data from S12 Fig (Europe, 201 genotypes, y-axis). The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey dotted line indicates the neutral expectation.

    (TIF)

    S8 Fig. Regional differences for GxE for each trait.

    GxE for FS (left), t50 (center) and Slope (SL, right). The phenotypic values are based on 217 genotypes of Arabidopsis thaliana. Groups that do not share a letter are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 14), Northern Europe (NE, n = 58), Spain (SP, n = 117) & Western Europe (WE, n = 28).

    (TIF)

    S9 Fig. Flowering time from 1001 Genomes.

    Flowering time in 16°C conditions of each genotype plotted for Northern Europe (green, n =) and Spain (purple, n =, based on data from 1001Genomes, 2016). The regions showed no phenotypic difference, as indicated by the same letter (pairwise GLHT, p-value> 0.05).

    (TIF)

    S10 Fig. Flowering time in the experiment.

    Flowering time of each genotype in HL (left) and LL conditions (right). Missing values were replaced with 59 (HL) or 90 (LL) days after sowing. Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Population information: China (CH, n = 22), Northern Europe (NE, n = 84), Spain (SP, n = 121) & Western Europe (WE, n = 53).

    (TIF)

    S11 Fig. Regional differences in Slope.

    The phenotypic values are based on 220 genotypes of Arabidopsis thaliana in HL (left) and LL conditions (right). Groups that do not share a letter are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 15), Northern Europe (NE, n = 58), Spain (SP, n = 119) & Western Europe (WE, n = 28).

    (TIF)

    S12 Fig. GWAS results for all phenotypes across Europe.

    Manhattan plots using 201 (or more) genotypes from Europe (Spain and Northern Europe) as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

    (TIF)

    S13 Fig. GWAS results for all phenotypes within Spain.

    Manhattan plots using 117 (or more) genotypes from Spain as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

    (TIF)

    S14 Fig. GWAS results for all phenotypes within Northern Europe.

    Manhattan plots using 83 (or more) genotypes from Northern Europe as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The dotted line denotes the 5% Bonferroni-corrected threshold.

    (TIF)

    S15 Fig. QQ-plots for GWAS results for all phenotypes across Europe.

    QQ-plots of GWAS using 201 genotypes from Europe (Spain and Northern Europe) as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

    (TIF)

    S16 Fig. QQ-plots for GWAS results for all phenotypes within Spain.

    QQ-plots of GWAS using 117 genotypes from Spain as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

    (TIF)

    S17 Fig. QQ-plots for GWAS results for all phenotypes within Northern Europe.

    QQ-plots of GWAS using 83 genotypes from Northern Europe as input. The traits are Final Size (upper row), t50 (2nd row) and Slope (lower row) in HL (left column), LL (middle column) and their GxE (right column). The grey line denotes the neutral expectation and the red line the observation from the data. The axes describe the expected (x) and observed (y) values for -log10(p).

    (TIF)

    S18 Fig. Correlation of phenotypic traits.

    Pearson correlations for each pair of traits. Colored boxes show significant correlations (p < 0.05 after FDR correction for multiple testing and correction for population structure (lmekin)) for 193 genotypes across experiments. The significance is illustrated by box size (a larger box represents a lower p-value) and the color shows the direction and strength of the correlation. Abbreviations are: HL = high light, GxE = Genome x Environment interaction, LL = low light, SL = Slope, FT = Flowering time, FS = Final Size, DiamFieldM2 = Diameter in Field conditions after 2 Months, Biomass21d = Biomass in controlled (HL) conditions after 21 days.

    (TIF)

    S19 Fig. Functional enrichment dendrogram for GO enrichment.

    The enrichment is either based on ranking genes by p-value of the nearest SNP in GWAS (columns 1–9) or Fst of the gene (column 10). The GO terms are arranged into 9 clusters of similar function on the right side of the plot. Depicted are only enrichments with a p-value < 0.001.

    (TIF)

    S20 Fig. Loss-of-function alleles per population.

    Based on data from Monroe et al. (2018). The sum of LOF alleles per genotype for Northern Europe (green, n =) and Spain (purple, n =). The regions were not different from each other (GLHT: z-value = 0.634, p-value = 0.526, negative binomial distribution).

    (TIF)

    S21 Fig. Loss-of-function alleles per population.

    Based on data from Xu et al. (2019). Boxplot of the sum of LOF alleles per genotype for each region. Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Region information: China (CH, n = 21), Northern Europe (NE, n = 84) & Spain (SP, n = 121).

    (TIF)

    S22 Fig. Polygenic Scores and regional differentiation for each trait.

    Summary results from the analysis after Berg and Coop (2014). Each boxplot depicts the polygenic scores of a trait for genotypes from Northern Europe (green) & Spain (purple). Boxplots with different letters are significantly different according to Tukey’s HSD (p-value < 0.05). Furthermore, the plots contain information about the number of SNPs used as input, the Qx score for excess variance in SNPs associated with the trait and the p-value of the Qx-analysis. Traits: FS = Final Size, t50, SL = Slope, HL = High Light treatment, LL = Low Light treatment.

    (TIF)

    S23 Fig. Correlation of phenotypic traits and climate.

    Pearson correlations for each pair of traits/climatic variables. Colored boxes show significant correlations (p < 0.05 after FDR correction for multiple testing and correction for population structure (lmekin)) for 195 genotypes across experiments. The significance is illustrated by box size (a larger box represents a lower p-value) and the color shows the direction and strength of the correlation. Abbreviations are: HL = high light, GxE = Genome x Environment interaction, LL = low light, SL = Slope, FT = Flowering time, FS = Final Size, DiamFieldM2 = Diameter in Field conditions after 2 Months, Biomass21d = Biomass in controlled (HL) conditions after 21 days, Radiation in kJ/m2/day, PC1/2_growS = Principal components 1 and 2 of all climatic data in the estimated growing Season (explaining 88.7 and 10.7% of the variance), PC1/2_T = Principal components 1 and 2 for climatic variables related to Temperature (explaining 98.1 and 1.3% of the variance), PC1/2_P = Principal components 1 and 2 for climatic variables related to Precipitation (explaining 89.8 and 8.22% of the variance).

    (TIF)

    S1 Table. Information on the genotypes used in this study, with their country of origin, assigned region, Genotype name and ID in 1001 Genomes, info on the sampling location and position (latitude and longitude) and the Collector.

    The second part of the table summarizes the climatic information for the respective location: number of growing months and, for the growing season, average Temperature [°C], Soil water content [%], Water vapor pressure [kPa], Wind speed [m s-1], Radiation [kJ m-2 day-1] and Rain [mm]. These are followed by the Bioclim variables 1 to 19 from the Worldclim database (http://worldclim.org/version2), and then by the first 2 PCs of the PCAs based on growing-season data, on temperature variables from the bioclim data and on precipitation variables from the bioclim data.

    (XLSX)

    S2 Table. Raw phenotypic measurements for each plant in the experiment.

    Replicate is the block the plant was growing in with the corresponding tray number and row and column for position on the tray (5 rows and 7 columns per tray). The “diam” measurements are diameter measurements where the number corresponds to days after sowing.

    (XLSX)

    S3 Table. Genotypic mean of each genotype after correction for positional effects.

    Information on the usage of genotypes: Phenotype_analysis is 1 if the genotype was used for phenotype-related analysis (regional differentiation, Qst), and GWAS is 1 if the genotype was used in GWAS and subsequent analyses (also GO enrichment & polygenic scores). Additional data from other experiments that were used for correlations: DiamFieldM2: Diameter in mm in the field in Cologne, after 2 months; Hypocotyllength: length of hypocotyls in mm in HL conditions, 15 days after sowing; Biomass21d: Plant dry weight in g, 21 days after sowing in HL conditions; FT_10/FT_16: flowering time in 10/16°C from 1001 Genomes, 2016.

    (XLSX)

    S4 Table. Estimated heritabilities and pseudo-heritability from EMMAX.

    Rows contain the input sample size (N), heritability (H2) and pseudo-heritability for each trait, treatment and population. The p-value of a heritability is that of the genotype effect in the mixed linear model.

    (XLSX)

    S5 Table. Pairwise comparisons of phenotypes for each trait and treatment.

    The mean difference between traits is given with Z- and p-value from a GLHT of a glm(parameter~population).

    (XLSX)

    S6 Table. Associated SNPs for the different datasets, traits and environment.

    For each associated SNP the Chromosome, Base, minor allele frequency (MAF), -log10(P) and effect size are given. The LD for the focal SNP was estimated, with the number of SNPs and genes within the LD range. The p-values of the two SNPs that exceeded the Bonferroni threshold are marked in bold; the others were just below the threshold.

    (XLSX)

    S7 Table. Testing the accuracy of polygenic trait predictions.

    A. Polygenic scores were computed based on the phenotypic measurements for two replicates and correlated with the phenotype observed for the third replicate. Correlation was tested with a Spearman rank correlation test (Rho). Nr_SNPs: number of SNPs associated with each trait at p < 1e-4. B. SNPs associated with the phenotype at a sub-significant level significantly improve the phenotype prediction, but random SNPs show that population structure plays an important role. Rho_associated shows the correlation between polygenic score and the genotypic values. Based on 1000 random samples of an equal number of SNPs, a distribution of random Z-scores was computed and compared to the Spearman correlation of the prediction of associated variants to the input phenotypes (Rho_associated). The distribution of Spearman correlations of the 1000 random sets is described by the median (Rho_random_median), 95th quantile (Rho_random_95quantile) and the maximal Rho (Rho_random_max). The correlation obtained with random SNP sets is also often significant at p < 0.05 (Percentage_significant), but the maximum correlation coefficient (Rho_random_max) is always markedly lower than the one obtained with sub-significant SNPs (Rho_associated).

    (XLSX)
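The prediction test described in part A of this legend can be sketched in a few lines. The example below is a simplified illustration and not the authors' pipeline: it assumes an additive score (sum of allele code times estimated effect), a 0/1 allele coding for inbred lines, and invented toy data; all function and variable names are hypothetical.

```python
from math import sqrt


def polygenic_scores(genotypes, effects):
    """Additive score per line: sum over SNPs of allele code x effect size."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data, invented for illustration: 4 lines x 3 SNPs, with effects
# imagined as estimated from two replicates.
genotypes = [[0, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
effects = [2.0, -1.0, 0.5]
held_out_phenotype = [10.0, 30.0, 25.0, 20.0]  # imagined third replicate

scores = polygenic_scores(genotypes, effects)
rho = spearman_rho(scores, held_out_phenotype)
```

The comparison in part B would repeat the same calculation with randomly drawn SNP sets of equal size, so that the correlation expected from population structure alone can be gauged.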

    S8 Table. Results from Polygenic adaptation test after Berg & Coop (2014).

    The trait column contains the respective traits that were used as input and, in the last row, a random set of equal size that was used to predict FSHL. Qx is the test statistic for a signal of polygenic adaptation using all phenotypic data. Rho is the result of a Spearman correlation of predicted Z-scores versus the input phenotypes. The regional Z-values for Northern Europe and Spain are the region-specific effects on the trait. P-values from each test are in parentheses. The SNPs column contains the number of input SNPs for the estimation of polygenic adaptation (after pruning).

    (XLSX)

    S9 Table. GO-enrichment of genes in LD (within 10kb) to SNPs with p < 0.008 (based on permutation) in a GWAS for the respective trait.

    Shown are terms with an enrichment < 0.001. GO.ID and term give information on the enriched GO term. Annotated states all genes that are in the term, Significant is the number of genes that are associated in the input data set and Expected the number of genes that are expected to be enriched by chance. The resultFisher gives the Fisher score for enrichment. We only report GO terms with >5 genes in them.

    (XLSX)
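The relation between the Annotated, Significant and Expected columns described in this legend can be illustrated with a one-sided Fisher exact test, computed here as the upper tail of a hypergeometric distribution. This is a generic sketch with invented counts; the function name and arguments are hypothetical, and it does not reproduce any algorithm-specific adjustments of the enrichment software used by the authors.

```python
from math import comb


def go_enrichment(n_total, n_annotated, n_selected, n_significant):
    """Expected count and one-sided Fisher p-value for one GO term.

    n_total:       genes in the background set
    n_annotated:   background genes annotated with the term
    n_selected:    genes in the input (associated) set
    n_significant: input genes annotated with the term
    """
    expected = n_annotated * n_selected / n_total
    upper = min(n_annotated, n_selected)
    # P(X >= n_significant) under the hypergeometric distribution.
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_significant, upper + 1)
    )
    return expected, tail / comb(n_total, n_selected)


# Toy counts, invented for illustration only.
expected, p_value = go_enrichment(
    n_total=10, n_annotated=4, n_selected=3, n_significant=2
)
```

A term is enriched when Significant clearly exceeds Expected and the tail probability is small.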

    S10 Table. GO-enrichment of all genes ranked by their Fst or p-value of the closest SNP in a GWAS of the respective trait.

    Shown are terms with an enrichment < 0.001. GO.ID and term give information on the enriched GO term. Nr_Genes is the number of genes in the respective term. The resultKS gives the Kolmogorov-Smirnov score for enrichment.

    (XLSX)

    S11 Table. Loadings of the climate PCAs for S23 Fig.

    The input variables for the respective PCA are in the column Climatic_variable and the loading for PC1 and PC2 are in the following columns. The PCAs were performed with data within the projected growing season (PCA_growing_season, 185 unique locations), for bioclimatic variables related to temperature (PCA_temperature, 180 unique locations) and bioclimatic variables related to precipitation (PCA_precipitation, 180 unique locations).

    (XLSX)

    S1 File. R Markdown detailing the statistical analysis of rosette diameter variation.

    (HTML)

    Attachment

    Submitted filename: PloSGenetics_Wieters_et_al_2020_response_to_reviewers_final.docx

    Attachment

    Submitted filename: response_reviewer_minorBW2.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files. Raw image data and image analysis scripts are stored in the DRYAD repository (doi:10.5061/dryad.s1rn8pk5m).


    Articles from PLoS Genetics are provided here courtesy of PLOS
