Abstract
Understanding how genetic variation interacts with the environment is essential for understanding adaptation. In particular, the life cycle of plants is tightly coordinated with local environmental signals through complex interactions with the genetic variation (G x E). The mechanistic basis for G x E is almost completely unknown. We collected flowering time data for 173 natural inbred lines of Arabidopsis thaliana from Sweden under two growth temperatures (10°C and 16°C), and observed massive G x E variation. To identify the genetic polymorphisms underlying this variation, we conducted genome-wide scans using both SNPs and local variance components. The SNP-based scan identified several variants that had common effects in both environments, but found no trace of G x E effects, whereas the scan using local variance components found both. Furthermore, the G x E effects appear to be concentrated in a small fraction of the genome (0.5%). Our conclusion is that G x E effects in this study are mostly due to large numbers of alleles or haplotypes at a small number of loci, many of which correspond to previously identified flowering time genes.
Author Summary
Many traits are influenced by genetic variation in interaction with the environment, so-called G x E variation. In agriculture, for example, different varieties are optimal in different environments. In evolution, G x E is also crucial for local adaptation. Identifying the genes underlying G x E has proven extremely challenging, however. Using a collection of inbred lines of the model plant Arabidopsis thaliana, we measured flowering time under two temperature regimes, and scanned the genome for polymorphisms responsible for variation in this trait. Although most of the variation is due to G x E, genome-wide scans using SNPs only revealed direct genetic effects (G), and failed to reveal any significant G x E associations. In contrast, scanning the genome using local windows of polymorphism suggested that almost all the observed variation can be explained by 2% of the genome. Previously identified flowering time genes are strongly overrepresented in these regions, and our results are compatible with a model under which G x E is mainly due to many alleles at a relatively small number of loci.
Introduction
The transition from vegetative to reproductive growth is a key developmental step in the life cycle of higher plants, and its timing is tightly regulated by both genes and environment, often in an interactive manner, so that the effect of genetic variants depends on the environment [1, 2]. Such genotype by environment interactions (G x E) have long been of interest to quantitative geneticists, as they are crucial for local adaptation [3, 4] and for improving agricultural yield. In particular, understanding G x E variation is considered essential for predicting the effects of climate change on ecology and agriculture [2, 5].
Analytically, G x E can be described in terms of “reaction norms” as genetic variation in the phenotypic response to the environment [2]. The phenotypic variation can be decomposed into genetic effects that are the same across environments (G), effects that are different across environments (G x E), and non-genetic environmental effects (E). Many approaches have been proposed to identify loci contributing to G x E variation [2, 6]. In the context of genome-wide association studies (GWAS), Korte et al. [7] proposed a multi-trait mixed model (MTMM) that can also be used to study G x E [2, 5, 7].
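To make the decomposition concrete, the following Python sketch partitions line means measured in two environments into G, E and G x E sums of squares. It is a simple balanced two-way layout, not the mixed model used later in the paper, and the function name and the toy values in the usage note are illustrative assumptions.

```python
from statistics import mean


def partition_reaction_norms(means_by_line):
    """Split variation in line means into G, E and G x E shares.

    means_by_line maps a line name to a pair (mean in env 1, mean in env 2).
    Returns the share of the total sum of squares attributed to each term.
    """
    lines = list(means_by_line)
    n_env = 2
    grand = mean(v for pair in means_by_line.values() for v in pair)
    # Environment means (E) and per-line means across environments (G).
    env_means = [mean(means_by_line[l][j] for l in lines) for j in range(n_env)]
    line_means = {l: mean(means_by_line[l]) for l in lines}

    ss_g = n_env * sum((line_means[l] - grand) ** 2 for l in lines)
    ss_e = len(lines) * sum((m - grand) ** 2 for m in env_means)
    # What is left after removing line and environment main effects is G x E.
    ss_gxe = sum(
        (means_by_line[l][j] - line_means[l] - env_means[j] + grand) ** 2
        for l in lines
        for j in range(n_env)
    )
    total = ss_g + ss_e + ss_gxe
    return {"G": ss_g / total, "E": ss_e / total, "GxE": ss_gxe / total}
```

With parallel reaction norms, for example `{"a": (10.0, 20.0), "b": (20.0, 30.0)}`, the G x E share is zero, whereas norms that cross symmetrically, such as `{"a": (10.0, 20.0), "b": (20.0, 10.0)}`, put all the variation into G x E.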
Attempts to map G x E variation, whether using classical linkage mapping or GWAS [4, 5, 8–10], have generally revealed loci explaining only a small fraction of the G x E variation. The most likely explanation for this “missing” G x E heritability is that the underlying genetic architecture involves either rare alleles of relatively large effect [2], or large numbers of polymorphisms of small effect [5, 8, 9].
Here we present a GWAS for flowering time at two temperatures (10°C and 16°C; see Methods) in a population of 173 A. thaliana lines from Sweden [11] (S1 Fig, S1 Table). Our goal was twofold: first, we wanted to investigate our ability to map polymorphisms responsible for G x E interactions; second, we wanted to characterize the main determinants of flowering time variation in Sweden, because although many GWAS have mapped genes responsible for flowering time variation in A. thaliana [5, 12–15], this has almost always been done in global samples, and there is reason to believe that the relatively small number of significant associations in these attempts is due to excessive genetic heterogeneity in these samples. The genetics of flowering time in local samples could be simpler, increasing the power of GWAS [12].
Results
Reaction norms and G x E
The increase in growing temperature from 10°C to 16°C had a dramatic effect on flowering behavior, significantly accelerating flowering in 29% of the lines, significantly decelerating flowering in 16% of the lines, and generally increasing the variance both within and between lines (t-test, q-value < 0.01; Fig 1; S1–S2 Tables). Broad-sense heritabilities (H²) were extremely high (over 90%) at both temperatures (albeit significantly lower at 16°C, p < 0.01), demonstrating strong genetic effects, in agreement with published results (Table 1) [12, 16, 17]. We partitioned the variance in flowering time using a model with four components: genotype (G, the variance attributable to genome-wide relatedness), environment (E), G x E, and noise (see Methods). This analysis revealed massive G x E effects. The G x E effects are largely due to the differences in the reaction norm between the subsets in Fig 1. For example, 67.9% of the variation among lines with accelerated flowering is due to direct genetic effects (Table 2).
Table 1. Broad-sense heritability of flowering time in both temperatures, for the full sample and separately for the subsets of lines that responded differently to the change in temperature.
| | H² | df | N | μ |
|---|---|---|---|---|
| Whole population | | | | |
| 10°C | 0.97 | 173 | 549 | 92.8 |
| 16°C | 0.91 | 173 | 612 | 99.0 |
| Accelerated flowering | | | | |
| 10°C | 0.98 | 50 | 162 | 81.3 |
| 16°C | 0.95 | 50 | 211 | 60.8 |
| Decelerated flowering | | | | |
| 10°C | 0.97 | 27 | 97 | 102.2 |
| 16°C | 0.87 | 27 | 74 | 152.7 |
| No response | | | | |
| 10°C | 0.95 | 94 | 290 | 96.12 |
| 16°C | 0.74 | 94 | 327 | 109.78 |
Table 2. Genetic and environmental effects on flowering time variation.
| | Single-trait G10°C (%) | Single-trait G16°C (%) | Multi-trait G (%) | Multi-trait G x E (%) | Multi-trait E (%) | Multi-trait noise (%) |
|---|---|---|---|---|---|---|
| Whole population | 99.95 | 99.99 | 28.43 | 65.94 | 5.62 | 10⁻⁴ |
| Accelerated flowering | 99.98 | 99.99 | 67.90 | 5.79 | 26.30 | 10⁻⁴ |
| Decelerated flowering | 99.91 | 99.98 | 8.35 | 24.31 | 67.33 | 10⁻⁴ |
| No response | 86.75 | 70.16 | 36.29 | 29.1 | 14.8 | 19.8 |
GWAS of G x E
We attempted to map the polymorphisms responsible for the G x E effect by genome-wide association, using a mixed model that allows multiple correlated traits (MTMM [7]). Three different association tests were carried out: a “full SNP test” that compares a full model, including the effect of marker genotype and its interaction with environment, against a model with no (fixed) SNP effect; a “common SNP effect test” that compares a model with a genetic marker effect only (a genetic model) against a model with no SNP effect; and an “interaction (GSNP x E) effect test” that compares the full model against the genetic model [7]. In agreement with previous results, MTMM appeared to correct well for confounding population structure, whereas a standard multi-linear regression model (MLR) produced massively skewed p-values (S2 Fig).
The full SNP test identified two peaks with genome-wide significance (Fig 2A). The strongest association was centered around position 3,180,721 on chromosome 5, in the promoter region of the well-known flowering regulator FLOWERING LOCUS C (FLC) (Fig 2B), which has previously been shown to play a major role in natural variation for flowering time, but has generally been difficult to map using GWAS [5, 12, 13], presumably because of extensive genetic heterogeneity [18, 19]. Interestingly, the FLC peak can be seen using both the common SNP and the GSNP x E effect tests, but is significant in neither, suggesting that it has a weak GSNP x E effect as well as a weak common SNP effect.
The behavior of the second strong association is very different. This association, centered on position 9,005,735 on chromosome 2, is more significant under the common SNP effect test, and is not present under the GSNP x E effect test, suggesting that the polymorphism has the same effect in both temperatures. The peak is quite broad (Fig 2C) and contains approximately 13 genes, none of which are known to be involved in regulating flowering time. However, one of them, FIONA1 (FIO1), is related to the circadian clock, and the null mutant shows early flowering [20]. Furthermore, GWAS using indel markers identified the most significant association (p-value = 2.97E-08; Fig 2D) as an insertion of two nucleotides in the 9th (last) exon of FIO1, which would result in a frameshift. However, this exon appears not to be present in mRNA-seq data from leaves [21], and appears to be specific to A. thaliana. A stop codon is found 26 amino acids upstream of the insertion in the closely related Arabidopsis lyrata and Capsella rubella. The putative frameshift polymorphism is due to eight vs nine GA repeats, and is in strong linkage disequilibrium with several non-synonymous polymorphisms, which are slightly less strongly associated with flowering time (S3 Fig). Although definitive proof in the form of transgenic experiments (allele swapping) is missing, polymorphism in FIO1 is a strong candidate for the major common effect on chromosome 2. The common SNP effect test revealed no further significant associations, and the GSNP x E effect test revealed no significant associations at all, despite the fact that G x E effects account for 66% of the phenotypic variance (Fig 2, Table 2).
Enrichment of a priori candidates
Our GWAS identified two associations with genome-wide significance, one of which corresponds to a clear a priori candidate (FLC). Given that the number of a priori candidates (genes known to be involved in flowering time) is on the order of a percent of total genes (S3 Table), one out of two is obviously more than expected by chance. To investigate whether there is an overrepresentation of a priori candidates among associations that do not reach genome-wide significance as well, we calculated the enrichment as a function of significance threshold [12]. Because an association that is significant at a certain level will generally be surrounded by many SNPs that are less strongly associated (giving rise to a peak of association), we calculated enrichment at a given level after removing all peaks (defined as 30 kbp windows) containing SNPs that were already significant using a more stringent threshold.
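The peak-removal step can be illustrated with a short Python sketch. It assumes that a 30 kbp peak means 15 kbp on either side of each SNP that already passed the more stringent threshold, which the text does not spell out; the function name and the SNP record layout are hypothetical.

```python
def drop_known_peaks(snps, stricter_threshold, half_window=15_000):
    """Remove SNPs that fall inside peaks already found at a stricter threshold.

    snps is a list of (chromosome, position, p_value) tuples.
    half_window is the assumed half-width of a peak in base pairs.
    """
    # SNPs that were already significant at the more stringent threshold.
    hits = [(c, pos) for c, pos, p in snps if p <= stricter_threshold]

    def near_hit(chrom, pos):
        return any(
            chrom == hit_chrom and abs(pos - hit_pos) <= half_window
            for hit_chrom, hit_pos in hits
        )

    # Keep only SNPs outside every previously identified peak.
    return [(c, pos, p) for c, pos, p in snps if not near_hit(c, pos)]
```

Enrichment at a relaxed threshold would then be computed on the returned list only, so that SNPs riding on an already counted peak do not contribute a second time.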
For the full SNP test, a significant enrichment of a priori candidates persists as we increase the significance threshold (i.e., lower the stringency) to 10⁻⁵ (Fig 3). Although associations at this level are far from significant in the genome-wide sense, the enrichment of a priori candidates implies that the false-discovery rate (FDR) among these candidates is less than 20% [12]. Three a priori candidates were identified using this approach (Table 3): FLC (which also reaches genome-wide significance); SHORT VEGETATIVE PHASE (SVP), which mediates ambient temperature signaling by regulating FLOWERING LOCUS T (FT) [22], and has been shown to be involved in natural variation in other samples [23]; and VERNALIZATION INSENSITIVE 3 (VIN3), which is involved in the epigenetic silencing of FLC during vernalization, but has hitherto not been identified in natural populations [20, 24]. Some of the associated SNPs were found in promoter regions (common SNP effects of FLC, VIN3). These SNPs are excellent candidates for being causal, and it seems likely that we simply lack the power to pick them up in a genome-wide scan. What the FDR is among the approximately 10 peaks that do not correspond to a priori candidates but are significant using the same threshold is not known (S4 Table).
Table 3. A priori candidates identified at FDR less than 20% by SNP association test.
| Gene name | Chr | Full SNP test: position | Full SNP test: p-value (MAF) | Common SNP: position | Common SNP: p-value (MAF) | GSNP x E: position | GSNP x E: p-value (MAF) |
|---|---|---|---|---|---|---|---|
| SVP | 2 | 9593397 | 2.96E-06 (25.9) | 9593397 | 7.36E-06 (25.9) | 9580035 | 4.8E-04 (32.8) |
| FLC | 5 | 3180721 | 2.72E-09 (39.1) | 3180721 | 3.58E-07 (39.1) | 3184162 | 9.85E-05 (29.3) |
| VIN3 | 5 | 23249568 | 8.48E-06 (10.9) | 23249568 | 1.47E-06 (10.9) | 23249256 | 0.014 (19.5) |
The results for the common SNP effect test were very similar to the full SNP test, and the same a priori candidates were identified (Fig 3, Table 3). However, the GSNP x E effect test showed no evidence for significant enrichment at any p-value threshold, suggesting that if low power is the reason for the missing G x E associations, then the power is low indeed.
Finally, we note that if causal variants are strongly correlated with global relatedness, power to detect them may be greatly decreased [25, 26]. We therefore scanned for associations without correction for relatedness (using MLR), as well. The associations from such an analysis are of course extremely inflated, but it is possible to use the enrichment analysis described above, as it does not rely on well calibrated p-values (S2, S4 Figs). However, this approach identified only a subset of the candidate genes already identified using MTMM.
Using local relatedness to improve power
Statistical power in GWAS may be decreased by allelic heterogeneity, which reduces the marginal contribution of individual polymorphisms at a genetic locus. One possible way around this is to consider the joint effect of all polymorphisms at a genetic locus using a mixed model. Instead of mapping individual SNPs as fixed effects, we estimate the variance component that is due to local relatedness around each gene (using a 15 kbp window on each side of the coding region) and compare that to the variance component that is due to the rest of the genome [21]. We refer to these effects as “local” and “global”, respectively, and we also include environmental and G x E components.
Three different tests were carried out: a “full local test” that compares a full model, including local and global effects and their interactions with E, with a null model that does not include any local effect; a “common local effect test” that compares a local model that does not include a Glocal x E effect with the null model; and an “interaction (Glocal x E) effect test” that compares the full model with the local model. For each test, log-likelihood ratios were calculated (see Methods).
Results for the full local and the common local effect tests were strongly correlated with their corresponding GWAS results (presented above), especially for genes with reasonably strong association with flowering, while GSNP x E and Glocal x E showed much lower correlation (S5 Fig). Because the variance component likelihood ratios are not calibrated, it is difficult to say whether any particular effect is significant. However, we can assess this using overrepresentation of a priori candidates as for MTMM above. In all tests (full local, common local and Glocal x E), a significant enrichment of a priori candidates exists for likelihood ratios of 5 or higher, for which FDR is less than 20% (Fig 4). Notably, this effect was observed for the Glocal x E effect test as well, whereas GSNP x E showed no evidence of overrepresentation (Fig 3). Thus the variance component analysis appears to capture G x E effects not captured by the marginal SNP GWAS.
A total of four flowering time genes showed significant peaks at the log-likelihood threshold of 5 (Table 4). FLC and VIN3 showed high common local effect as well as common SNP effect, while FPA, an FLC suppressor in the autonomous pathway [28], showed up as a Glocal x E locus. Furthermore, CENTER CITY (CCT) was significant using the full local test. CCT, also known as CRYPTIC PRECOCIOUS (CRP), is a flowering regulator that acts as a promoter of FT and a suppressor of FLC [29, 30]. It is closely linked to the well-known flowering time locus FRIGIDA (FRI) and has previously been detected in GWAS [12].
Table 4. A priori candidates identified at 20% FDR by local association test.
ID | Start | End | Full local test | Glocal (LR) | Glocal x E (LR) | Candidate genes |
---|---|---|---|---|---|---|
14 | AT2G43350 | AT2G43410 | 5.45 | 0.04 | 5.4 | FPA |
25 | AT4G00450 | AT4G00590 | 6.37 | 4.57 | 3.31 | CCT |
30 | AT5G10090 | AT5G10260 | 11.69 | 9.65 | 4.1 | FLC |
38 | AT5G57345 | AT5G57410 | 6.31 | 6.31 | 0.1 | VIN3 |
The genomic architecture of associations
Fig 5 shows the distribution of common (i.e., G) and G x E signals across the genome, for SNPs as well as for local variance components. The three highest peaks of Glocal (S5 Table) overlap peaks of common GSNP effect centered around FIO1 on chromosome 2, FLC on chromosome 5, and position 23,544,472 on chromosome 5. This overlap suggests that a small number of SNPs identified by MTMM might be responsible for the local variance components. Although there are no obvious flowering time candidates in the final region on chromosome 5, a recent study reported that a gene in the region, MULTICOPY SUPPRESSOR OF IRA 1 (MSI; AT5G58230), delays the transition to flowering [31]. The most significant peak of Glocal x E only was found at the top of chromosome 1 (963,400-1,053,719) and includes eight genes, none of which are known to be involved in flowering.
Finally, we consider the question of genetic architecture. For a Mendelian trait, all the phenotypic variation is due to a single locus, whereas for a truly Fisherian trait, the contribution of a genomic region should be proportional to its size (relative to the entire genome). Flowering time is clearly neither. As shown in Table 5, the 144 SNPs identified using MTMM (with the full SNP test using the 20% FDR defined in Fig 3) jointly explain 22% of the phenotypic variation as common (to both environments) genetic variation (G), and 31% as G x E variation. The remaining 3.7 million SNPs (of which 1 million have a minor allele frequency less than 0.1) explain only 6% as G and 35% as G x E. If we instead turn to the local variance components, the identified regions, comprising roughly 2% of the genome, explain 26% as G and 67% as G x E (randomly chosen regions explain on average a total of 7.5%; p = 0.001; S6 Fig), supporting the observation that the local variance component approach seems to have significantly greater power to capture G x E effects, but does not do better when it comes to common effects. Importantly, the local variance components explain essentially all the available genetic variation, and combining SNPs and local variance components yields almost no improvement (Table 5). It is also worth noting that the less than 10% of the identified regions that contain one of the a priori candidates explain almost 40% of the variation, a clearly significant overrepresentation (p = 0.001; S6 Fig).
Table 5. Summary of variance explained by SNPs identified using MTMM and VCA.
| | Gsig. | Gsig. x E | Gnot_sig. | Gnot_sig. x E | E | Noise |
|---|---|---|---|---|---|---|
| MTMM | | | | | | |
| All (144 SNPs) | 21.53 (34.91) | 31.27 (8.22) | 6.45 (5.69) | 34.90 (13.31) | 3.76 | 2.14 |
| A priori (9 SNPs) | 9.63 (27.02) | 7.50 (7.2) | 18.83 (13.47) | 58.52 (13.32) | 5.51 | 0.01 |
| VCA | | | | | | |
| All local (2.0% of genome; 43,554 SNPs) | 25.57 (27.38) | 67.09 (21.36) | 1.14 (0.06) | 0.14 (0) | 6.05 | 0.02 |
| A priori (0.13% of genome; 3,101 SNPs) | 11.03 (8.85) | 30.76 (10.80) | 17.18 (12.64) | 35.47 (8.84) | 5.55 | 0.01 |
| MTMM + VCA | | | | | | |
| All SNPs + local (43,569 SNPs) | 25.58 (27.54) | 67.12 (21.42) | 1.11 (0.06) | 0.13 (0) | 6.06 | 0.02 |
| A priori (3,105 SNPs) | 11.53 (9.43) | 30.89 (10.87) | 16.69 (10.98) | 35.33 (8.7) | 5.55 | 0.01 |
Discussion
Mapping polymorphisms responsible for G x E
The main purpose of this study was to investigate the genetic architecture of G x E variation using a population and experimental setting where such variation was massive. Roughly 66% of the variation for flowering time among lines across environments in this study is due to G x E (Table 2), yet a standard GWAS method failed to detect a single significant SNP association. Indeed, even when considering enrichment for a priori candidates using less stringent thresholds, there is no trace of G x E associations. The same was true using various summaries of the traits, like the slope of the reaction norm. In contrast, there is ample evidence for polymorphisms that do not interact with the environment (including two that reach genome-wide significance), although this type of variation is only 28% of the phenotypic variation.
The much-discussed “missing heritability” problem in human genetics refers to the fact that individually identifiable (mappable) SNPs do not explain the genetic variation [32]. Although many explanations have been proposed, the simplest one is that the marginal contributions of the underlying variants are too small (due to a combination of allele frequency and effect size) for them to be identified given the statistical power of the study. This explanation is supported by studies that increase power by increasing sample size [33] or that use variance components to estimate the joint contribution of all SNPs rather than trying to identify marginal effects [34].
In the present study, we have no “missing heritability” for common genetic variation, since the SNPs we identified account for almost all of this (22% vs 28%; Tables 2 and 5). However, we do have “missing heritability” for G x E variation, where the identified SNPs explain less than half of the existing variation (31% vs 66%; Tables 2 and 5). Why this difference between G and G x E? The obvious explanation is again power. Under some scenarios, G x E effects are more difficult to detect for purely statistical reasons [7], and it is also possible that the distribution of allele frequencies and/or effect sizes differ. Simulation studies have likewise suggested that substantial genetic risk score-by-environment interactions may exist, although marginal G x E effects are undetectable [35].
The notion that power is involved is supported by the fact that we are able to account for the missing G x E variation fully using variance component methods that estimate the joint contribution of multiple SNPs (Tables 2 and 5). However, these results also demonstrate that the G x E variation is not Fisherian in the sense of being spread out infinitesimally thinly across the genome. Instead, 8 small regions, comprising about 0.5% of the genome, appear to explain almost all the G x E variation (S5 Table). This suggests that G x E variation for flowering is due to a relatively small number of genes harboring a large number of functionally distinct alleles (or haplotypes), i.e., allelic rather than genetic heterogeneity. This is consistent with what is known about allelic variation at several flowering time loci [18, 36, 37], and perhaps also with the general observation that different linkage mapping experiments, which are insensitive to allelic heterogeneity, consistently seem to identify the same small number of flowering loci, several of which have not been identified using GWAS [38, 39]. Dissecting these complex regions and haplotypes further will likely require painstaking experimental work, as linkage disequilibrium is typically too extensive for fine-mapping [12, 18].
It should be noted that the extensive allelic heterogeneity for G x E is in contrast to several examples from crops [40, 41]. A possible explanation for this is that domestication and breeding increased the frequency of rare alleles. The pattern in A. thaliana, on the other hand, suggests strong local adaptation. There is no obvious correlation between flowering time and geography in our data, but this is not surprising given the strong G x E effects, and the existence of micro-scale climate variation. In order to elucidate the selective forces acting on flowering time variation, field experiments will be required [14, 42].
Flowering time control in Swedish lines
A secondary purpose of this project was to investigate the genetics of flowering time variation in a local population sample from Sweden. From an a priori list of more than a hundred flowering time genes, we identified five genes, FLC, SVP, VIN3, CCT and FPA, at an FDR of less than 20% (S5 Table). FLC, in particular, clearly has a major effect, in agreement with its role as a major flowering repressor and central player in the vernalization response [43]. Although flowering time is determined by the interaction of huge networks that include the photoperiod, gibberellin, vernalization, temperature, and autonomous pathways [44], we found that all identified flowering time genes in our analysis were tightly related to the regulation of FLC and FT (S7 Fig). Briefly, floral initiation starts immediately by upregulation of FT when warm temperature returns after FLC is epigenetically silenced by VIN3 during a cold period [20, 24]. CCT and FPA suppress FLC in the autonomous pathway [29, 30, 45]. SVP has been reported as another flowering regulator that suppresses FT independent of FLC [46]. It should be noted that CCT is closely linked to FRIGIDA (FRI, distance is 13.97 kbp), a strong up-regulator of FLC [47–49] known to harbor strong allelic heterogeneity and massive haplotype sharing in global samples (over 250 kbp [50, 51]). Although FRI is not known to be segregating in the Swedish population, it is clearly possible that FRI alleles could lead to confounding at CCT [12]. In addition to known flowering time genes, we also identified one possible novel gene. Although our FDR approach only works for a priori candidates, the peak in FIO1 is clearly significant at the genome-wide level, and the association is currently being confirmed experimentally.
With the exception of FLC and SVP, none of the genes identified here have previously been shown to be important in natural variation. This demonstrates the advantages of using a local sample for GWAS when working on a trait important in local adaptation, and is in agreement with the G x E results above. Given that allelic heterogeneity can have a major effect on the power of GWAS even within Sweden, it should come as no surprise that flowering time is recalcitrant to GWAS in global samples [12].
Materials and Methods
Plant materials and growth conditions
173 Swedish lines and Col-0 were used for experiments (S1 Table). These lines, and all genome information, including SNPs and short indels, are described elsewhere [11].
Seeds were sown on soil and stratified for three days at 4°C in the dark. They were then transferred into a single pot after germination. All plants were grown in MTPS144 Conviron walk-in growth chambers (Winnipeg, MB, Canada) set to long-day conditions (16 h photoperiod) under 10°C or 16°C constant temperatures. The period from germination to the appearance of the first buds was recorded as flowering time for multiple individuals of each line. Measurements were taken twice a week, until 190 days from germination.
Statistical analysis
Broad sense heritability
The broad-sense heritability (H²) was calculated using all individuals as V_G/V_P, where V_P is the total phenotypic variance and V_G is the genetic variance (estimated from the between-line phenotypic variance).
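As an illustration of this ratio, the Python sketch below estimates the genetic variance from a one-way analysis of variance across lines. The specific estimator (mean squares with an unbalanced-design coefficient n0) and the function name are assumptions made for the example; the text only states that the genetic variance was estimated from the between-line variance.

```python
def broad_sense_heritability(flowering_by_line):
    """Estimate H2 = V_G / V_P from replicate plants per inbred line.

    flowering_by_line maps a line name to a list of per-plant flowering times.
    Requires at least two lines and some within-line replication.
    """
    groups = [values for values in flowering_by_line.values() if values]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total

    # Between-line and within-line sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of replicates per line for unbalanced data.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    v_g = max(0.0, (ms_between - ms_within) / n0)  # genetic variance
    v_p = v_g + ms_within                          # total phenotypic variance
    return v_g / v_p
```

For two lines with replicates `[9, 11]` and `[19, 21]`, the between-line mean square is 100 and the within-line mean square is 2, giving V_G = 49, V_P = 51 and H² of about 0.96.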
Genome-wide association mapping
For GWAS, the multi-trait mixed model analysis was performed in LIMIX [27] using the model
(1) |
where Y is a vector of n × p phenotypic means (one mean for each of n lines in p environments), μ 10 and μ 16 are temperature specific mean values, x is the vector of genotypes to be tested (SNPs or indels), A is a trait design matrix (environment), B is the effect size estimate corresponding to A, R is a genomic relatedness (sample-sample covariance matrix) estimated from SNPs, C is the trait-trait covariance matrix, Q is a trait-trait noise covariance matrix, and and δ are scaling factors. SNPs and indels were analyzed separately in the model and R calculated with only SNPs was used for both analyses. Three different tests using likelihood ratio test were carried out [27]:
The full model with A = I_p was tested against a null model with x = 0. This test identifies “any effect”, including marker (SNP or indel) effects that persist across the two environments and effects that are specific to one of them.
To identify an “interaction effect” (GSNP x E), i.e., environment-specific marker effects, the full model was tested against a genetic model with A = 1_{1,p}.
To identify a “common SNP effect”, i.e., environment-persistent marker effects, the genetic model was tested against the null model.
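One form of model (1) that is consistent with the symbol definitions above can be written as follows. This is a sketch under stated assumptions rather than the published equation: the symbol σ for the first scaling factor, the vec notation, the intercept written as a row vector of the two temperature-specific means, and the identity matrix I_n in the noise term are not given in the text.

```latex
\operatorname{vec}(Y) \sim \mathcal{N}\!\Big(
  \operatorname{vec}\big(\mathbf{1}_n\,[\mu_{10}\ \ \mu_{16}] + x\,B\,A\big),\;
  \sigma\,(C \otimes R) + \delta\,(Q \otimes I_n)
\Big)
```

Under this reading, A = I_p gives each environment its own marker effect (B has p entries), whereas A = 1_{1,p} forces a single effect shared by both environments (B is a scalar), which matches the full and genetic models described above.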
Standard multi-linear regression (MLR) analysis was also conducted in LIMIX, using the same three tests as for MTMM. In both MTMM and MLR, Bonferroni-corrected 5% significance thresholds were used. SNPs with rare alleles (minor allele frequency less than 10%) were excluded from the final results and from the Bonferroni correction.
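The three likelihood-ratio tests and the Bonferroni threshold can be sketched in plain Python as follows. The sketch assumes p = 2 environments, so that the full model has two marker-effect parameters and the genetic model one, and it assumes the usual chi-square reference distribution. The function names and the log-likelihood inputs are hypothetical, and LIMIX itself is not called.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable (closed forms, df 1 or 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def mtmm_tests(ll_full, ll_genetic, ll_null):
    """P-values for the full, common and interaction tests at one marker.

    The arguments are maximized log-likelihoods of the three nested models.
    """
    def lrt(ll_alt, ll_base, df):
        stat = max(0.0, 2.0 * (ll_alt - ll_base))
        return chi2_sf(stat, df)

    return {
        "full": lrt(ll_full, ll_null, 2),            # any marker effect
        "common": lrt(ll_genetic, ll_null, 1),       # environment-persistent effect
        "interaction": lrt(ll_full, ll_genetic, 1),  # GSNP x E effect
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests
```

For example, one million tested markers at a 5% family-wise level would give a per-marker threshold of 5e-8; the number of markers actually tested after the allele-frequency filter is not stated here, so that figure is only illustrative.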
Variance components analysis (VCA)
VCA was conducted by LIMIX with the model
(2) |
where U local and U global are random effects corresponding to local and global relatedness, respectively, and ψ is noise. U local and U global can each be decomposed into an environment-persistent and an environment-specific variance component:
(3) |
(4) |
(5) |
Here R local and R global are sample-sample covariance matrices that estimate genetic relatedness (kinship) based on local and global SNPs, respectively. The local region of a gene was defined as the gene body plus 15 kbp from the 5’ and 3’ UTR, respectively, and global was defined as the rest of the genome. The parameters and are environment-persistent variances and covariances for the local and global genetic terms, and , , σ 1,2 are the noise covariance parameters. To evaluate the “full local” (including environment persistent and specific effect), “common local” (environment persistent effect) and “Glocal x E” (environment specific effect) effects, three different tests were carried out and log likelihood-ratio was calculated:
A “full local effect” was tested by comparison of a full model, including local and global effects and their interactions with E, with a null model that does not include any local effect (U local).
A “common local effect” was tested by comparison of a local model that does not include an interaction effect between the local effect and E (as , ) with the null model.
An “interaction (Glocal x E) effect” was tested by comparison of the full model with the local model.
The null model was also used to determine genetic (global), environmental and G x E effects on flowering time variations in Table 2.
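The local and global SNP sets and the corresponding relatedness matrices can be illustrated with the Python sketch below. The 15 kbp flank follows the definition above; the mean-centered cross-product used for the relatedness matrix is one common choice and is an assumption here, as are the function names and the input layout.

```python
def split_local_global(snps, gene_chrom, gene_start, gene_end, flank=15_000):
    """Return indices of SNPs inside and outside the local window of a gene.

    snps is a sequence of (chromosome, position) pairs. The window is the
    gene body plus `flank` base pairs on either side.
    """
    local, rest = [], []
    for index, (chrom, pos) in enumerate(snps):
        in_window = (
            chrom == gene_chrom
            and gene_start - flank <= pos <= gene_end + flank
        )
        (local if in_window else rest).append(index)
    return local, rest


def relatedness(genotypes):
    """Sample-by-sample covariance from a list of SNP rows.

    Each row holds one allele code per line (for example 0 or 1 for inbred
    lines). Rows are mean-centered and cross-products are averaged over SNPs.
    """
    n = len(genotypes[0])
    matrix = [[0.0] * n for _ in range(n)]
    for row in genotypes:
        row_mean = sum(row) / n
        centered = [g - row_mean for g in row]
        for a in range(n):
            for b in range(n):
                matrix[a][b] += centered[a] * centered[b]
    return [[value / len(genotypes) for value in r] for r in matrix]
```

Calling `relatedness` once on the rows selected by the local indices and once on the remaining rows would give the two matrices referred to as R local and R global.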
Quantile-quantile plots
Quantile-quantile plots were constructed from the significance ranks of all flowering time genes and the corresponding non-flowering time genes. For GWAS, the most significant p-value within 15 kbp of a gene was assigned as the significance of that gene. First, the genes in the flowering time and non-flowering time lists were each ranked by significance, from smallest to largest, and the ranks were scaled by the number of genes in the list. We assumed (as a null hypothesis) that the distribution of significance is the same in both lists, so that genes with the same scaled rank have the same significance. To aid interpretation of the plots, a 95% confidence interval was calculated (shaded grey in all quantile-quantile plots). For this, we conducted random sampling (1000 times) that maintained the chromosomal order of all observations but shuffled the relative positions of the two variables (for details see [52]). Random distributions were generated point by point, and the 2.5th and 97.5th percentiles at each point were calculated from the distribution.
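The rank-scaling step can be sketched in Python as shown below. The sketch pairs each candidate gene with the background gene at the same scaled rank, which is one way of reading the description above; it does not implement the permutation-based confidence interval, and the function name is hypothetical.

```python
def scaled_rank_pairs(candidate_values, background_values):
    """Pair significance values of two gene lists at equal scaled rank.

    Both lists are sorted from smallest to largest. The i-th candidate
    (scaled rank i / n_candidates) is matched to the background value at
    the same scaled rank, rounded up to a whole background rank.
    Returns (background_value, candidate_value) points for a Q-Q plot.
    """
    cand = sorted(candidate_values)
    back = sorted(background_values)
    pairs = []
    for i, value in enumerate(cand):
        numerator = (i + 1) * len(back)
        # Integer ceiling of numerator / len(cand) gives the 1-based rank.
        rank = (numerator + len(cand) - 1) // len(cand)
        pairs.append((back[rank - 1], value))
    return pairs
```

Points falling below the diagonal in such a plot of p-values would indicate that candidate genes are more strongly associated than background genes at the same scaled rank.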
Enrichment test and bounding the FDR
Observed enrichments were assessed to optimize the significance thresholds for MTMM and VCA according to the method of Atwell et al. [12]. Briefly, if we assume that all significant non-candidate genes are false positives, then we can estimate the fractions of true positives and false positives among the a priori candidates. We estimated the enrichment as X/Y and the FDR as
$$\mathrm{FDR} = \frac{Y}{X} \qquad (6)$$
where Y is the fraction of non-a priori genes (S3 Table) that are significant, and X is the fraction of a priori genes that are significant. A total of 113 functionally confirmed flowering time genes were used as the a priori list. For GWAS, the most significant p-value within 15 kbp of a gene was assigned as the significance of that gene.
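As a rough illustration of this procedure, the Python sketch below assigns a gene-level p-value from nearby SNPs and then computes the enrichment and an FDR estimate at a given threshold. The function names and example values are hypothetical. The FDR is taken here as Y/X, the reciprocal of the enrichment, which is what follows if every significant non-candidate gene is a false positive; treat that form as an assumption of this sketch.

```python
def gene_p_value(snps, chrom, gene_start, gene_end, flank=15_000):
    """Most significant SNP p-value within `flank` bp of a gene.

    snps: iterable of (chromosome, position, p_value) tuples.
    Returns None when no SNP falls inside the window.
    """
    hits = [p for c, pos, p in snps
            if c == chrom and gene_start - flank <= pos <= gene_end + flank]
    return min(hits) if hits else None


def enrichment_and_fdr(candidate_p, other_p, threshold):
    """Enrichment X/Y and FDR estimate Y/X at a significance threshold.

    X: fraction of a priori genes with p <= threshold
    Y: fraction of all other genes with p <= threshold
    Returns (None, None) when either fraction is zero, because the ratios
    are then undefined or uninformative.
    """
    x = sum(p <= threshold for p in candidate_p) / len(candidate_p)
    y = sum(p <= threshold for p in other_p) / len(other_p)
    if x == 0 or y == 0:
        return None, None
    return x / y, y / x


# Hypothetical gene-level p-values.
a_priori = [0.001, 0.2, 0.003, 0.5]  # X = 2/4 at threshold 0.01
others = [0.001] + [0.5] * 9         # Y = 1/10 at threshold 0.01
enrichment, fdr = enrichment_and_fdr(a_priori, others, threshold=0.01)
print(enrichment, fdr)
```

Scanning a range of thresholds with `enrichment_and_fdr` gives one way to see how the trade-off between enrichment and the number of significant genes changes with the cutoff.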
Supporting Information
Acknowledgments
We thank Oliver Stegle for discussions about statistics and the use of LIMIX, Arthur Korte for discussions about MTMM, and Ümit Seren for helping to prepare an excellent data analysis environment.
Data Availability
All data can be found in Long et al. (Nature Genet. 2013). Phenotypes are included as supplemental data here.
Funding Statement
This work was supported by NIH P50HG002790, ERC 268962, and GMI institutional funds (to MN). SA was supported in part by NIH R01GM073822 (to Justin Borevitz). ES was supported in part by a Lise Meitner award from the Austrian Science Foundation (FWF M1420-B21). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Koornneef M, Alonso-Blanco C, Peeters AJM, Soppe W. Genetic control of flowering time in Arabidopsis. Annu Rev Plant Physiol Plant Mol Biol. 1998;49:345–370. doi:10.1146/annurev.arplant.49.1.345
- 2. El-Soda M, Malosetti M, Zwaan BJ, Koornneef M, Aarts MGM. Genotype x environment interaction QTL mapping in plants: lessons from Arabidopsis. Trends Plant Sci. 2014;19(6):390–398. doi:10.1016/j.tplants.2014.01.001
- 3. Via S, Lande R. Genotype-Environment Interaction and the Evolution of Phenotypic Plasticity. Evolution. 1985;39(3):505–522. doi:10.2307/2408649
- 4. Fournier-Level A, Korte A, Cooper MD, Nordborg M, Schmitt J, Wilczek AM. A Map of Local Adaptation in Arabidopsis thaliana. Science. 2011;334(6052):86–89. doi:10.1126/science.1209271
- 5. Li Y, Cheng RY, Spokas KA, Palmer AA, Borevitz JO. Genetic Variation for Life History Sensitivity to Seasonal Warming in Arabidopsis thaliana. Genetics. 2014;196(2):569–577. doi:10.1534/genetics.113.157628
- 6. Thomas D. Methods for Investigating Gene-Environment Interactions in Candidate Pathway and Genome-Wide Association Studies. Annu Rev Public Health. 2010;31:21–36. doi:10.1146/annurev.publhealth.012809.103619
- 7. Korte A, Vilhjálmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44(9):1066–1071. doi:10.1038/ng.2376
- 8. Stratton DA. Reaction norm functions and QTL-environment interactions for flowering time in Arabidopsis thaliana. Heredity (Edinb). 1998;81:144–55. doi:10.1046/j.1365-2540.1998.00369.x
- 9. Buckler ES, Holland JB, Bradbury PJ, Acharya CB, Brown PJ, Browne C, et al. The genetic architecture of maize flowering time. Science. 2009;325(5941):714–8. doi:10.1126/science.1174276
- 10. Filiault DL, Maloof JN. A genome-wide association study identifies variants underlying the Arabidopsis thaliana shade avoidance response. PLoS Genet. 2012;8(3):e1002589. doi:10.1371/journal.pgen.1002589
- 11. Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, et al. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat Genet. 2013;45(8):884–90. doi:10.1038/ng.2678
- 12. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465(7298):627–31. doi:10.1038/nature08800
- 13. Li Y, Huang Y, Bergelson J, Nordborg M, Borevitz JO. Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2010;107(49):21199–204. doi:10.1073/pnas.1007431107
- 14. Brachi B, Faure N, Bergelson J, Cuguen J, Roux F. Genome-wide association mapping of flowering time in Arabidopsis thaliana in nature: genetics for underlying components and reaction norms across two successive years. Acta Bot Gallica. 2013;160(3–4):205–218. doi:10.1080/12538078.2013.807302
- 15. Brachi B, Villoutreix R, Faure N, Hautekèete N, Piquot Y, Pauwels M, et al. Investigation of the geographical scale of adaptive phenological variation and its underlying genetics in Arabidopsis thaliana. Mol Ecol. 2013;22(16):4222–40. doi:10.1111/mec.12396
- 16. Brachi B, Morris GP, Borevitz JO. Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol. 2011;12(10):232. doi:10.1186/gb-2011-12-10-232
- 17. Debieu M, Tang C, Stich B, Sikosek T, Effgen S, Josephs E, et al. Co-Variation between Seed Dormancy, Growth Rate and Flowering Time Changes with Latitude in Arabidopsis thaliana. PLoS One. 2013;8(5):e61075. doi:10.1371/journal.pone.0061075
- 18. Li PJ, Filiault D, Box MS, Kerdaffrec E, van Oosterhout C, Wilczek AM, et al. Multiple FLC haplotypes defined by independent cis-regulatory variation underpin life history diversity in Arabidopsis thaliana. Genes Dev. 2014;28(15):1635–1640. doi:10.1101/gad.245993.114
- 19. Michaels SD, He YH, Scortecci KC, Amasino RM. Attenuation of FLOWERING LOCUS C activity as a mechanism for the evolution of summer-annual flowering behavior in Arabidopsis. Proc Natl Acad Sci U S A. 2003;100(17):10102–10107. doi:10.1073/pnas.1531467100
- 20. Kim DH, Zografos BR, Sung SB. Vernalization-Mediated VIN3 Induction Overcomes the LIKE-HETEROCHROMATIN PROTEIN1/POLYCOMB REPRESSION COMPLEX2-Mediated Epigenetic Repression. Plant Physiol. 2010;154(2):949–957. doi:10.1104/pp.110.161083
- 21. Dubin MJ, Zhang P, Meng D, Remigereau MS, Osborne EJ, Casale FP, et al. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife. 2015;4:e05255. doi:10.7554/eLife.05255
- 22. Lee JH, Ryu HS, Chung KS, Posé D, Kim S, Schmid M, et al. Regulation of Temperature-Responsive Flowering by MADS-Box Transcription Factor Repressors. Science. 2013;342(6158):628–632. doi:10.1126/science.1241097
- 23. Méndez-Vigo B, Martínez-Zapater JM, Alonso-Blanco C. The flowering repressor SVP underlies a novel Arabidopsis thaliana QTL interacting with the genetic background. PLoS Genet. 2013;9(1):e1003289. doi:10.1371/journal.pgen.1003289
- 24. Sung SB, Amasino RM. Vernalization in Arabidopsis thaliana is mediated by the PHD finger protein VIN3. Nature. 2004;427(6970):159–164. doi:10.1038/nature02195
- 25. Yu JM, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–208. doi:10.1038/ng1702
- 26. Zhao KY, Aranzana MJ, Kim S, Lister C, Shindo C, Tang CL, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007;3(1). doi:10.1371/journal.pgen.0030004
- 27. Lippert C, Casale FP, Rakitsch B, Stegle O. LIMIX: genetic analysis of multiple traits. bioRxiv. 2014.
- 28. Schomburg FM, Patton DA, Meinke DW, Amasino RM. FPA, a gene involved in floral induction in Arabidopsis, encodes a protein containing RNA-recognition motifs. Plant Cell. 2001;13(6):1427–1436. doi:10.1105/tpc.13.6.1427
- 29. Imura Y, Kobayashi Y, Yamamoto S, Furutani M, Tasaka M, Abe M, et al. Cryptic Precocious/Med12 Is a Novel Flowering Regulator with Multiple Target Steps in Arabidopsis. Plant Cell Physiol. 2012;53(2):287–303. doi:10.1093/pcp/pcs002
- 30. Gillmor CS, Silva-Ortega CO, Willmann MR, Buendia-Monreal M, Poethig RS. The Arabidopsis Mediator CDK8 module genes CCT (MED12) and GCT (MED13) are global regulators of developmental phase transitions. Development. 2014;141(23):4580–4589. doi:10.1242/dev.111229
- 31. Steinbach Y, Hennig L. Arabidopsis MSI1 functions in photoperiodic flowering time control. Front Plant Sci. 2014;5(77). doi:10.3389/fpls.2014.00077
- 32. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53. doi:10.1038/nature08494
- 33. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467(7317):832–8. doi:10.1038/nature09410
- 34. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–9. doi:10.1038/ng.608
- 35. Marigorta UM, Gibson G. A simulation study of gene-by-environment interactions in GWAS implies ample hidden effects. Front Genet. 2014;5:225. doi:10.3389/fgene.2014.00225
- 36. Shindo C, Aranzana MJ, Lister C, Baxter C, Nicholls C, Nordborg M, et al. Role of FRIGIDA and FLOWERING LOCUS C in determining variation in flowering time of Arabidopsis. Plant Physiol. 2005;138(2):1163–1173. doi:10.1104/pp.105.061309
- 37. Wollenberg AC, Amasino RM. Natural variation in the temperature range permissive for vernalization in accessions of Arabidopsis thaliana. Plant Cell Environ. 2012;35(12):2181–2191. doi:10.1111/j.1365-3040.2012.02548.x
- 38. Strange A, Li P, Lister C, Anderson J, Warthmann N, Shindo C, et al. Major-Effect Alleles at Relatively Few Loci Underlie Distinct Vernalization and Flowering Variation in Arabidopsis Accessions. PLoS One. 2011;6(5). doi:10.1371/journal.pone.0019949
- 39. Salomé PA, Bomblies K, Laitinen RAE, Yant L, Mott R, Weigel D. Genetic Architecture of Flowering-Time Variation in Arabidopsis thaliana. Genetics. 2011;188(2):421–U313. doi:10.1534/genetics.111.126607
- 40. Yang Q, Li Z, Li W, Ku L, Wang C, Ye J, et al. CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc Natl Acad Sci U S A. 2013;110(42):16969–74. doi:10.1073/pnas.1310949110
- 41. Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun. 2011;2:467. doi:10.1038/ncomms1467
- 42. Wilczek AM, Roe JL, Knapp MC, Cooper MD, Lopez-Gallego C, Martin LJ, et al. Effects of Genetic Perturbation on Seasonal Life History Plasticity. Science. 2009;323(5916):930–934. doi:10.1126/science.1165826
- 43. Sheldon CC, Rouse DT, Finnegan EJ, Peacock WJ, Dennis ES. The molecular basis of vernalization: The central role of FLOWERING LOCUS C (FLC). Proc Natl Acad Sci U S A. 2000;97(7):3753–3758. doi:10.1073/pnas.060023597
- 44. Srikanth A, Schmid M. Regulation of flowering time: all roads lead to Rome. Cell Mol Life Sci. 2011;68(12):2013–2037. doi:10.1007/s00018-011-0673-y
- 45. Bäurle I, Smith L, Baulcombe DC, Dean C. Widespread role for the flowering-time regulators FCA and FPA in RNA-mediated chromatin silencing. Science. 2007;318(5847):109–12. doi:10.1126/science.1146565
- 46. Li D, Liu C, Shen L, Wu Y, Chen H, Robertson M, et al. A repressor complex governs the integration of flowering signals in Arabidopsis. Dev Cell. 2008;15(1):110–120. doi:10.1016/j.devcel.2008.05.002
- 47. Clarke JH, Dean C. Mapping FRI, a locus controlling flowering time and vernalization response in Arabidopsis thaliana. Mol Gen Genet. 1994;242(1):81–9.
- 48. Michaels SD, Amasino RM. FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell. 1999;11(5):949–956. doi:10.2307/3870827
- 49. Johanson U, West J, Lister C, Michaels S, Amasino R, Dean C. Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science. 2000;290(5490):344–7. doi:10.1126/science.290.5490.344
- 50. Hagenblad J, Nordborg M. Sequence variation and haplotype structure surrounding the flowering time locus FRI in Arabidopsis thaliana. Genetics. 2002;161(1):289–298.
- 51. Hagenblad J, Tang CL, Molitor J, Werner J, Zhao K, Zheng HG, et al. Haplotype structure and phenotypic associations in the chromosomal regions surrounding two Arabidopsis thaliana flowering time loci. Genetics. 2004;168(3):1627–1638. doi:10.1534/genetics.104.029470
- 52. Nordborg M, Hu TT, Ishino Y, Jhaveri J, Toomajian C, Zheng HG, et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 2005;3(7):1289–1299. doi:10.1371/journal.pbio.0030196