Abstract
Genome-wide association studies (GWAS) involve testing genetic variants across the genomes of many individuals of a population to identify genotype–phenotype associations. The approach was initially developed for, and has proven highly successful in, human disease genetics. In plants, GWAS initially focused on single feature polymorphism and on recombination and linkage disequilibrium, but has now been embraced by a plethora of different disciplines, with several thousand studies being published in model and crop species within the last decade or so. Here we provide a comprehensive review of these studies, presenting case studies on biotic resistance, abiotic tolerance, yield associated traits, and metabolic composition. We also detail current strategies of candidate gene validation as well as the functional study of haplotypes. Furthermore, we provide a critical evaluation of the GWAS strategy and its alternatives, as well as future perspectives that are arising with the emergence of pan-genomic datasets.
Supplementary Information
The online version contains supplementary material available at 10.1007/s00018-021-03868-w.
Keywords: GWAS, Genetic architecture, Quantitative trait loci, Crop species
Genome-wide association studies (GWAS)
It was reported on 11 January 2019 that for humans 3730 GWAS had been published, with a total of 37 730 single nucleotide variations (SNVs) and 52 415 unique SNV-trait associations above a genome-wide significance threshold [1, 2]. Analysis of the staggering increase in the number of associations in the time-lapse figure on the GWAS catalog website (https://www.ebi.ac.uk/gwas/) suggests that these numbers have likely increased at least threefold, demonstrating the tremendous uptake of this method in recent years. Indeed, as evidenced by the numbers given above, since the first GWAS for age-related macular degeneration was published in 2005 [3], well over 50 000 associations of genome-wide significance (P < 5 × 10⁻⁸) have been reported between genetic variants and common diseases and traits [1]. Among these studies, risk loci have been reported for a vast number of diseases and traits, including anorexia nervosa [4], body mass index [5], cancers and their sub-types [6, 7], coronary diseases [7], inflammatory bowel disease [8], insomnia [9], type 2 diabetes mellitus [10], and schizophrenia [11]. Indeed, the number of replicable associations is now dramatically higher than that available in the pre-GWAS era [12]. The uptake of GWAS in plants has been similarly rapid. Since early studies on flowering time and pathogen resistance [13], single feature polymorphism [14], and recombination and linkage disequilibrium [15], well over 1000 GWAS have been published in plants [16, 17]. The data from many of these have subsequently been uploaded to the AraGWAS catalog database [18]. In this article we review these plant studies, splitting them into four major categories: (1) biotic resistance, (2) abiotic tolerance, (3) yield associated traits, and (4) metabolic composition.
We will document strategies of validation and cross-validation and outline how results from these studies are being exploited both as a route by which to gain mechanistic understanding of various biological processes and one to improve agriculture. Finally, we outline alternatives to the GWAS approach as well as providing a prospective for its future application. However, before doing so we feel it highly important to provide a brief overview of the technique itself.
The GWAS approach
The aim of GWAS is exceedingly simple—namely to detect association between allele or genotype frequency and trait status. The first step of such analysis is to identify the traits to be scored and select an appropriate study population considering both the size of the population and the amounts of genetic and trait variance that it possesses (Fig. 1). Depending on whether using a novel population or one that is already well studied genotyping may or may not be necessary. It can be carried out using single nucleotide polymorphism (SNP) arrays combined with imputation [19] or via whole-genome sequencing [2]. Association tests are then used to identify genomic regions that associate with the variance of the phenotype of interest at genome-wide significance with meta-analysis often used to increase the statistical power to detect associations. The first GWAS was performed by Klein et al. [3], who identified a variant of the Complement Factor H gene as being strongly associated with age-related macular degeneration. Within the last 15 years it has been powerful in dissecting the genetic basis for variation in a range of complex phenotypes including disease in humans and animals and physiological and agronomic traits in plants [20–26]. That said population structure and unequal relatedness between individuals can result in spurious associations and thereby false discoveries. To combat this problem considerable effort has been made to statistically account for population structure [27, 28]. For example, in mixed linear models (MLM), population stratification is fitted as a fixed effect, while kinship among individuals is incorporated via the variance–covariance structure of the random effect for the individual [29, 30]. Indeed the MLM method is now firmly established in GWAS since it has proven effective in correcting for the inflation of small genetic effects and controlling bias caused by population structure. 
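The mixed linear model outlined above can be written compactly as follows. This is a generic textbook formulation under our own notation, not the exact parameterization of any particular study or software package: y is the vector of phenotypes, X holds the intercept and the population structure covariates (for example principal components or ancestry proportions), s is the genotype vector of the tested marker with effect γ, Z maps individuals to the random genetic effects u, and K is the kinship matrix.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{s}\,\gamma + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},\\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_g^{2}\,\mathbf{K}\right),\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\ \sigma_e^{2}\,\mathbf{I}\right),\\
H_0 &: \gamma = 0 \quad \text{versus} \quad H_1 : \gamma \neq 0 .
\end{aligned}
```

In a single-locus scan the test of H_0 is repeated for every marker in turn, with population structure entering through the fixed effects β and kinship through the covariance of u.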
Generally, such models are applied as single-locus tests; however, multi-locus mixed models that perform well have also been developed [31, 32]. Although commonly used, single nucleotide polymorphism (SNP)-based GWAS suffers from oft-overlooked interactions between SNPs within a gene and from weak signals aggregating within related SNP sets [33]. To limit such problems, haplotype-based GWAS and gene-based GWAS have been developed. These have high statistical power to identify causal haplotypes and have been demonstrated to identify new candidates for complex traits, albeit being less capable of detecting QTL than SNP-based GWAS, especially for rare alleles [34–36]. All these methods are based on the assumption that phenotype and marker effects follow a normal distribution. Two further developments are worthy of note. The Anderson–Darling test is a complementary method that is particularly useful for moderate effect loci or rare variants and for phenotypes with a non-normal distribution [37], while statistics-based fine-mapping strategies have also been developed [38].
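To make the contrast between SNP-based and haplotype-based analysis more concrete, the following minimal sketch groups accessions by their haplotype across the SNPs of a single gene and summarizes the phenotype per haplotype. It only illustrates the grouping step, assuming fully homozygous accessions; a real haplotype-based GWAS would add a formal test and correction for population structure. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def haplotype_summary(snp_calls, phenotypes):
    """Group accessions by haplotype and summarise the phenotype per group.

    snp_calls  : list of strings, one per accession, giving the alleles at
                 the SNPs of one gene in a fixed order (e.g. "ACG").
    phenotypes : list of floats in the same accession order.
    Returns {haplotype: (n_accessions, mean_phenotype)}.
    """
    if len(snp_calls) != len(phenotypes):
        raise ValueError("one phenotype value is needed per accession")
    groups = defaultdict(list)
    for haplotype, value in zip(snp_calls, phenotypes):
        groups[haplotype].append(value)
    return {hap: (len(vals), mean(vals)) for hap, vals in groups.items()}


# Hypothetical toy data: three SNPs in one gene, six inbred accessions.
calls = ["ACG", "ACG", "ATG", "ATG", "GTA", "GTA"]
trait = [10.0, 12.0, 11.0, 13.0, 20.0, 22.0]
summary = haplotype_summary(calls, trait)

for hap, (n, avg) in sorted(summary.items()):
    print(hap, n, avg)
```

In this toy example no single SNP position separates all three groups, whereas the haplotype labels do, which is the kind of within-gene signal that single-SNP tests can overlook.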
Fig. 1.
A schematic view of GWAS in plants
Initial excitement surrounding GWAS cooled considerably on the appreciation that GWAS loci often have small effect sizes and explain only a modest proportion of heritability [39]. However, this missing heritability is, at least as long as large and varied populations are used, in fact rather small. What is clear is that the larger the population and the larger the number of SNPs, the greater the chance of a successful result, with empirical evidence demonstrating that for each complex trait there is a threshold sample size above which the rate of locus discovery accelerates in GWAS [40, 41]. It is important to note, however, that the value of biological insight gained from GWAS is in no way proportional to the strength of association, a fact that provides a strong argument for the value of finding subtle associations in ever larger sample sizes [42]. As stated above, genetic variants can be genotyped in many different ways, but by far the most predominant are SNP arrays and whole-genome sequencing (see Fig. 1). Given falling sequencing costs, the latter is becoming more frequent. The advantages of SNP arrays, other than their lower cost, are that they are highly accurate and have a well-established analysis pipeline. By contrast, although less accurate and more expensive, whole-genome sequencing also provides coverage of rare variants and, if the sample size is large enough, even ultra-rare variants. In addition, fine mapping is easier with whole-genome sequencing; however, these advantages come at the price of higher computational costs and a higher multiple testing burden [2]. To offset some of the limitations of SNP-based GWAS, sophisticated tools for genotype imputation have been developed which allow the genotypes of untyped variants to be predicted. If the size of the reference panel is large enough and a subset is well sequenced, this imputation has been demonstrated to be highly reliable [43, 44].
Given this fact, it is not surprising that both approaches currently retain utility. However, whole-genome sequencing is the gold standard in GWAS [45–47] and has the potential to resolve many of the limitations of the method (for example, the identification of missed signals, accounting for population stratification, the identification of ultra-rare mutations as well as gene–gene and gene–environment interactions, and the explanation of even more of the missing heritability). We will discuss this in detail when we compare GWAS with other strategies to link genotype with phenotype in the section on limitations of GWAS and alternative approaches below. Having provided a general introduction to the approach above, we will use early case studies in Arabidopsis that span a wide range of phenotypic traits to illustrate it in detail below, before providing a more comprehensive overview of its use in other species.
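The multiple testing burden mentioned above can be illustrated with a simple Bonferroni correction, in which the per-marker significance cut-off is the desired family-wise error rate divided by the number of tests. The sketch below is illustrative only: it treats all markers as independent, which is conservative because markers in linkage disequilibrium are correlated, and the marker counts are used purely as examples of array-scale and sequencing-scale datasets.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value cut-off keeping the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Roughly one million independent tests yields the conventional
# genome-wide cut-off of 5e-8 used in human studies.
human_cutoff = bonferroni_threshold(1_000_000)

# Illustrative array-scale marker count (assumed example).
array_cutoff = bonferroni_threshold(56_110)

# Illustrative sequencing-scale marker count (assumed example).
wgs_cutoff = bonferroni_threshold(1_030_000)

print(f"{human_cutoff:.1e} {array_cutoff:.1e} {wgs_cutoff:.1e}")
```

Moving from array-scale to sequencing-scale marker numbers therefore tightens the cut-off by more than an order of magnitude, so an association of a given strength needs a larger panel to reach significance.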
Early studies of GWAS in Arabidopsis
As for many studies in the last 40 years, the initial applications of GWAS in plants were in Arabidopsis. The very earliest studies focused on single feature polymorphism [14] and on recombination and linkage disequilibrium [15], but a far more diverse range of phenotypes has been studied in the interim. The study of Borevitz et al. used hybridization to a microarray as a means to assess the genomic DNA diversity of 23 ecotypes in comparison to the reference ecotype Col-0, allowing assessment of over 77 000 single feature polymorphisms [14]. Similarly, that of Kim et al. analyzed linkage disequilibrium in a sample of 19 Arabidopsis accessions using approximately 350 000 non-singleton SNPs, demonstrating the presence of clear recombination hotspots in intergenic regions [15]. Currently, the results of > 400 Arabidopsis GWAS covering an exhaustive range of phenotypes are curated in the AraGWAS catalog [18]. To highlight a few recent studies, we will focus on growth, metabolism, defense, and the evolution of tolerance to abiotic stress [48–52]. Growth and metabolism have been evaluated in association with enzyme activities of primary metabolism [48], while primary [51] and secondary metabolite contents [49, 50] have also been studied via the use of metabolomics approaches. All of these studies have provided greater insight into the interplay between metabolism and growth on the one hand and defense on the other [53], with both differences in the levels of defense metabolites and altered alleles of ACCELERATED CELL DEATH6 suggesting a trade-off between metabolism and defense. Abiotic stress has also been much studied in Arabidopsis populations, with the recent tour-de-force work of Exposito-Alonso et al. representing a beautiful example of the power of this approach [52].
These authors evaluated 517 Arabidopsis ecotypes grown in Spain and Germany, simulating high and low precipitation at each site, quantifying survival and fecundity, and thereafter performing a GWAS on the quantified selection coefficients. They observed that a significant proportion of the climate-driven natural selection was predictable from signatures of local adaptation, since genetic variants found in geographical areas with climates more similar to the experimental sites were positively selected. These data thus allowed them to forecast that, with the increased frequency of drought and high temperature in Europe, such positive selection will sweep northwards across Europe.
The above studies represent impressive proof-of-concept work and have additionally greatly refined our understanding of the genotype-to-phenotype interface [16]. As we will detail in the following sections, GWAS has since been adopted in cereal crops (rice [22, 54], maize [55, 56], wheat [57], and barley [58]) as well as soybean [59–61], cotton [62, 63], tomato [25, 26], cucumber [64, 65], sesame [66], peanut [67], peach [68], melon [69], tea [70], and lettuce [71, 72]. As we will elaborate in the next four sections, these studies, alongside the purpose-developed populations, catalogs of allelic variants, and corresponding genotype–phenotype associations, provide unprecedented resources for understanding crop functional genomics [33].
Adoption of GWAS in crop species (i) biotic resistance
In the above section we have detailed some studies evaluating biotic stress in Arabidopsis. In crops this is of massive importance, with 20–40% yield losses predicted to be caused by biotic interactions annually. Considerable success has been achieved by breeding efforts, notably the introgression of wild species alleles conferring resistance [73, 74]. Critically, the collection of broad populations for, among others, the species listed above renders GWAS an attractive approach for the identification of further genes of interest for this purpose. As can be seen in Supplementary Table 1, there are already a vast number of such studies covering many species. Here, we will highlight only the few summarized in Table 1.
Table 1.
List of selected genome-wide association studies in Arabidopsis and major crop plants
| Category | Species (common name) | Panel size [markers] | Trait [associations] | References | Validation |
|---|---|---|---|---|---|
| Arabidopsis | A. thaliana (Arabidopsis) | 96 [200,000] | Metabolites [**] | [148] | − |
| | A. thaliana (Arabidopsis) | 314 [199,455] | Primary metabolites [117] | [51] | + |
| | A. thaliana (Arabidopsis) | 91 [4,000,000] | Drought [**] | [149] | − |
| | A. thaliana (Arabidopsis) | 349 [214,051] | Central metabolism and plant growth [131] | [48] | + |
| | A. thaliana (Arabidopsis) | 309 [199,455] | Darkness [123*] | [49] | + |
| | A. thaliana (Arabidopsis) | 517 [1,353,386] | Environmental adaptation [6,660] | [52] | + |
| Metabolite QTL | Z. mays (Maize) | 513 [56,110] | Specialized metabolites [16] | [150] | + |
| | Z. mays (Maize) | 368 [1,030,000] | Metabolites [74*] | [55] | + |
| | Z. mays (Maize) | 368 [560,000] | Metabolites [882*] | [151] | − |
| | Z. mays (Maize) | 282 [29,000,000] | Specialized metabolites [**] | [103] | − |
| | Z. mays (Maize) | 368 [560,000] | Lipid biosynthesis [139] | [106] | − |
| | O. sativa (Rice) | 529 [6,400,000] | Metabolites [634] | [152] | + |
| | O. sativa (Rice) | 502 [3,900,000] | Metabolites [105] | [20] | + |
| | Solanum spp. (Tomato) | 398 [2,014,488] | Flavor [251] | [25] | + |
| | H. vulgare L. var. nudum (Tibetan hulless barley) | 196 [19,248,055] | Metabolites [90*] | [58] | + |
| | L. sativa (Lettuce) | 189 [16,611] | Primary metabolites [154*] | [153] | + |
| Yield associated | G. hirsutum (Cotton) | 258 [1,871,401] | Yield-related traits [119*] | [62] | − |
| | G. max (Soybean) | 809 [10,415,168] | Agronomic traits [245*] | [89] | − |
| | I. batatas (Sweet potato) | 358 [33,068] | Root-related traits [34] | [91] | − |
| | O. sativa (Rice) | 242 [700,000] | Agronomic traits [10*] | [88] | − |
| | P. vulgaris (Common bean) | 683 [4,811,097] | Yield associated traits [505*] | [154] | − |
| Biotic stress | Z. mays (Maize) | 5,000 [1,600,000] | Resistance to southern leaf blight [245*] | [56] | − |
| | Z. mays (Maize) | 318 [542,438] | Rhizoctonia solani resistance [28] | [75] | + |
| | G. max (Soybean) | 330 [25,179] | Sclerotinia sclerotiorum resistance [38] | [155] | − |
| | O. sativa (Rice) | 67 [2,576] | Blast resistance [36] | [156] | + |
| | T. aestivum (Wheat) | 2,300 [49,905] | Rust resistance [161/33] | [77] | − |
| | T. turgidum ssp. dicoccum (Emmer wheat) | 176 [5,106] | Puccinia striiformis resistance [51*] | [76] | − |
| Abiotic stress | O. sativa (Rice) | 553 [304,877] | Salinity tolerance [**] | [82] | − |
| | O. sativa (Rice) | 68 [27,192] | Flooding tolerance [6*] | [157] | + |
| | O. sativa (Rice) | 1,033 [289,231] | Cold tolerance [5*] | [85] | + |
| | O. sativa (Rice) | 117 [1,531,224] | NUE-related agronomic traits [7] | [83] | + |
| | Z. mays (Maize) | 338 [56,110] | Metabolites under low Pi [178] | [84] | + |
Expanded list is provided in Supplementary Table 1
* Number of QTLs, ** several associations, + experimental validation of the candidate gene(s), − no experimental validation of the candidate genes or loci
Starting with studies in our major cereals, we will describe two studies each for maize and wheat and one for rice before highlighting the possible value of this approach in two less studied crops. The first study in maize used the nested association mapping (NAM) population to identify 32 QTL with small additive effects on southern leaf blight resistance, with many being within or near genes previously shown to be involved in plant disease resistance [56]. More recently, GWAS revealed that the F-box protein ZmFBL41 interacts with ZmCAD, which encodes the terminal enzyme of the monolignol pathway and which, if active, restricts lesion expansion [75]. Similarly, in a GWAS-based study in rice, Li et al. found a natural allele of a C2H2-type transcription factor that confers broad-spectrum resistance. Haplotype analysis (to which we will return below) revealed that this allele exists in 10% of rice accessions. This allelic variation was associated with an inhibition of H2O2 degradation, which the authors postulate is responsible for the observed resistance. In emmer wheat, GWAS identified stripe rust resistance loci that were associated with field resistance in multiple environments, with more than half of these representing novel candidate genes that were not found in linkage mapping studies [76]. Meanwhile, a recent large-scale study of 2300 bread wheat accessions investigated leaf-, stem-, and stripe-rust diseases, with both single- and multi-trait GWAS being applied [77]. Importantly, both studies revealed the utility of small effect QTL in the achievement of durable resistance.
Of the less studied species, we would highlight two: cassava, which is actually the fourth largest crop in terms of production globally [78], and pigeonpea, an important smallholder crop in India and Africa [79]. For cassava, GWAS for cassava mosaic disease and cassava green mite severity were carried out, identifying several novel and previously reported associations. For pigeonpea, a pangenome based on 89 accessions was recently published, and this will surely be a fantastic resource for future studies. Indeed, since so many natural populations are now established, it seems likely that their use, as well as that of biparental and multi-parental populations, will unlock resistance in a wide range of plant–pest combinations and as such will result in the achievement of durable resistance.
Adoption of GWAS in crop species (ii) abiotic tolerance
Similarly to the above studies aiming to generate more resistant plants, considerable research and breeding efforts have been expended on identifying and utilizing allelic variance that confers tolerance to abiotic stresses. As can be seen in Supplementary Table 1, there are already a vast number of such studies covering many species. Here, we will highlight only the few summarized in Table 1, focusing on water and salt stress as well as macronutrient and temperature stress. Arguably, the most important of these is drought stress, with yield losses of > 50% being estimated to be due to this stress annually [80]. While water deficiency can devastate crop yields, the opposite, i.e., flooding, can have the same consequences. The development of varieties of rice that are tolerant of flooding is thus highly desirable. The identification of haplotypes of the SEMIDWARF1 gene that facilitate this [81] presents an excellent example of the power of haplotype analysis following GWAS (an analysis type to which we will return below). Similarly in rice, salt stress has been much researched. Al-Tamimi et al. combined high-throughput phenotyping of plant growth and transpiration with high-density genotyping of indica and aus diversity panels containing a total of 553 accessions [82]. This study identified a previously undetected locus for salt stress localizing to chromosome 11, thus providing new insight into early responses to salinity in rice and hints as to how breeding could alleviate this problem.
Nitrogen fertilizer is often over-applied to fields, often with catastrophic ecological consequences. There is thus a pressing need to develop crops exhibiting high nitrogen use efficiency in order to reduce fertilizer use and move towards a more sustainable agriculture. Tang et al. recently identified the nitrate transporter OsNPF6.1 (HapB) as conferring high nitrogen use efficiency in a GWAS experiment conducted on a rice diversity panel [83], with haplotype analysis revealing that this allele had been lost in over 90% of rice varieties. In a similar vein, GWAS was used to investigate phosphate use efficiency in maize [84], with metabolomics being utilized in this study to understand how metabolism is reprogrammed under phosphate limitation. The combined work identified phosphoglucose isomerase activity as a key determinant of phosphate use efficiency, suggesting the corresponding gene to be a strong lead for lessening the need for P fertilization [84].
Extreme temperatures also often provoke deleterious effects on crop yield. For this reason, GWAS was recently applied to identify genes underlying cold tolerance in a large rice diversity panel of 1033 accessions [85]. This study resulted in the identification of five cold tolerance related genetic loci, with one locus, LOC_Os10g34840, being deemed responsible for cold tolerance at the seedling stage; the cold tolerant allele was present in 80% of temperate japonica accessions but only 3.8% of the indica accessions. By contrast, for high temperature tolerance, GWAS discovered genetic factors associated with four production traits in both heat and drought stress environments in common bean (Phaseolus vulgaris L.) [86].
Adoption of GWAS in crop species (iii) yield associated traits
Having addressed the use of association mapping in resistance and tolerance of plants to biotic and abiotic factors, respectively, it is important to note that considerable research effort has additionally been placed on elucidating the genetic basis of yield associated traits. As for the above traits, we have listed several GWAS reporting yield associated traits in Table 1 and provide a more extensive list in Supplementary Table 1. An early study tested almost 5000 lines from the maize NAM population described above and identified numerous small effect QTL, with a simple additive model being able to predict flowering time [87]. In addition to flowering time, panicle architecture is a key target of selection in rice. A total of 49 panicle phenotypes were recently assessed in 242 tropical rice accessions, allowing the identification of ten GWAS peaks but also demonstrating subtle links between panicle size and yield performance [88]. The complexity of agronomic yield was similarly underlined by a study of 84 agronomic traits in a panel of 809 soybean accessions, with many of the loci exhibiting complex pleiotropic effects [89]. In upland cotton, a GWAS identified two ethylene pathway related genes as associated with increased lint yield, with an analysis of population frequencies revealing that the majority of the elite alleles detected were transferred from a mere three founder landraces [62]. Such analyses have also been carried out in long-lived species such as Populus trees [90] and in sweet potato [91], while in tomato GWAS confirmed the association of Lin5 with agronomic yield [25] that had previously been identified by linkage mapping [92]. It is perhaps not unexpected that the QTL for yield associated traits seem generally not to be conserved across species.
Adoption of GWAS in crop species (iv) metabolic composition
Combining the developments in sequencing with those in mass-spectrometry-based analytical systems has rendered understanding of the genetic architecture of metabolism far easier than it was previously [33, 93–95]. Indeed, the immense metabolic diversity of plants has made them ideal models for dissecting the genetic bases underlying the regulation of the metabolome, with studies progressing from the analysis of mutant libraries [96, 97] and of gene families [98, 99], via the comparison of sister species [100] and species series within taxa [101], to linkage mapping and association mapping based on next-generation sequencing [33]. By contrast to the QTL for agricultural performance described above, genetic variants controlling natural variation in metabolite accumulation are easier to identify due to both the tremendous diversity apparent across experimental populations [20, 102–105] and the high accuracy with which metabolite content can be evaluated [95]. As mentioned above, a wide range of examples are now published both in cereal and non-cereal crops (Table 1 and Supplementary Table 1). Due to space limitations we limit our discussion to ten of these examples. In maize, GWAS was carried out on the contents of nearly 1000 metabolite mass features in over 700 lines and further allowed the association of metabolite features with kernel size [55], while a more recent study identified four times as many features, paying particular attention to the benzoxazinoids and hydroxycitric acids [103]. Earlier, a ground-breaking and highly comprehensive study on maize kernel oil identified 74 associated loci, of which 26 could explain up to 83% of the phenotypic variation using a simple additive model.
Maize kernel oil is a valuable source of nutrition. In a seminal study, Li et al. examined the genetic architecture of oil accumulation in maize by GWAS using 368 maize inbred lines genotyped at in excess of 1 million SNPs. In the process, they identified 74 loci associated with kernel oil levels and fatty acid composition. They validated more than half of these in a linkage mapping population, and 26 of the conserved loci were annotated as enzymes of oil biosynthesis and could explain up to 83% of the phenotypic variation in this trait [106]. Similarly in rice, secondary metabolism data of 175 accessions identified 323 associations among 143 SNPs and 89 metabolites. A comparative analysis between maize and rice demonstrated a considerable number of shared loci associated with metabolites common to both species [20], but of course could not provide information with regard to species-specific metabolites or, for that matter, genes [33]. The use of this approach in wheat and barley has allowed the definition of the flavonoid biosynthesis pathway in the former and of a novel metabolite thereof that confers UV tolerance in hulless barley. In tomato, GWAS was used in concert with metabolite profiling and taste panels to characterize the genetic architecture of tomato fruit taste [25], and with metabolic and transcript profiling to characterize the changes in the metabolome that occurred during the domestication and improvement processes [26], while a combination of GWAS, a multi-parental breeding population, and transgenic lines was used to characterize the control of vitamin E levels in this fruit [107]. To summarize, metabolic GWAS has proven highly informative not only as a means of identifying lead genes for engineering of specific metabolite contents but also in beginning to define the biological function of specific metabolites [95].
However, in certain species such as citrus the use of GWAS is not yet tractable, most likely due to population structure issues (unpublished), and this fact is important to keep in mind before carrying out labor-intensive studies on a new species, irrespective of the phenotype studied.
Validation of candidate gene function
Despite the strong theoretical foundation we discuss above and the considerable efforts taken to address population structure and employ strict probability cut-offs, false-positive associations will still occur due to the enormous number of statistical inferences and to other factors that are not taken into account by the simplicity of the approach [17, 108, 109]. As a consequence, independent biological validation is required; however, it is often not provided [17]. That said, two forms of validation have been employed in several instances: (i) the validation of associations in independent populations and (ii) validation by targeted virus-induced gene silencing (VIGS), transgenesis, and gene editing experimentation. Cross-population validation is currently largely achieved by integrating association mapping in diverse panels with linkage mapping in RIL or F2 populations. For example, in the recent cloning of ZmCCT9, a QTL which affects maize flowering time [110], the locus was simultaneously identified in the NAM [87] and maize-teosinte RIL populations under association and linkage mapping. Moreover, the causal allele, an InDel of a harbinger-like transposon, has also been identified in a 513 line association panel [111], a fact that was cross-validated in the two populations used to map the locus. In a similar example, rice chlorophyll content was mapped in a panel of 529 individuals followed by three customized F2 populations [112]. Other such examples are studies of the maize metabolome [113] and, in tomato, independent studies of the QTL underlying total soluble solid content [92, 113] and alterations in the metabolome [26, 93], as well as the exquisitely controlled study mentioned above which used GWAS, multi-parental breeding populations, and transgenics to confirm QTL for tocopherol contents [107].
The increasing availability of characterized populations should massively increase our capacity to do such experiments, which will undoubtedly boost our confidence in the results of association mapping studies. In this vein, it is also important to note the value of cross-species analysis, which has already been implemented in cereals [20, 114, 115] and would probably prove tractable in other agronomically important families such as the Brassicaceae, Solanaceae, and legumes. Because the cross-validation approach can prove incredibly time and labor intensive, several other more direct approaches have been taken. For example, the confirmation of many metabolic QTL has been provided by reducing the expression of candidate genes via VIGS [93, 95, 116] or alternatively via their transient or inducible expression [20]. The repertoire of species amenable to both methods is currently being considerably expanded. While these methods are well suited to select candidates, clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) mutant libraries, such as those set up for rice [117, 118] and more recently maize [119], promise to greatly accelerate the functional confirmation of causality. As with the VIGS and transient expression methods, the range of plant species for which there are multiple publications on the use of CRISPR has seen a steep increase in recent years [119, 120].
Limitations of GWAS and alternative approaches to GWAS
Despite the great success of the method, as evidenced by the wealth of information described above (and in Supplementary Table 1), GWAS currently has clear limitations, the major ones being issues concerning population structure and low-frequency causal alleles, both of which lead to false negative results [121]. For example, given that flowering time is a typical adaptive trait and is always confounded (i.e., highly correlated) with population structure, only one gene (ZmCCT) was revealed for flowering time using a diverse association mapping panel consisting of 500 inbred lines [122]. It is widely accepted that many false negatives occur for such confounded traits when correcting for population structure in GWAS [17, 123]. Another example is the demonstration that only five inbred lines in a population of 527 (< 1%) possess functionally alternative alleles at the Brachytic2 locus for plant height [124], rendering it impossible to identify this locus using routine association mapping analysis. Similarly in rice, causal alleles within most of the cloned yield related quantitative trait loci (QTLs) are at low frequency in diverse germplasm (1% for GS3 [125]; 2% for Ghd7 [126–128]; 2% for qGL3 [129]; 6% for TGW6 [130]). Two routes to tackle these issues have been suggested: either the development of novel statistical methods for the exploration of rare functional alleles [131–133] or, alternatively, the use of artificially designed populations to balance allelic frequencies and thereby control population structure [87, 134–136]. Given that these have been reviewed in depth recently [17, 137–139], we will not discuss them in detail here.
In addition to the above issues, non-causative loci sometimes show more significant associations in GWAS than the causative ones, meaning that the causative genes may be distant from the GWAS peaks. Such an occurrence has been reported in a number of plant studies, including studies in Arabidopsis [140, 141], sorghum [142], and tomato [143]. Such misleading associations are sometimes known as synthetic associations and are presumed to result from linkage drag caused by linkage disequilibrium between common tagged markers and rare causative variants [17, 144]. This may in turn explain the so-called missing heritability issue of GWAS. That said, some cases do not follow the rare-allele assumption; rather, trait variation appears to be caused by multiple alleles within one gene [34, 142]. Given that mutation constantly generates new variants, multiple independent alleles within one gene leading to the same phenotype could be common. As stated above, haplotype- or gene-based methodologies therefore have high potential for identifying such situations. That said, current haplotype-based association mapping remains imperfect [145] and, moreover, is particularly challenging in plants [17]. Thus, improving haplotype analyses will likely prove highly beneficial both for understanding the underlying genetics and for understanding its functional physiological consequences.
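A toy example may help to show how a non-causative marker can outscore the causative variants, and why a gene-based summary can recover the signal. The Python sketch below uses entirely hypothetical, noise-free data: 20 inbred lines, two independent rare causal alleles within one gene (two carrier lines each), and a more common marker allele whose five carriers include all four causal carriers. It does not reproduce any analysis from the cited studies; squared correlation is used as a simple stand-in for association strength.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


N_LINES = 20  # hypothetical panel size

# Two independent rare causal alleles in the same gene (carrier lines differ).
rare_a = [1 if i in (0, 1) else 0 for i in range(N_LINES)]
rare_b = [1 if i in (2, 3) else 0 for i in range(N_LINES)]

# A non-causal marker allele in LD with both: carried by lines 0..4.
marker = [1 if i < 5 else 0 for i in range(N_LINES)]

# Noise-free phenotype: shifted by one unit in any line with a causal allele.
phenotype = [float(a or b) for a, b in zip(rare_a, rare_b)]

# Gene-based summary: does the line carry any rare allele in the gene?
gene_burden = [int(a or b) for a, b in zip(rare_a, rare_b)]

r2_rare_a = r_squared(rare_a, phenotype)
r2_rare_b = r_squared(rare_b, phenotype)
r2_marker = r_squared(marker, phenotype)
r2_burden = r_squared(gene_burden, phenotype)

print(f"causal allele A: r2 = {r2_rare_a:.3f}")
print(f"causal allele B: r2 = {r2_rare_b:.3f}")
print(f"tagging marker:  r2 = {r2_marker:.3f}")
print(f"gene burden:     r2 = {r2_burden:.3f}")
```

In this constructed case the tagging marker is more strongly associated with the phenotype than either causal allele tested alone, whereas collapsing the alleles per gene captures the full signal. Real data contain noise, population structure, and incomplete linkage disequilibrium, so the contrast will generally be less clear-cut.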
Current and future perspectives for GWAS
Genome-wide association studies have successfully identified an enormous number of loci associated with phenotypic, expression, and metabolic traits in multiple species. Although the genetic factors underlying some of these associations have been characterized, the vast majority remain unexplained. Next-generation sequencing and bioinformatics tools have greatly improved and are currently being used to decipher the genetic diversity of targeted traits. This has recently been supported by multi-omics data analysis, enhancing our understanding of phenotypic diversity and its corresponding genetic basis. Combined analyses of phenotypic and transcriptomic data have been used to dissect the genetic bases of various metabolic and phenotypic traits (see [146]). Moreover, developments in molecular biology techniques (e.g., CRISPR/Cas9, over-expression, or genetic complementation) have greatly accelerated the elucidation of the biological functions of the causative genes behind GWAS hits. Cross-validation by combining association and linkage (F2, RILs) mapping has already been implemented in crops [25, 147]. Finally, although molecular and genetic validations are the reliable ways to validate GWAS results, accompanying challenges such as epistasis, heterosis, and environmental factors still need to be taken into consideration. Once such factors are integrated, our chance of understanding the genetic regulation of complex traits will improve, providing viable targets for crop improvement and breeding.
Author contributions
SA, MB, and ARF: Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany. DK: Maritsa Vegetable Crops Research Institute (MVCRI), Plovdiv, Bulgaria. SA, DK, and ARF: Center of Plant Systems Biology and Biotechnology, 4000 Plovdiv, Bulgaria. All authors contributed to the literature review and preparing the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. ARF, DK, and SA acknowledge the financial support of the EU Horizon 2020 Research and Innovation Programme, project PlantaSYST (SGA-CSA No. 739582 under FPA No. 664620). MB is supported by the IMPRS-PMPG ‘Primary Metabolism and Plant Growth’.
Availability of data and material
Data associated with this paper are available in the manuscript.
Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Buniello A, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–D1012. doi: 10.1093/nar/gky1120.
- 2.Tam V, et al. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019;20(8):467–484. doi: 10.1038/s41576-019-0127-1.
- 3.Klein RJ, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308(5720):385–389. doi: 10.1126/science.1109557.
- 4.Duncan L, et al. Significant locus and metabolic genetic correlations revealed in genome-wide association study of anorexia nervosa. Am J Psychiatry. 2017;174(9):850–858. doi: 10.1176/appi.ajp.2017.16121402.
- 5.Yengo L, et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641–3649. doi: 10.1093/hmg/ddy271.
- 6.Milne RL, et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat Genet. 2017;49(12):1767–1778. doi: 10.1038/ng.3785.
- 7.Sud A, Kinnersley B, Houlston RS. Genome-wide association studies of cancer: current insights and future perspectives. Nat Rev Cancer. 2017;17(11):692–704. doi: 10.1038/nrc.2017.82.
- 8.de Lange KM, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet. 2017;49(2):256–261. doi: 10.1038/ng.3760.
- 9.Jansen PR, et al. Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat Genet. 2019;51(3):394–403. doi: 10.1038/s41588-018-0333-3.
- 10.Suzuki K, et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat Genet. 2019;51(3):379–386. doi: 10.1038/s41588-018-0332-4.
- 11.Li Z, et al. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nat Genet. 2017;49(11):1576–1583. doi: 10.1038/ng.3973.
- 12.Lohmueller KE, et al. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet. 2003;33(2):177–182. doi: 10.1038/ng1071.
- 13.Aranzana MJ, et al. Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet. 2005;1(5):e60. doi: 10.1371/journal.pgen.0010060.
- 14.Borevitz JO, et al. Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proc Natl Acad Sci USA. 2007;104(29):12057–12062. doi: 10.1073/pnas.0705323104.
- 15.Kim S, et al. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat Genet. 2007;39(9):1151–1155. doi: 10.1038/ng2115.
- 16.Fernie AR, Gutierrez-Marcos J. From genome to phenome: genome-wide association studies and other approaches that bridge the genotype to phenotype gap. Plant J. 2019;97(1):5–7. doi: 10.1111/tpj.14219.
- 17.Liu HJ, Yan J. Crop genome-wide association study: a harvest of biological relevance. Plant J. 2019;97(1):8–18. doi: 10.1111/tpj.14139.
- 18.Togninalli M, et al. AraPheno and the AraGWAS Catalog 2020: a major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucleic Acids Res. 2020;48(D1):D1063–D1068. doi: 10.1093/nar/gkz925.
- 19.Johnson EO, et al. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy. Hum Genet. 2013;132(5):509–522. doi: 10.1007/s00439-013-1266-7.
- 20.Chen W, et al. Comparative and parallel genome-wide association studies for metabolic and agronomic traits in cereals. Nat Commun. 2016;7:12767. doi: 10.1038/ncomms12767.
- 21.Horton MW, et al. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet. 2012;44(2):212–216. doi: 10.1038/ng.1042.
- 22.Huang X, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–967. doi: 10.1038/ng.695.
- 23.Tian D, et al. GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res. 2020;48(D1):D927–D932. doi: 10.1093/nar/gkz828.
- 24.Tian F, et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat Genet. 2011;43(2):159–162. doi: 10.1038/ng.746.
- 25.Tieman D, et al. A chemical genetic roadmap to improved tomato flavor. Science. 2017;355(6323):391–394. doi: 10.1126/science.aal1556.
- 26.Zhu GT, et al. Rewiring of the fruit metabolome in tomato breeding. Cell. 2018;172(1–2):249. doi: 10.1016/j.cell.2017.12.019.
- 27.Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):997–1004. doi: 10.1111/j.0006-341X.1999.00997.x.
- 28.Liu X, et al. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12(2):e1005767. doi: 10.1371/journal.pgen.1005767.
- 29.Yu JM, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–208. doi: 10.1038/ng1702.
- 30.Zhao KY, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007;3(1):e4. doi: 10.1371/journal.pgen.0030004.
- 31.Segura V, et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44(7):825–U144. doi: 10.1038/ng.2314.
- 32.Wen YJ, et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies (bbw145, 2016). Brief Bioinform. 2017;18(5):906–906. doi: 10.1093/bib/bbx028.
- 33.Fang C, Luo J. Metabolic GWAS-based dissection of genetic bases underlying the diversity of plant metabolism. Plant J. 2019;97(1):91–100. doi: 10.1111/tpj.14097.
- 34.Yano K, et al. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice. Nat Genet. 2016;48(8):927–934. doi: 10.1038/ng.3596.
- 35.Zhang WC, et al. PEPIS: a pipeline for estimating epistatic effects in quantitative trait locus mapping and genome-wide association studies. PLoS Comput Biol. 2016;12(5).
- 36.Sato S, et al. SNP- and haplotype-based genome-wide association studies for growth, carcass, and meat quality traits in a Duroc multigenerational population. BMC Genet. 2016 doi: 10.1186/s12863-016-0368-3.
- 37.Yang N, et al. Genome wide association studies using a new nonparametric model reveal the genetic architecture of 17 agronomic traits in an enlarged maize association panel. PLoS Genet. 2014;10(9):e1004573. doi: 10.1371/journal.pgen.1004573.
- 38.Schaid DJ, Chen WN, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. 2018;19(8):491–504. doi: 10.1038/s41576-018-0016-z.
- 39.Manolio TA, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–753. doi: 10.1038/nature08494.
- 40.Visscher PM, et al. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7–24. doi: 10.1016/j.ajhg.2011.11.029.
- 41.Ahlqvist E, et al. The genetics of diabetic complications. Nat Rev Nephrol. 2015;11(5):277–287. doi: 10.1038/nrneph.2015.37.
- 42.Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322(5903):881–888. doi: 10.1126/science.1156409.
- 43.Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11(7):499–511. doi: 10.1038/nrg2796.
- 44.Huang J, et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun. 2015;6:8111. doi: 10.1038/ncomms9111.
- 45.Fuchsberger C, et al. The genetic architecture of type 2 diabetes. Nature. 2016;536(7614):41–47. doi: 10.1038/nature18642.
- 46.Lek M, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–291. doi: 10.1038/nature19057.
- 47.Steinthorsdottir V, et al. Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nat Genet. 2014;46(3):294–298. doi: 10.1038/ng.2882.
- 48.Fusari CM, et al. Genome-wide association mapping reveals that specific and pleiotropic regulatory mechanisms fine-tune central metabolism and growth in Arabidopsis. Plant Cell. 2017;29(10):2349–2373. doi: 10.1105/tpc.17.00232.
- 49.Wu S, et al. Mapping the arabidopsis metabolic landscape by untargeted metabolomics at different environmental conditions. Mol Plant. 2018;11(1):118–134. doi: 10.1016/j.molp.2017.08.012.
- 50.Chan EK, Rowe HC, Kliebenstein DJ. Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics. 2010;185(3):991–1007. doi: 10.1534/genetics.109.108522.
- 51.Wu S, et al. Combined use of genome-wide association data and correlation networks unravels key regulators of primary metabolism in Arabidopsis thaliana. PLoS Genet. 2016;12(10):e1006363. doi: 10.1371/journal.pgen.1006363.
- 52.Exposito-Alonso M, et al. Natural selection on the Arabidopsis thaliana genome in present and future climates. Nature. 2019;573(7772):126–129. doi: 10.1038/s41586-019-1520-9.
- 53.Kleessen S, et al. Metabolic efficiency underpins performance trade-offs in growth of Arabidopsis thaliana. Nat Commun. 2014;5:3537. doi: 10.1038/ncomms4537.
- 54.Huang X, et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat Genet. 2011;44(1):32–39. doi: 10.1038/ng.1018.
- 55.Wen WW, et al. Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights. Nat Commun. 2014;5:3438. doi: 10.1038/ncomms4438.
- 56.Kump KL, et al. Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population. Nat Genet. 2011;43(2):163–168. doi: 10.1038/ng.747.
- 57.Chen J, et al. Metabolite-based genome-wide association study enables dissection of the flavonoid decoration pathway of wheat kernels. Plant Biotechnol J. 2020;18(8):1722–1735. doi: 10.1111/pbi.13335.
- 58.Zeng X, et al. Genome-wide dissection of co-selected UV-B responsive pathways in the UV-B adaptation of Qingke. Mol Plant. 2020;13(1):112–127. doi: 10.1016/j.molp.2019.10.009.
- 59.Hwang EY, et al. A genome-wide association study of seed protein and oil content in soybean. BMC Genomics. 2014;15:1. doi: 10.1186/1471-2164-15-1.
- 60.Fang C, et al. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol. 2017 doi: 10.3389/fpls.2018.01184.
- 61.Leamy LJ, et al. A genome-wide association study of seed composition traits in wild soybean (Glycine soja). BMC Genomics. 2017;18:18. doi: 10.1186/s12864-016-3397-4.
- 62.Fang L, et al. Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits. Nat Genet. 2017;49(7):1089–1098. doi: 10.1038/ng.3887.
- 63.Wang MJ, et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 2017;49(4):579. doi: 10.1038/ng.3807.
- 64.Shang Y, et al. Biosynthesis, regulation, and domestication of bitterness in cucumber. Science. 2014;346(6213):1084–1088. doi: 10.1126/science.1259215.
- 65.Zhang ZH, et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell. 2015;27(6):1595–1604. doi: 10.1105/tpc.114.135848.
- 66.Wei X, et al. Genetic discovery for oil production and quality in sesame. Nat Commun. 2015;6:8609. doi: 10.1038/ncomms9609.
- 67.Pandey MK, et al. Genomewide Association Studies for 50 agronomic traits in peanut using the 'Reference set' comprising 300 genotypes from 48 countries of the semi-arid tropics of the world. PLoS ONE. 2014 doi: 10.1371/journal.pone.0105228.
- 68.Cao K, et al. Genome-wide association study of 12 agronomic traits in peach. Nat Commun. 2016;7:13246. doi: 10.1038/ncomms13246.
- 69.Zhao G, et al. A comprehensive genome variation map of melon identifies multiple domestication events and loci influencing agronomic traits. Nat Genet. 2019;51(11):1607–1615. doi: 10.1038/s41588-019-0522-8.
- 70.Zhang W, et al. Genome assembly of wild tea tree DASZ reveals pedigree and selection history of tea varieties. Nat Commun. 2020;11(1):3719. doi: 10.1038/s41467-020-17498-6.
- 71.Zhang L, et al. RNA sequencing provides insights into the evolution of lettuce and the regulation of flavonoid biosynthesis. Nat Commun. 2017;8(1):2264. doi: 10.1038/s41467-017-02445-9.
- 72.Zhang W, et al. Dissection of the domestication-shaped genetic architecture of lettuce primary metabolism. Plant J. 2020;104:613–630. doi: 10.1111/tpj.14950.
- 73.Janzen GM, Wang L, Hufford MB. The extent of adaptive wild introgression in crops. New Phytol. 2019;221(3):1279–1288. doi: 10.1111/nph.15457.
- 74.Diepenbrock CH, et al. Novel loci underlie natural variation in vitamin E levels in maize grain. Plant Cell. 2017;29(10):2374–2392. doi: 10.1105/tpc.17.00475.
- 75.Li N, et al. Natural variation in ZmFBL41 confers banded leaf and sheath blight resistance in maize. Nat Genet. 2019;51(10):1540–1548. doi: 10.1038/s41588-019-0503-y.
- 76.Liu W, et al. Genome-wide association mapping reveals a rich genetic architecture of stripe rust resistance loci in emmer wheat (Triticum turgidum ssp. dicoccum). Theor Appl Genet. 2017;130(11):2249–2270. doi: 10.1007/s00122-017-2957-6.
- 77.Joukhadar R, et al. Genome-wide association reveals a complex architecture for rust resistance in 2300 worldwide bread wheat accessions screened under various Australian conditions. Theor Appl Genet. 2020;133(9):2695–2712. doi: 10.1007/s00122-020-03626-9.
- 78.Sonnewald U, et al. The Cassava Source-Sink project: opportunities and challenges for crop improvement by metabolic engineering. Plant J. 2020 doi: 10.1111/tpj.14865.
- 79.Zhao J, et al. Trait associations in the pangenome of pigeon pea (Cajanus cajan). Plant Biotechnol J. 2020;18(9):1946–1954. doi: 10.1111/pbi.13354.
- 80.Webber H, et al. Diverging importance of drought stress for maize and winter wheat in Europe. Nat Commun. 2018;9(1):4249. doi: 10.1038/s41467-018-06525-2.
- 81.Kuroh T, et al. Ethylene-gibberellin signaling underlies adaptation of rice to periodic flooding. Science. 2018;361(6398):181–185. doi: 10.1126/science.aat1577.
- 82.Al-Tamimi N, et al. Salinity tolerance loci revealed in rice using high-throughput non-invasive phenotyping. Nat Commun. 2016;7:13342. doi: 10.1038/ncomms13342.
- 83.Tang W, et al. Genome-wide associated study identifies NAC42-activated nitrate transporter conferring high nitrogen use efficiency in rice. Nat Commun. 2019;10(1):5279. doi: 10.1038/s41467-019-13187-1.
- 84.Luo B, et al. Metabolite profiling and genome-wide association studies reveal response mechanisms of phosphorus deficiency in maize seedling. Plant J. 2019;97(5):947–969. doi: 10.1111/tpj.14160.
- 85.Xiao N, et al. Identification of genes related to cold tolerance and a functional allele that confers cold tolerance. Plant Physiol. 2018;177(3):1108–1123. doi: 10.1104/pp.18.00209.
- 86.Oladzad A, et al. Single and multi-trait GWAS identify genetic factors associated with production traits in common bean under abiotic stress environments. G3 (Bethesda). 2019;9(6):1881–1892. doi: 10.1534/g3.119.400072.
- 87.Buckler ES, et al. The genetic architecture of maize flowering time. Science. 2009;325(5941):714–718. doi: 10.1126/science.1174276.
- 88.Crowell S, et al. Genome-wide association and high-resolution phenotyping link Oryza sativa panicle traits to numerous trait-specific QTL clusters. Nat Commun. 2016;7:10527. doi: 10.1038/ncomms10527.
- 89.Fang C, et al. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol. 2017;18(1):161. doi: 10.1186/s13059-017-1289-9.
- 90.Bresadola L, et al. Admixture mapping in interspecific Populus hybrids identifies classes of genomic architectures for phytochemical, morphological and growth traits. New Phytol. 2019;223(4):2076–2089. doi: 10.1111/nph.15930.
- 91.Bararyenya A, et al. Genome-wide association study identified candidate genes controlling continuous storage root formation and bulking in hexaploid sweetpotato. BMC Plant Biol. 2020;20(1):3. doi: 10.1186/s12870-019-2217-9.
- 92.Fridman E, et al. Zooming in on a quantitative trait for tomato yield using interspecific introgressions. Science. 2004;305(5691):1786–1789. doi: 10.1126/science.1101666.
- 93.Alseekh S, et al. Identification and mode of inheritance of quantitative trait loci for secondary metabolite abundance in tomato. Plant Cell. 2015;27(3):485–512. doi: 10.1105/tpc.114.132266.
- 94.Luo J. Metabolite-based genome-wide association studies in plants. Curr Opin Plant Biol. 2015;24:31–38. doi: 10.1016/j.pbi.2015.01.006.
- 95.Alseekh S, Fernie AR. Metabolomics 20 years on: what have we learned and what hurdles remain? Plant J. 2018;94(6):933–942. doi: 10.1111/tpj.13950.
- 96.Lin H, et al. DWARF27, an Iron-containing protein required for the biosynthesis of strigolactones, regulates rice tiller bud outgrowth. Plant Cell. 2009;21(5):1512–1525. doi: 10.1105/tpc.109.065987.
- 97.Yonekura-Sakakibara K, et al. A flavonoid 3-O-glucoside:2''-O-glucosyltransferase responsible for terminal modification of pollen-specific flavonols in Arabidopsis thaliana. Plant J. 2014;79(5):769–782. doi: 10.1111/tpj.12580.
- 98.Yamamura C, et al. Diterpenoid phytoalexin factor, a bHLH transcription factor, plays a central role in the biosynthesis of diterpenoid phytoalexins in rice. Plant J. 2015;84(6):1100–1113. doi: 10.1111/tpj.13065.
- 99.Sadre R, et al. Metabolite diversity in alkaloid biosynthesis: a multilane (diastereomer) highway for camptothecin synthesis in Camptotheca acuminata. Plant Cell. 2016;28(8):1926–1944. doi: 10.1105/tpc.16.00193.
- 100.Oliver MJ, et al. A sister group contrast using untargeted global metabolomic analysis delineates the biochemical regulation underlying desiccation tolerance in Sporobolus stapfianus. Plant Cell. 2011;23(4):1231–1248. doi: 10.1105/tpc.110.082800.
- 101.Tohge T, et al. Exploiting natural variation in tomato to define pathway structure and metabolic regulation of fruit polyphenolics in the lycopersicum complex. Mol Plant. 2020;13(7):1027–1046. doi: 10.1016/j.molp.2020.04.004.
- 102.Matsuda F, et al. Metabolome-genome-wide association study dissects genetic architecture for generating natural variation in rice secondary metabolism. Plant J. 2015;81(1):13–23. doi: 10.1111/tpj.12681.
- 103.Zhou S, et al. Metabolome-scale genome-wide association studies reveal chemical diversity and genetic control of maize specialized metabolites. Plant Cell. 2019;31(5):937–955. doi: 10.1105/tpc.18.00772.
- 104.Soltis NE, Kliebenstein DJ. Natural variation of plant metabolism: genetic mechanisms, interpretive caveats, and evolutionary and mechanistic insights. Plant Physiol. 2015;169(3):1456–1468. doi: 10.1104/pp.15.01108.
- 105.Peng M, et al. Differentially evolved glucosyltransferases determine natural variation of rice flavone accumulation and UV-tolerance. Nat Commun. 2017 doi: 10.1038/s41467-017-02168-x.
- 106.Li H, et al. Leveraging GWAS data to identify metabolic pathways and networks involved in maize lipid biosynthesis. Plant J. 2019;98(5):853–863. doi: 10.1111/tpj.14282.
- 107.Burgos E, et al. Validated MAGIC and GWAS populations mapping reveal the link between vitamin E contents and natural variation in chorismate metabolism in tomato. Plant J. 2020;105:907–923. doi: 10.1111/tpj.15077.
- 108.Browning BL, Yu ZX. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet. 2009;85(6):847–861. doi: 10.1016/j.ajhg.2009.11.004.
- 109.Finno CJ, et al. Risk of false positive genetic associations in complex traits with underlying population structure: a case study. Vet J. 2014;202(3):543–549. doi: 10.1016/j.tvjl.2014.09.013.
- 110.Huang C, et al. ZmCCT9 enhances maize adaptation to higher latitudes. Proc Natl Acad Sci USA. 2018;115(2):E334–E341. doi: 10.1073/pnas.1718058115.
- 111.Li H, et al. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet. 2013;45(1):43–U72. doi: 10.1038/ng.2484.
- 112.Wang Q, et al. Genetic architecture of natural variation in rice chlorophyll content revealed by a genome-wide association study. Mol Plant. 2015;8(6):946–957. doi: 10.1016/j.molp.2015.02.014.
- 113.Wen W, et al. An integrated multi-layered analysis of the metabolic networks of different tissues uncovers key genetic components of primary metabolism in maize. Plant J. 2018;93(6):1116–1128. doi: 10.1111/tpj.13835. [DOI] [PubMed] [Google Scholar]
- 114.Liu J, et al. The conserved and unique genetic architecture of kernel size and weight in maize and rice. Plant Physiol. 2017;175(2):774–785. doi: 10.1104/pp.17.00708. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Deng M, et al. The genetic architecture of amino acids dissection by association and linkage analysis in maize. Plant Biotechnol J. 2017;15(10):1250–1263. doi: 10.1111/pbi.12712. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Brog YM, et al. A Solanum neorickii introgression population providing a powerful complement to the extensively characterized Solanum pennellii population. Plant J. 2019;97(2):391–403. doi: 10.1111/tpj.14095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Meng XB, et al. Construction of a genome-wide mutant library in rice using CRISPR/Cas9. Mol Plant. 2017;10(9):1238–1241. doi: 10.1016/j.molp.2017.06.006. [DOI] [PubMed] [Google Scholar]
- 118.Lu YM, et al. Genome-wide targeted mutagenesis in rice using the CRISPR/Cas9 system. Mol Plant. 2017;10(9):1242–1245. doi: 10.1016/j.molp.2017.06.007. [DOI] [PubMed] [Google Scholar]
- 119.Liu HJ, et al. High-throughput CRISPR/Cas9 mutagenesis streamlines trait gene identification in maize. Plant Cell. 2020;32(5):1397–1413. doi: 10.1105/tpc.19.00934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120.Chen Q, et al. TeoNAM: a nested association mapping population for domestication and agronomic trait analysis in maize. Genetics. 2019;213(3):1065–1078. doi: 10.1534/genetics.119.302594. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9:29. doi: 10.1186/1746-4811-9-29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Yang Q, et al. CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc Natl Acad Sci USA. 2013;110(42):16969–16974. doi: 10.1073/pnas.1310949110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.Huang, X.H. and B. Han (2014) Natural variations and genome-wide association studies in crop plants. In: Merchant SS (ed) Annual review of plant biology, vol 65, p 531–551 [DOI] [PubMed]
- 124.Xing AQ, et al. A rare SNP mutation in Brachytic2 moderately reduces plant height and increases yield potential in maize. J Exp Bot. 2015;66(13):3791–3802. doi: 10.1093/jxb/erv182.
- 125.Fan CH, et al. GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theor Appl Genet. 2006;112(6):1164–1171. doi: 10.1007/s00122-006-0218-1.
- 126.Xue WY, et al. Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nat Genet. 2008;40(6):761–767. doi: 10.1038/ng.143.
- 127.Lu L, et al. Evolution and association analysis of Ghd7 in rice. PLoS ONE. 2012;7(5):e34021. doi: 10.1371/journal.pone.0034021.
- 128.Mao HL, et al. Linking differential domain functions of the GS3 protein to natural variation of grain size in rice. Proc Natl Acad Sci USA. 2010;107(45):19579–19584. doi: 10.1073/pnas.1014419107.
- 129.Zhang XJ, et al. Rare allele of OsPPKL1 associated with grain length causes extra-large grain and a significant yield increase in rice. Proc Natl Acad Sci USA. 2012;109(52):21534–21539. doi: 10.1073/pnas.1219776110.
- 130.Ishimaru K, et al. Loss of function of the IAA-glucose hydrolase gene TGW6 enhances rice grain weight and increases yield. Nat Genet. 2013;45(6):707. doi: 10.1038/ng.2612.
- 131.Zhu CS, Li XR, Yu JM. Integrating rare-variant testing, function prediction, and gene network in composite resequencing-based genome-wide association studies (CR-GWAS). G3. 2011;1(3):233–243. doi: 10.1534/g3.111.000364.
- 132.Listgarten J, Lippert C, Heckerman D. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nat Genet. 2013;45(5):470–471. doi: 10.1038/ng.2620.
- 133.Kaakinen M, et al. MARV: a tool for genome-wide multi-phenotype analysis of rare variants. BMC Bioinform. 2017;18:110. doi: 10.1186/s12859-017-1530-2.
- 134.Dell'Acqua M, et al. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 2015;16:167. doi: 10.1186/s13059-015-0716-z.
- 135.Navarro JAR, et al. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat Genet. 2017;49(3):476–480. doi: 10.1038/ng.3784.
- 136.Wen YJ, et al. An efficient multi-locus mixed model framework for the detection of small and linked QTLs in F2. Brief Bioinform. 2019;20(5):1913–1924. doi: 10.1093/bib/bby058.
- 137.Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13(2):135–145. doi: 10.1038/nrg3118.
- 138.Xiao Y, et al. Genome-wide association studies in maize: praise and stargaze. Mol Plant. 2017;10(3):359–374. doi: 10.1016/j.molp.2016.12.008.
- 139.Cockram J, Mackay I (2018) Genetic mapping populations for conducting high-resolution trait mapping in plants. In: Varshney RK, Pandey MK, Chitikineni A (eds) Plant genetics and molecular biology, p 109–138
- 140.Atwell S, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465(7298):627–631. doi: 10.1038/nature08800.
- 141.Kerdaffrec E, et al. Multiple alleles at a single locus control seed dormancy in Swedish Arabidopsis. eLife. 2016;5:e22502. doi: 10.7554/eLife.22502.
- 142.Lin ZW, et al. Parallel domestication of the Shattering1 genes in cereals. Nat Genet. 2012;44(6):720–724. doi: 10.1038/ng.2281.
- 143.Lin T, et al. Genomic analyses provide insights into the history of tomato breeding. Nat Genet. 2014;46(11):1220–1226. doi: 10.1038/ng.3117.
- 144.Dickson SP, et al. Rare variants create synthetic genome-wide associations. PLoS Biol. 2010;8(1):e1000294. doi: 10.1371/journal.pbio.1000294.
- 145.Hayes B. Overview of statistical methods for genome-wide association studies (GWAS). Methods Mol Biol. 2013;1019:149–169. doi: 10.1007/978-1-62703-447-0_6.
- 146.Scossa F, Alseekh S, Fernie AR. Integrating multi-omics data for crop improvement. J Plant Physiol. 2021;257:153352. doi: 10.1016/j.jplph.2020.153352.
- 147.Ye J, et al. An InDel in the promoter of Al-ACTIVATED MALATE TRANSPORTER9 selected during tomato domestication determines fruit malate contents and aluminum tolerance. Plant Cell. 2017;29(9):2249–2268. doi: 10.1105/tpc.17.00211.
- 148.Chan EK, et al. The complex genetic architecture of the metabolome. PLoS Genet. 2010;6(11):e1001198. doi: 10.1371/journal.pgen.1001198.
- 149.Clauw P, et al. Leaf growth response to mild drought: natural variation in Arabidopsis sheds light on trait architecture. Plant Cell. 2016;28(10):2417–2434. doi: 10.1105/tpc.16.00483.
- 150.Li Q, et al. Genome-wide association studies identified three independent polymorphisms associated with α-tocopherol content in maize kernels. PLoS ONE. 2012;7(5):e36807. doi: 10.1371/journal.pone.0036807.
- 151.Wen W, et al. Combining quantitative genetics approaches with regulatory network analysis to dissect the complex metabolism of the maize kernel. Plant Physiol. 2016;170(1):136–146. doi: 10.1104/pp.15.01444.
- 152.Chen W, et al. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nat Genet. 2014;46(7):714–721. doi: 10.1038/ng.3007.
- 153.Zhang W, et al. Dissection of the domestication-shaped genetic architecture of lettuce primary metabolism. Plant J. 2020;104(3):613–630. doi: 10.1111/tpj.14950.
- 154.Wu J, et al. Resequencing of 683 common bean genotypes identifies yield component trait associations across a north-south cline. Nat Genet. 2020;52(1):118–125. doi: 10.1038/s41588-019-0546-0.
- 155.Zhao X, et al. Loci and candidate gene identification for resistance to Sclerotinia sclerotiorum in soybean (Glycine max L. Merr.) via association and linkage maps. Plant J. 2015;82(2):245–255. doi: 10.1111/tpj.12810.
- 156.Li W, et al. A natural allele of a transcription factor in rice confers broad-spectrum blast resistance. Cell. 2017;170(1):114–126.e15. doi: 10.1016/j.cell.2017.06.008.
- 157.Kuroha T, et al. Ethylene-gibberellin signaling underlies adaptation of rice to periodic flooding. Science. 2018;361(6398):181–186. doi: 10.1126/science.aat1577.
Data Availability Statement
Data associated with this paper are available in the manuscript.

