Abstract
Background
Hexaploid oat (Avena sativa L.) is a commercially important cereal crop due to its soluble dietary fiber β-glucan, a hemicellulose known to prevent cardiovascular diseases. To maximize health benefits associated with the consumption of oat-based food products, breeding efforts have aimed at increasing the β-glucan content in oat groats. However, progress has been limited. To accelerate oat breeding efforts, we leveraged existing breeding datasets (1,230 breeding lines from the South Dakota State University oat breeding program grown in multiple environments between 2015 and 2022) to conduct a genome-wide association study (GWAS) to increase our understanding of the genetic control of β-glucan content in oat and to compare strategies to implement genomic selection (GS) to increase genetic gain for β-glucan content in oat.
Results
Large variation for β-glucan content was observed with values ranging between 3.02 and 7.24%. An independent GWAS was performed for each breeding panel in each environment and identified 22 loci distributed over fourteen oat chromosomes significantly associated with β-glucan content. Comparison based on physical position showed that 12 out of 22 loci coincided with previously identified β-glucan QTLs, and three loci are in the vicinity of cellulose synthesis genes, Cellulose synthase-like (Csl). To perform a GWAS analysis across all breeding datasets, the β-glucan content of each breeding line was predicted for each of the 26 environments. The overall GWAS identified 73 loci, of which 15 coincided with loci identified for individual environments and 37 coincided with previously reported β-glucan QTLs not identified when performing the GWAS in single years. In addition, 21 novel loci were identified that were not reported in previous studies. The proposed approach increased our ability to detect significantly associated markers. The comparison of multiple GS scenarios indicated that using a specific set of markers as a fixed effect in GS models did not increase the prediction accuracy. However, the use of multi-environment data in the training population resulted in an increase in prediction accuracy (0.61–0.72) as compared to single-year (0.28–0.48) data. The use of USDA-SoyWheOatBar-3K genotyping array data resulted in a similar level of prediction accuracy as did genotyping-by-sequencing data.
Conclusion
This study identified and confirmed the location of multiple loci associated with β-glucan content. The proposed genomic strategies significantly increase both our ability to detect significant markers in GWAS and the accuracy of genomic predictions. The findings of this study can be useful to accelerate the genetic improvement of β-glucan content and other traits.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-024-11174-5.
Keywords: Avena sativa L., β-glucan content, GWAS, Quantitative trait loci (QTL), Cellulose synthase-like (Csl), Genomic selection (GS)
Background
Hexaploid oat (Avena sativa L.; 2n = 6x = 42, AACCDD) is an important cereal crop for both food and animal feed, with an annual worldwide grain production of 22.6 million tons (FAOSTAT 2021) [1]. In the last twenty years, oat has gained popularity among consumers and culinary experts due to its numerous health benefits and unique nutritional profile [2]. Among the nutritious components of oat is a unique water-soluble fiber, beta-glucan (β-glucan), which is present in the endosperm and aleurone layer of oat groats and constitutes up to 70% by weight of the endosperm cell walls [2, 3]. Barley (Hordeum vulgare L.) and oat are considered to have higher β-glucan content than wheat (Triticum aestivum L.) and rye (Secale cereale L.). The total β-glucan content by kernel weight ranges from 2.5 to 11.3% for barley, from 2.2 to 7.8% for oat, from 1.2 to 2.9% for rye, and from 0.2 to 1.4% for wheat [4–6]. Although some barley varieties have been reported with higher β-glucan content than oat [7], barley use for human consumption is limited; oat-based food products are more common and constitute the main source of β-glucan intake.
Regular consumption of oat β-glucan can lower blood cholesterol, reduce the risk of coronary heart disease [8], improve the immune response [9], reduce the risk of obesity, and prevent cancer [10, 11]. This has led various agencies around the globe to approve health claims for β-glucan. Among them, in 1997, the Food and Drug Administration (FDA) approved the claim that a daily amount of 3 g of oat β-glucan can help to reduce blood cholesterol and promote cardiovascular health [12]. Increasing the β-glucan content in oat could help consumers to achieve these dietary goals [13].
The effect of genotype, environmental conditions, and agronomic practices on β-glucan content has been reported in various studies [14–23]. Genotype plays a significant role in groat β-glucan content [16, 22, 24]. Some studies report minor effects of genotype × environment (GE) interactions [25], but others have indicated that GE interactions strongly influence the β-glucan content [26]. The broad-sense heritability estimates for β-glucan content are moderate to high, with values ranging from 0.27 to 0.45 [26], from 0.44 to 0.63 [27], from 0.52 to 0.73 [28], from 0.43 to 0.63 [29], and from 0.80 to 0.85 [30] depending upon the genetic architecture of populations under study and the environments in which they were evaluated. Environmental conditions and management factors such as temperature, precipitation, and fertilization also affect β-glucan content in oat. Previous studies have found that β-glucan content was higher in warm, dry climates and lower in cool, wet environments [17, 31–33]. Furthermore, an increase in nitrogen fertilization resulted in higher levels of β-glucan [14, 17, 34].
Breeding efforts to increase β-glucan content in oat have been facilitated by the development of NIR calibrations that can quickly and accurately measure β-glucan content on whole groats [35]. This considerably increased throughput as compared to traditional methods that required measurement of ground groat flour to estimate β-glucan content. Efforts would further be facilitated by the integration of genomic strategies for germplasm improvement in this crop. QTLs for β-glucan have been identified in several studies using bi-parental populations [27, 36, 37]. GWAS has also been carried out to further elucidate the trait architecture [29, 38–41]. An association analysis using a worldwide diversity panel revealed three DArT markers significantly associated with β-glucan content, and the sequence of one of these markers showed homology with the rice (Oryza sativa L.) CslF (cellulose synthase-like) gene family [38]. The detection of a low number of associations in this study might be due to insufficient marker density [38]. A GWAS analysis for β-glucan content was also conducted for 446 North American elite oat breeding lines with phenotypic data collected from historical and balanced two-year trials (Ames, IA) [29]. A total of 24 and 37 significant marker associations for β-glucan content were detected for the historical and balanced data sets, respectively [29], with most of the markers identified corresponding to genomic regions previously found in QTL analysis in bi-parental populations [26, 36]. The use of historical unbalanced data, however, led to higher genotype × environment interaction and resulted in multiple loci with low R2 values [29].
More recently, a GWAS was performed using three panels of elite accessions (Spring, Winter, and World Diversity) that were genotyped using the Oat 6K Custom Infinium iSelect BeadChip [42], and a meta-analysis identified 58 markers significantly associated with β-glucan content [39]. In contrast to the above studies that used North American oat germplasm, a GWAS for β-glucan content conducted using a Federal University of Rio Grande do Sul (UFRGS) oat panel consisting of 413 genotypes evaluated under subtropical conditions led to the identification of seven quantitative trait loci (QTL) associated with β-glucan content, with three of those genomic regions being conserved between oat and barley [40].
Collaborative work between PepsiCo and Corteva Agriscience released the full genome sequence of hexaploid oat line OT3098, which can be accessed at https://wheat.pw.usda.gov/jb/?data=/ggds/oat-ot3098v2-pepsico [43]. The oat reference genome was used to map QTL affecting agronomic and grain quality traits such as β-glucan content using five different bi-parental recombinant inbred line populations, and three QTL linked with β-glucan content were identified on chromosomes 6A (2 QTL) and 7D (1 QTL) [41]. These QTL were co-localized with the QTL found in populations ‘Kanota’ x ‘Ogle’ and Kanota x ‘Marion’ as well as in the CORE and UFRGS diversity panels and with three β-glucan candidate genes (CslF11_chr6A_298 Mb, CslF9_chr6A_416 Mb, and CslF6_chr7A_399 Mb) [41]. The authors also found a strong association among QTL and candidate genes affecting heading date and quality traits including β-glucan content [41]. Recently, an inventory of genes and QTL identified for various traits [44], including β-glucan content, was created by assigning genes/QTL to the recent OT3098 v2 hexaploid oat physical map found on the GrainGenes database [43]. This inventory facilitates the comparison of identified QTL/genes among studies.
Previous GWAS studies have reported many genomic regions associated with β-glucan content, suggesting that marker-assisted selection may not be the most efficient approach to improve β-glucan content. On the other hand, genomic selection (GS) has shown promise for improving β-glucan content in oat [45]. GS considers the effect of both minor and major genetic factors [46, 47]. GS consists of using a statistical model to predict the performance of a specific set of lines (testing) built upon a training population for which both phenotypic and genotypic data are available [48]. Multiple strategies have been proposed to increase prediction accuracy, including using molecular markers as a fixed effect and using multi-environment data. When QTL/markers that influenced phenotypic value by 10% or more were considered as a fixed effect in GS models, prediction accuracy increased by 7.70–33.4% for flowering time, 7.0–20.5% for plant height, and 12.8–62.8% for yield in rice and by 10% for heading date, 8–14% for resistance to powdery mildew, and 9–17% for plant height in wheat depending upon the training size and environment [49, 50]. In addition, integrating multi-environment information into the GS model has been suggested to overcome the genotype × environment interaction and increase prediction accuracy in maize (Zea mays L.), wheat, and potato (Solanum tuberosum L.) [51]. These strategies can be explored in oat to improve the prediction accuracy for β-glucan content.
While previous studies have been critical in increasing our understanding of the genetic control of β-glucan content in oat, further work is needed to fully integrate those findings into breeding schemes. As part of breeding programs’ activities, valuable data is collected that can further confirm previous findings and lead to enhanced breeding strategies. In comparison to the use of diverse panels for conducting GWAS, advanced breeding material offers the advantage of directly applying findings to ongoing breeding programs. In the present study, existing breeding data were leveraged to improve our understanding of the genetic control of β-glucan content and to evaluate strategies to increase β-glucan content in oat. The utilization of an existing data set encompassing multiple years of breeding activities in multiple environments is expected to improve the power, precision, and accuracy of GWAS and enable the detection of variants with smaller effect. The objectives of the study were to (1) map the genomic regions associated with β-glucan content among experimental breeding lines of a North American oat breeding program, (2) compare the significant genomic regions associated with β-glucan content in the present study with those previously reported in the literature, (3) provide new molecular markers to be used in oat breeding programs to improve β-glucan content in oat, and (4) evaluate GS strategies to predict β-glucan content.
Materials and methods
Plant material and field experiments
The GWAS panel consisted of six sets of advanced breeding lines (~ 148–233 lines) evaluated in the preliminary yield trials (PYT) of the South Dakota State University oat breeding program from 2015 to 2022. In total, 1,230 oat breeding lines were considered in this study. Each year, experimental material was planted at three to five locations in South Dakota. The testing sites were located in Brookings (BRK) (44° 18’ N, 96° 47’ W), South Shore (SSH) (45° 6’ N, 96° 55’ W), Volga (VLG) (44° 19’ N, 96° 55’ W), Beresford (BRF) (43° 4’ N, 96° 46’ W), Pierre (PIE) (44° 22’ N, 100° 21’ W), and Winner (WIN) (43° 22’ N, 99° 51’ W) (Table 1). Phenotypes were collected in 26 different environments. Each combination of location and year was considered an individual environment; the environment abbreviations are listed in Table 1. In each environment, the experiment followed an augmented design with at least four checks replicated nine times. The plot size was 1.52 × 1.83 m2 at SSH and VLG and 1.52 × 3.65 m2 at BRF, BRK, PIE, and WIN. The experiments were managed following regional standard cultural practices and recommendations.
Table 1.
List of years, locations within each year, set of lines, and SNPs used for GWAS analysis for β-glucan content
| Year | Location | Environment^a | Number of lines | Number of markers |
|---|---|---|---|---|
| 2015 | BRF, SSH, VLG, WIN | BRF_15, SSH_15, VLG_15, WIN_15 | 221 | 8,299 |
| 2016 | SSH, VLG, WIN | SSH_16, VLG_16, WIN_16 | 148 | 11,714 |
| 2017 | BRF, SSH, VLG, WIN | BRF_17, SSH_17, VLG_17, WIN_17 | 225 | 11,356 |
| 2020 | BRK, BRF, SSH, VLG, WIN | BRK_20, BRF_20, SSH_20, VLG_20, WIN_20 | 224 | 9,301 |
| 2021 | BRK, BRF, SSH, VLG, PIE | BRK_21, BRF_21, SSH_21, VLG_21, PIE_21 | 230 | 8,625 |
| 2022 | BRK, BRF, SSH, VLG, PIE | BRK_22, BRF_22, SSH_22, VLG_22, PIE_22 | 233 | 10,902 |
| Across all years | - | ACR^b | 1,230 | 9,341 |
^a Coded as location and year combination, where BRK, Brookings; BRF, Beresford; SSH, South Shore; VLG, Volga; PIE, Pierre; WIN, Winner and 15, 16, 17, 20, 21, and 22 for the years 2015, 2016, 2017, 2020, 2021, and 2022, respectively
^b Lines from all the years together
Phenotypic data and statistical analysis
The GWAS panel was phenotyped for β-glucan content (as % dry weight) using near-infrared (NIR) spectroscopy on whole oat groats with a calibration developed at SDSU [35]. Prior to 2017, β-glucan content was measured on oat flour instead of whole oat groats. The phenotypic data for the GWAS panel in multi-location, multi-year trials were curated before analysis. Outliers in the data were identified based on the interquartile range method. In this method, data points with values less than Q1 – (1.5 * IQR) or more than Q3 + (1.5 * IQR) were removed from further analysis, where Q1 was the first quartile, Q3 was the third quartile, and IQR was the interquartile range. Phenotypic data from the years 2018 and 2019 were excluded from the analysis due to the poor quality of genotyping data. A linear mixed model was applied to calculate the best linear unbiased estimator (BLUE) value for each line within each environment and across all the environments for a specific year using the lme4 R package [52]. These values were used in subsequent analyses and were obtained as per the equations below:
$$Y_{ijk} = \mu + \mathrm{Gen}_i + \mathrm{Row}_j + \mathrm{Col}_k + e_{ijk} \tag{1}$$

$$Y_{il} = \mu + \mathrm{Gen}_i + \mathrm{Loc}_l + (\mathrm{Gen} \times \mathrm{Loc})_{il} + e_{il} \tag{2}$$

where $Y_{ijk}$ (or $Y_{il}$) is the groat β-glucan content (expressed in %) of an individual line, $\mu$ is the mean effect, $\mathrm{Gen}_i$ represents the effect of genotypes, $\mathrm{Row}_j$ is the effect of the row, $\mathrm{Col}_k$ is the effect of the column, $\mathrm{Loc}_l$ is the effect of location, $(\mathrm{Gen} \times \mathrm{Loc})_{il}$ is the interaction effect of genotype and location, and $e_{ijk}$ or $e_{il}$ is the error effect. All the effects were considered as random effects except the genotype effect in BLUE value calculations. The broad-sense heritability (H2) for β-glucan in each environment and across environments was calculated from the variance estimates from the linear mixed model. Descriptive statistics and Pearson correlations among locations within each year for β-glucan content were determined in R [53].
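The interquartile range rule used for outlier removal can be sketched as follows. This is a minimal illustration, not the authors' script: the readings are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state how Q1 and Q3 were computed.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: "inclusive" quartiles; other conventions shift the bounds slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical beta-glucan readings (%) from one environment.
readings = [4.6, 4.8, 5.0, 5.1, 5.2, 5.3, 5.5, 5.6, 9.8]
cleaned = remove_iqr_outliers(readings)
print(cleaned)
```

In practice the filter would be applied separately within each environment before fitting the mixed models, so that a legitimately high-β-glucan environment is not flagged against a lower one.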
Genotyping
The entire GWAS panel was genotyped using the Genotyping-by-Sequencing (GBS) method at the USDA-ARS (U.S. Department of Agriculture-Agricultural Research Service) Genotyping Lab, Fargo, ND. The PYT sets from 2015 to 2016 were genotyped as part of the Public Oat Genotyping Initiative led by Dr. Jean-Luc Jannink. A subset of this panel was also genotyped with the USDA-SoyWheOatBar-3K Illumina iSelect array (Jason Fiedler, personal communication). For DNA extraction, seed of each line was grown in small pots filled with a standard potting mix in a greenhouse. Young leaf tissue was collected into a 96-well plate containing silica sand to keep the samples dry. Genomic DNA was extracted from ground dried tissue by first lysing with 10% sodium dodecyl sulfate at 65°C, precipitating proteins with 6 M ammonium acetate, precipitating DNA with cold 70% isopropanol, and washing with 70% ethanol. The GBS protocol was previously described [54]. Libraries were prepared using the two restriction enzymes PstI and MspI and ligated to unique barcode adapters. Paired-end sequencing (150 bp reads) of the GBS libraries was performed on an Illumina HiSeq 2000 system. All the Illumina paired-end reads were cleaned with bbduk [55] using the following parameters: ktrim = r, k = 23, mink = 11, and hdist = 1. Following the cleanup, sequencing reads were aligned to the PepsiCo oat reference genome OT3098 v2 [43] using Bowtie2 [56]. Sample read alignment files were then sorted and indexed using SAMtools [57]. Raw single nucleotide variants and indels were called using SAMtools and bcftools v1.14 [58] and were filtered with a minimum mapping quality score of 30 and a minimum read depth of 4.
Initially, genotypic data consisted of 38,070 SNP markers. Genotypic data for the set of lines present in each year was extracted and curated individually. For quality control, SNP markers with a minor allele frequency (MAF) of less than 5%, with more than 5% heterozygosity, or not assigned to any oat chromosome were filtered out. Missing SNP data points were imputed using the LD-kNNi method in TASSEL 5.0 software [59]. Genotypes that had more than 60% of marker data missing and marker heterozygosity of 0.15 were also excluded. The filtered set of 8,299 SNPs and 221 lines in 2015, 11,714 SNPs and 148 lines in 2016, 11,356 SNPs and 225 lines in 2017, 9,301 SNPs and 224 lines in 2020, 8,625 SNPs and 230 lines in 2021, 10,902 SNPs and 233 lines in 2022, and 9,341 SNPs and 1,230 lines for all years combined were used for association analysis (Table 1). The Illumina assay for the USDA-SoyWheOatBar-3K array was run following the manufacturer’s protocol. After filtering markers with the same metrics as for GBS data, 2,290 SNP markers were available for further use in this study.
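The marker-level quality-control rules described above (MAF, heterozygosity, chromosome assignment) can be sketched as below. The genotype coding (0 and 2 for the two homozygotes, 1 for heterozygotes, `None` for missing), the marker names, and the toy data are assumptions for illustration only; the actual filtering in the study was done with dedicated genotyping software.

```python
def marker_stats(calls):
    """Return (minor allele frequency, heterozygosity) for one SNP.

    calls: per-line genotype codes, 0/2 = homozygous, 1 = heterozygous,
    None = missing (hypothetical coding).
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0, 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for c in observed if c == 1) / len(observed)
    return maf, het


def passes_qc(calls, chromosome, min_maf=0.05, max_het=0.05):
    """Apply the three filters: chromosome assigned, MAF >= 5%, heterozygosity <= 5%."""
    if chromosome is None:
        return False
    maf, het = marker_stats(calls)
    return maf >= min_maf and het <= max_het


# Hypothetical markers: name -> (chromosome, genotype calls for 20 or 40 lines).
markers = {
    "snp_a": ("1A", [0] * 12 + [2] * 8),            # common allele, no hets: kept
    "snp_b": ("2C", [0] * 18 + [2] * 2),            # MAF = 0.10: kept
    "snp_c": ("3C", [0] * 10 + [1] * 4 + [2] * 6),  # 20% heterozygous: dropped
    "snp_d": (None, [0] * 10 + [2] * 10),           # unassigned: dropped
    "snp_e": ("7D", [0] * 39 + [2]),                # MAF = 0.025: dropped
}
kept = [name for name, (chrom, calls) in markers.items() if passes_qc(calls, chrom)]
print(kept)
```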
Population structure and linkage disequilibrium
To determine the underlying population structure in the GWAS panel, principal component analysis (PCA) was used, with the first five principal components (PCs) being considered to account for sub-structure. Linkage disequilibrium (LD) analysis was performed for the whole genome as well as for the sub-genomes of oat by calculating squared correlation coefficients (r2) for the pairwise marker comparisons using TASSEL v5.0 software [59]. To visualize the extent of LD, r2 was plotted against the physical distance (base pairs), and a LOESS curve was fitted. This nonlinear regression curve was used to estimate the LD decay distance, defined as the physical distance at which the average r2 dropped to half its maximum value [60].
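The two quantities in this paragraph, pairwise r2 and the distance at which r2 falls to half its maximum, can be illustrated with a short sketch. Note the substitution: the study fitted a LOESS curve, whereas this example uses simple distance-bin means as a stand-in. The genotype vectors, distances, and bin size are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two marker genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def half_decay_distance(pairs, bin_size):
    """Lower edge of the first distance bin whose mean r2 is at or below half the
    largest bin mean. pairs: list of (distance, r2). Bin means replace LOESS here."""
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_size), []).append(r2)
    means = {b: sum(v) / len(v) for b, v in sorted(bins.items())}
    threshold = max(means.values()) / 2
    for b, mean_r2 in means.items():
        if mean_r2 <= threshold:
            return b * bin_size
    return None  # r2 never dropped to half its maximum within the data


# Hypothetical (distance in Mb, r2) pairs.
pairs = [(1, 0.8), (2, 0.6), (12, 0.5), (25, 0.3), (35, 0.1)]
decay_mb = half_decay_distance(pairs, bin_size=10)
print(decay_mb)
```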
BLUE values calculation for across years GWAS analysis
Most breeding lines were evaluated in only one year. The number of lines common from year to year ranged from 2 to 15. This can lead to bias in calculating BLUE values across all years. Therefore, the β-glucan content values for all lines present across years (i.e., 1,230 lines) were predicted as genomic estimated breeding values (GEBVs) using a univariate RRBLUP GS model (Eq. 3). The RRBLUP model was implemented using the mixed.solve function of the rrBLUP R package, which facilitates the partitioning of random and fixed effects [61], and is represented as:
$$y = \mu + Zu + e \tag{3}$$

where $y$ is the vector of BLUEs for the phenotypic trait, $\mu$ is the overall mean, $Z$ is the N × M matrix of markers, $u$ is a vector of normally distributed random marker effects with constant variance, $u \sim N(0, I\sigma_u^2)$, and $e$ is the residual error distributed as $e \sim N(0, I\sigma_e^2)$. Data collected in a specific environment was used as a training set to predict the GEBV value of all other lines within that environment. This was done separately for all 26 environments. In this strategy, each PYT set was predicted in all 26 environments, which resulted in a balanced data set (GEBVs) for all 1,230 lines. This balanced dataset was used to calculate BLUEs across all the years using a linear mixed model:
$$Y_{ij} = \mu + \mathrm{Gen}_i + \mathrm{Env}_j + (\mathrm{Gen} \times \mathrm{Env})_{ij} + e_{ij} \tag{4}$$

where $Y_{ij}$ is the predicted groat β-glucan content (expressed in %) of line $i$ in environment $j$, $\mu$ is the mean effect, $\mathrm{Gen}_i$ represents the effect of genotypes, $\mathrm{Env}_j$ is the effect of environment, $(\mathrm{Gen} \times \mathrm{Env})_{ij}$ is the interaction effect of genotype and environment, and $e_{ij}$ is the error effect. All the effects were considered as random effects except the genotype effect.
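For reference, the ridge-regression form of the RRBLUP model (Eq. 3) admits a standard closed-form solution, shown here as a worked illustration rather than as the exact computation performed by the software. Writing $y$ for the vector of BLUEs, $Z$ for the marker matrix, $u$ for the marker effects, $\hat{\mu}$ for the estimated mean, and $\lambda$ for the ratio of residual to marker-effect variance:

$$
\hat{u} = \left(Z^{\top}Z + \lambda I\right)^{-1} Z^{\top}\left(y - \mathbf{1}\hat{\mu}\right),
\qquad
\lambda = \frac{\sigma_e^{2}}{\sigma_u^{2}},
\qquad
\widehat{\mathrm{GEBV}} = \mathbf{1}\hat{\mu} + Z_{\mathrm{new}}\,\hat{u}
$$

Here $Z_{\mathrm{new}}$ (a label introduced for this illustration) holds the marker scores of the lines to be predicted. Because every marker effect is shrunk toward zero by the same $\lambda$, lines that were never phenotyped in an environment can still receive a predicted value from their genotypes alone, which is what makes the balanced 1,230-line by 26-environment data set possible.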
Association analysis
GWAS was conducted for each environment, across environments for a specific year, and across years using the respective SNP data and β-glucan content BLUE values. The BLUE values for each environment and across environments for a specific year were calculated using Eqs. 1 and 2, respectively, whereas across-years BLUE values were calculated with Eq. 4 from the GEBVs predicted using Eq. 3. GWAS was performed using a generalized linear model (GLM) and mixed linear model (MLM) in TASSEL v5.0, and with the Fixed and random model Circulating Probability Unification (FarmCPU) and Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) methods in the R environment with the Genomic Association and Prediction Integrated Tool (GAPIT) [62]. Population structure, either represented by Q-values or PCs, and the kinship coefficient matrix (K) were included as covariates in the GWAS analysis. The kinship coefficient matrix (K) for the GWAS panel was calculated using the Identity-by-Descent (IBD) method in TASSEL. The FarmCPU and BLINK models reduce false positive associations without compromising true positive associations [63, 64]. The population structure and kinship matrix calculated using TASSEL software were included as covariates in the MLM, FarmCPU, and BLINK models. All models (GLM, MLM, FarmCPU, and BLINK) were compared based on quantile-quantile (Q-Q) plots to select the best-fitted model for GWAS. Based on the fit of the model and the power to detect significant associations, the BLINK model performed best and was selected for the association analysis reported here. The Bonferroni-corrected threshold was used to declare a significant association. The estimated allelic effects for each significant marker were obtained directly from the BLINK model. Based on the LD decay value, significant SNPs within 28 Mb were clustered into one locus. Significant SNPs detected in at least two environments were considered a stable locus.
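The significance threshold and the locus-clustering rule described above can be sketched as follows. Two points are assumptions: the family-wise α of 0.05 (the text only says "Bonferroni-corrected"), and the chaining rule, in which a SNP joins a locus when it lies within 28 Mb of the previous SNP in that locus. The positions are taken from the first rows of Table 3.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """-log10 of the Bonferroni-corrected p-value cutoff (alpha is an assumption)."""
    return -math.log10(alpha / n_markers)


def cluster_into_loci(snps, window_bp=28_000_000):
    """Group significant SNPs into loci.

    snps: iterable of (chromosome, position_bp). Duplicate hits from several
    environments collapse to one SNP. A SNP is chained to the current locus if it
    is on the same chromosome and within window_bp of that locus's last SNP.
    """
    loci = []
    for chrom, pos in sorted(set(snps)):
        current = loci[-1] if loci else None
        if current and current[-1][0] == chrom and pos - current[-1][1] <= window_bp:
            current.append((chrom, pos))
        else:
            loci.append([(chrom, pos)])
    return loci


# Significant SNPs from Table 3 (chromosomes 1A, 1D, and 2D only).
significant = [
    ("1A", 460912744), ("1A", 530450138), ("1A", 530450138),
    ("1D", 372963087), ("1D", 394700343),
    ("2D", 105868877), ("2D", 143950575),
]
loci = cluster_into_loci(significant)
threshold = bonferroni_threshold(9341)  # marker count of the across-years panel
print(len(loci), round(threshold, 2))
```

With these inputs the two 1D SNPs, about 21.7 Mb apart, merge into a single locus, while the 1A and 2D pairs, about 69.5 Mb and 38.1 Mb apart, stay separate, which matches the locus numbering in Table 3.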
Co-localization with previous studies and candidate gene identification
Significant SNPs for β-glucan content identified in the present study were compared with the β-glucan QTL reported in previous biparental and GWAS studies, and their position was confirmed based on the updated version of the Oat Sang v1 and PepsiCo OT3098 v2 hexaploid reference genomes [43, 44, 65]. Genomic regions underlying significant β-glucan SNPs were searched for candidate genes with functions directly related to β-glucan synthesis (i.e., CslF genes) using the GrainGenes oat genome browser [43]. The genes were searched in the vicinity of each significant SNP within a flanking window of ± 10 Mb, estimated based on the LD-decay value.
Genomic selection
The prediction accuracy (PA) of different genomic prediction scenarios for β-glucan content was compared. The validation strategy consisted of forward predictions in which models were built using previous year(s) PYT data to predict the most recent PYT (2022). Four GS scenarios were evaluated: (i) GS model with no markers as a fixed effect using PYT data from 2015 to 2021 as the training set (TS) to predict PYT 2022 (GSM1); (ii) GS model with a specific set of markers as a fixed effect (selected based on GWAS results) using PYT data from 2015 to 2021 as TS to predict PYT 2022 (GSM2); (iii) GS model with no markers as a fixed effect using data from PYT 2021 as TS to predict PYT 2022 (GSM3); (iv) GS model with a specific set of markers as a fixed effect using data from PYT 2021 as TS to predict PYT 2022 (GSM4). To compare the effect of genotyping platforms on prediction accuracy, the four GS scenarios were evaluated using two types of genotyping data: GBS (9,341 SNPs) and 3K (2,290 SNPs). Based on GWAS results, the four markers used as fixed effects in the above scenarios were: (i) Schr1D_372963087 on 1D, Schr2C_67182758 on 2C, Schr_2D_142397650 on 2D, and Schr_7D_329527800 on 7D from GBS data, and (ii) GMI_ES02_c16996_412 on 1D, PepOat_178 on 2C, PepOat_245 on 2D, and GMI_ES22_c18772_417 on 7D from 3K array data. If a marker at the same position was not available within the 3K marker set, a 3K marker in close physical proximity was identified to reproduce and compare the results. The above GS scenarios were run using the kin.blup function of the rrBLUP package in R [61]. This linear mixed model partitioned the effects into random and fixed as per Eq. 5:
$$y = X\beta + Zg + e \tag{5}$$

where $y$ is the vector of phenotypic observations (predicted GEBV values from Eq. 3 for GS scenarios (i) and (ii) and BLUE values from Eq. 2 for GS scenarios (iii) and (iv)), $X$ is the full-rank design matrix for the fixed effects ($\beta$), $Z$ is the design matrix for the random effects ($g$) with $g \sim N(0, K\sigma_g^2)$, where $K$ is a genomic relationship matrix calculated using all the SNP markers, and the residuals ($e$) follow $e \sim N(0, I\sigma_e^2)$ with $I$ being the identity matrix [61].
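The forward-validation layout can be sketched as below. Two things are assumed rather than taken from the text: prediction accuracy is computed as the Pearson correlation between predicted and observed values (a common convention that this section does not state explicitly), and the record structure, line names, and numeric values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def forward_split(records, target_year, first_year=None):
    """Train on years before target_year (optionally from first_year on), test on target_year."""
    train = [r for r in records
             if r["year"] < target_year and (first_year is None or r["year"] >= first_year)]
    test = [r for r in records if r["year"] == target_year]
    return train, test


# Hypothetical records: one entry per breeding line with its PYT year.
records = [{"line": f"line_{i}", "year": year}
           for i, year in enumerate([2015, 2016, 2017, 2020, 2021, 2021, 2022, 2022])]

multi_train, test_2022 = forward_split(records, 2022)                    # GSM1/GSM2 layout
single_train, _ = forward_split(records, 2022, first_year=2021)          # GSM3/GSM4 layout

# Hypothetical observed and predicted beta-glucan values (%) for the 2022 lines.
observed = [4.5, 4.9, 5.2, 5.6]
predicted = [4.6, 4.8, 5.3, 5.4]
accuracy = pearson(predicted, observed)
print(len(multi_train), len(single_train), len(test_2022))
```

Keeping the split strictly chronological means no 2022 information leaks into the training set, which is the situation a breeder faces when selecting among newly genotyped lines.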
Results
Phenotypic analysis
Large phenotypic variation was observed in the overall dataset for β-glucan content, from a minimum value of 3.02% (VLG_21) to a maximum value of 7.24% (BRF_17), with the environmental means ranging from 4.21% (BRF_20) to 5.82% (BRF_15) (Table 2). Overall, there is sufficient phenotypic diversity in the GWAS panel to explore the genetic basis of β-glucan content (Fig. 1). Broad-sense heritability (H2) estimates for β-glucan content ranged from 0.30 (SSH_16) to 0.91 (BRF_22). We observed lower heritability in the years 2015, 2016, and 2017 as compared to 2020, 2021, and 2022. Many factors can affect H2 estimates, including genetic variance and error variance. The variation in heritability could be due to the use of more diverse materials as parents of the PYT after 2016. The higher heritability values in later years could also suggest that the new NIRS calibration may be more accurate at estimating β-glucan content. The overall moderate to high heritability of β-glucan content indicates the robustness of the data and the strong genetic control for β-glucan content in our breeding populations. There was a significant (p < 0.001) positive moderate to high correlation among environments in each year for β-glucan content, ranging from 0.27 to 0.85 (Supplementary Table 1). Low correlation among environments was observed in the years prior to 2017.
Table 2.
Descriptive statistics and broad sense heritability (H2) of β-glucan content in GWAS panel evaluated at different locations in multiple years
| Year | Location | Environment^a | µ (%) | Min – Max (%) | Std | H2 |
|---|---|---|---|---|---|---|
| 2015 | SSH | SSH_15 | 5.66 | 4.63–6.69 | 0.48 | 0.46 |
| | BRF | BRF_15 | 5.82 | 4.70–6.97 | 0.38 | 0.40 |
| | VLG | VLG_15 | 5.69 | 4.60–6.73 | 0.40 | 0.59 |
| | WIN | WIN_15 | 5.31 | 4.23–6.39 | 0.40 | 0.49 |
| | Across | ACR_15 | 5.62 | 4.77–6.31 | 0.30 | 0.29 |
| 2016 | SSH | SSH_16 | 5.54 | 3.84–6.72 | 0.53 | 0.30 |
| | VLG | VLG_16 | 5.31 | 3.91–6.52 | 0.48 | 0.72 |
| | WIN | WIN_16 | 5.37 | 4.21–6.84 | 0.50 | 0.50 |
| | Across | ACR_16 | 5.41 | 4.09–6.39 | 0.38 | 0.28 |
| 2017 | SSH | SSH_17 | 5.01 | 3.65–6.48 | 0.57 | 0.54 |
| | BRF | BRF_17 | 5.60 | 3.82–7.24 | 0.66 | 0.70 |
| | VLG | VLG_17 | 5.04 | 3.42–6.68 | 0.61 | 0.69 |
| | WIN | WIN_17 | 5.00 | 3.44–6.89 | 0.61 | 0.54 |
| | Across | ACR_17 | 5.16 | 3.86–6.63 | 0.50 | 0.47 |
| 2020 | BRK | BRK_20 | 4.42 | 3.45–5.34 | 0.41 | 0.88 |
| | SSH | SSH_20 | 4.56 | 3.41–5.70 | 0.42 | 0.79 |
| | BRF | BRF_20 | 4.21 | 3.21–5.69 | 0.45 | 0.85 |
| | VLG | VLG_20 | 4.53 | 3.27–5.57 | 0.44 | 0.84 |
| | WIN | WIN_20 | 4.48 | 3.27–5.84 | 0.45 | 0.85 |
| | Across | ACR_20 | 4.44 | 3.50–5.50 | 0.39 | 0.71 |
| 2021 | BRK | BRK_21 | 4.27 | 3.50–5.84 | 0.33 | 0.76 |
| | SSH | SSH_21 | 4.94 | 4.12–5.96 | 0.35 | 0.60 |
| | BRF | BRF_21 | 4.31 | 3.35–5.37 | 0.32 | 0.76 |
| | VLG | VLG_21 | 4.30 | 3.02–5.63 | 0.38 | 0.36 |
| | PIE | PIE_21 | 4.22 | 3.19–5.30 | 0.38 | 0.78 |
| | Across | ACR_21 | 4.40 | 3.67–5.57 | 0.29 | 0.34 |
| 2022 | BRK | BRK_22 | 4.74 | 3.60–6.46 | 0.40 | 0.85 |
| | SSH | SSH_22 | 4.97 | 3.93–6.32 | 0.45 | 0.89 |
| | BRF | BRF_22 | 4.92 | 3.67–6.50 | 0.46 | 0.91 |
| | VLG | VLG_22 | 4.95 | 3.83–7.03 | 0.45 | 0.83 |
| | PIE | PIE_22 | 4.70 | 3.62–6.47 | 0.45 | 0.65 |
| | Across | ACR_22 | 4.86 | 3.80–6.32 | 0.39 | 0.66 |
µ, Min, Max, and Std are the mean, minimum, maximum, and standard deviation of β-glucan content (%) in a specific environment
^a Coded as location and year combination, where BRK Brookings, BRF Beresford, SSH South Shore, VLG Volga, PIE Pierre, WIN Winner, ACR Across locations for the specific year and 15, 16, 17, 20, 21, and 22 for the years 2015, 2016, 2017, 2020, 2021, and 2022, respectively
Fig. 1.
Box plots showing the distribution of β-glucan content in GWAS panels evaluated at different locations in multiple years. Cross (×) represents the mean β-glucan content in each respective location
Population structure and linkage disequilibrium
Principal component analysis (PCA) was used to infer the population structure of the GWAS panels within each year and for all the lines together. The first two PCs accounted for 13.42 to 20.68% of the genotypic variation present in the GWAS panels. This, along with the absence of discrete clustering patterns, indicates weak population sub-structure (Supplementary Fig. 1). Linkage disequilibrium (LD) decay was calculated from the squared correlation coefficient (r2) among the marker pairs. The LD for the whole genome decayed at 28 Mb (Fig. 2). Among the three sub-genomes of oat, the LD decay was ~ 13 Mb for sub-genome A, ~ 142 Mb for sub-genome C, and ~ 4 Mb for sub-genome D (Supplementary Fig. 2). Sub-genome C therefore had a slower LD decay rate than the A and D sub-genomes.
Fig. 2.

Genome-wide linkage disequilibrium (LD) decay and the decline of LD-r2 between SNP marker pairs are presented as a function of physical distance (in Mb)
Association analysis
We performed GWAS for each environment individually, across the environments within each year, and across all years. The GLM, MLM, FarmCPU, and BLINK models were compared based on Q-Q plots to select the best-fitted model to conduct GWAS. The BLINK model performed better than the others based on the Q-Q plot, with fewer false associations, and was selected for the association analysis moving forward (Fig. 3). Across all analyses for individual environments and across locations within a specific year, a total of 38 significant marker-trait associations (MTAs) for β-glucan content were positioned on fourteen oat chromosomes (1A, 1D, 2A, 2C, 2D, 3C, 4C, 5A, 5D, 6A, 6C, 6D, 7C, and 7D) (Table 3). The phenotypic variation explained by significant SNPs ranged from 1.78 to 57.54% (Table 3). The position of the significant SNPs identified on oat chromosomes for the different environments is presented in Fig. 4. Based on the whole-genome LD decay value, significant SNP markers located within a region of 28 Mb were clustered as a single locus, which reduced the number from 38 MTAs to 22 loci. Out of these 22 loci, 12 were reported previously, and 10 are new loci. Among the new loci, four (Locus 12IND on chromosome (chr) 4C, Locus 17IND on chr 6D, Locus 18IND on chr 7C, and Locus 19IND on chr 7C) were detected in more than one environment (Table 3). Those loci contribute a significant proportion of the variance for β-glucan content in the breeding material (Table 3).
Fig. 3.

A quantile-quantile plot of observed -log10(p) vs. expected -log10(p) for GWAS models comparison (GLM: Generalized linear model; MLM: Mixed linear model; FarmCPU: Fixed and random model Circulating Probability Unification; BLINK: Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway)
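The clustering of significant SNPs into loci described above can be sketched as follows. This is only one plausible reading of the 28 Mb rule: the helper name and the chaining rule (a SNP joins a locus when it lies within 28 Mb of the previous SNP on the same chromosome) are assumptions, and the sketch is not guaranteed to reproduce every grouping in Table 3. The positions used are the chromosome 7C SNPs listed in Table 3.

```python
from collections import defaultdict

def cluster_into_loci(snps, window_bp=28_000_000):
    """Group significant SNPs into loci.

    snps: iterable of (chromosome, position_bp) tuples.
    Assumed rule: a SNP joins the current locus when it lies within
    window_bp of the previous SNP on the same chromosome; otherwise it
    starts a new locus. Duplicate positions are collapsed.
    """
    by_chr = defaultdict(set)
    for chrom, pos in snps:
        by_chr[chrom].add(pos)

    loci = []
    for chrom in sorted(by_chr):
        positions = sorted(by_chr[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= window_bp:
                current.append(pos)
            else:
                loci.append((chrom, current))
                current = [pos]
        loci.append((chrom, current))
    return loci

# Significant SNP positions on chromosome 7C from Table 3; the last
# position was detected in two environments and appears twice.
snps_7c = [("7C", 9754827), ("7C", 20935319), ("7C", 79586276),
           ("7C", 94029283), ("7C", 103345888), ("7C", 103345888)]
loci_7c = cluster_into_loci(snps_7c)
```

Under this rule the 7C SNPs fall into two groups, matching the two 7C loci (18IND and 19IND) listed in Table 3.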
Table 3.
Significant SNP markers linked with β-glucan content in the GWAS panel identified by association analysis for individual environments and across environments for specific years and their co-localization with β-glucan QTLs in previous studies and reported CslF genes/QTLs
| Locusa | Envb | SNPc | Chrd | Position (bp)e | -log10(P)f | Allelic Effectg | R2(%)h | Fav Allelei | QTL Referencej | Reported QTL/ CslF genek |
|---|---|---|---|---|---|---|---|---|---|---|
| 1IND. | BRF_22 | Schr1A_460912744 | 1A | 460912744 | 8.29 | −0.15 | 8.09 | AA/TT | | |
| 2 IND. | WIN_15 | Schr1A_530450138 | 1A | 530450138 | 11.89 | 0.17 | 8.84 | CC/TT | 39 | QTL QBG.CORE-BG_18.1 |
| | ACR_15 | Schr1A_530450138 | 1A | 530450138 | 5.94 | 0.07 | 1.83 | CC/TT | | |
| 3l IND. | SSH_21 | Schr1D_372963087 | 1D | 372963087 | 6.47 | −0.15 | 19.77 | CC/GG | 39 | QTL QBG.CORE-BG_1.2 |
| | BRK_21 | Schr1D_394700343 | 1D | 394700343 | 6.30 | 0.10 | 15.41 | AA/GG | | |
| 4 IND. | VLG_20 | Schr2A_435494783 | 2A | 435494783 | 6.29 | 0.13 | 7.35 | GG/TT | 43 | |
| 5l IND. | ACR_20 | Schr2C_67182758 | 2C | 67182758 | 5.55 | 0.11 | 8.12 | AA/GG | 43 | QTL QBG.CORE-BG_13.2; CslF3/4/8_chr2C_87 Mb |
| 6 IND. | WIN_17 | Schr2D_105868877 | 2D | 105868877 | 5.47 | 0.16 | 5.18 | CC/TT | | |
| 7l IND. | WIN_15 | Schr2D_143950575 | 2D | 143950575 | 6.02 | −0.12 | 7.84 | AA/GG | 39 | QBG.CORE-BG_8.3 |
| 8 IND. | SSH_22 | Schr3C_57082338 | 3C | 57082338 | 6.59 | −0.16 | 12.59 | AA/GG | | |
| 9 IND. | SSH_22 | Schr3C_494512246 | 3C | 494512246 | 6.13 | −0.14 | 6.08 | CC/TT | | |
| 10l IND. | BRF_17 | Schr4C_18577723 | 4C | 18577723 | 6.59 | 0.20 | 11.10 | AA/GG | 39,41 | QTL QBG.CORE-BG_11.1, Qbgl.UFRGS.L17N-Mrg11.1; CslF9_chr4C_24 Mb |
| 11l IND. | ACR_20 | Schr4C_161363513 | 4C | 161363513 | 5.80 | 0.10 | 13.72 | AA/GG | 41 | Qbgl.UFRGS.E18N-Mrg11.6 |
| | BRK_20 | Schr4C_161363513 | 4C | 161363513 | 5.46 | 0.10 | 4.77 | AA/GG | | |
| | SSH_22 | Schr4C_171146697 | 4C | 171146697 | 5.91 | −0.10 | 1.78 | AA/TT | | |
| 12l IND. | BRK_22 | Schr4C_706039374 | 4C | 706039374 | 10.08 | 0.31 | 57.54 | AA/GG | | |
| | BRF_22 | Schr4C_706039374 | 4C | 706039374 | 7.92 | 0.30 | 39.95 | AA/GG | | |
| 13l IND. | VLG_15 | Schr5A_436924985 | 5A | 436924985 | 7.22 | 0.29 | 48.31 | GG/TT | 41 | QTL QBG.CORE-BG_24.4 |
| | ACR_15 | Schr5A_436924985 | 5A | 436924985 | 7.27 | 0.19 | 11.12 | GG/TT | | |
| | SSH_22 | Schr5A_450770290 | 5A | 450770290 | 6.46 | 0.21 | 13.45 | GG/TT | | |
| 14l IND. | PIE_22 | Schr5D_397064548 | 5D | 397064548 | 6.30 | −0.13 | 4.72 | CC/TT | 39 | QBG.CORE-BG_6.4 |
| 15l IND. | BRK_20 | Schr6A_367992170 | 6A | 367992170 | 6.68 | −0.12 | 5.40 | AA/GG | | |
| 16 IND. | ACR_15 | Schr6C_586192808 | 6C | 586192808 | 6.48 | 0.08 | 3.76 | AA/TT | 28 | QBG.ISU-BG_36 |
| 17l IND. | SSH_17 | Schr6D_283928205 | 6D | 283928205 | 6.24 | 0.24 | 33.41 | AA/GG | - | CslF9/11_chr6D_254 Mb |
| | ACR_17 | Schr6D_283928205 | 6D | 283928205 | 6.04 | 0.18 | 19.81 | AA/GG | | |
| 18l IND. | VLG_21 | Schr7C_9754827 | 7C | 9754827 | 6.27 | −0.15 | 18.25 | CC/TT | | |
| | ACR_15 | Schr7C_20935319 | 7C | 20935319 | 8.52 | −0.15 | 6.39 | AA/GG | | |
| 19l IND. | PIE_22 | Schr7C_79586276 | 7C | 79586276 | 5.59 | 0.24 | 12.75 | AA/GG | | |
| | WIN_17 | Schr7C_94029283 | 7C | 94029283 | 5.47 | 0.17 | 4.45 | AA/GG | | |
| | BRF_17 | Schr7C_103345888 | 7C | 103345888 | 6.06 | −0.20 | 11.16 | AA/GG | | |
| | ACR_17 | Schr7C_103345888 | 7C | 103345888 | 5.57 | −0.15 | 13.39 | AA/GG | | |
| 20 IND. | PIE_22 | Schr7D_216827089 | 7D | 216827089 | 7.71 | −0.17 | 10.70 | CC/TT | | |
| 21 IND. | ACR_15 | Schr7D_428906183 | 7D | 428906183 | 5.37 | −0.11 | 4.09 | CC/TT | 39,41 | Qbgl.UFRGS.L17N-Mrg02.2 |
| | BRK_20 | Schr7D_454257773 | 7D | 454257773 | 11.25 | −0.16 | 7.17 | GG/TT | | |
| 22l IND. | BRF_20 | Schr7D_466875467 | 7D | 466875467 | 7.26 | 0.18 | 35.15 | CC/TT | 39,41 | QBG.CORE-BG_2.2b, Qbgl.UFRGS.L17E-Mrg02.1 |
| | VLG_20 | Schr7D_467361253 | 7D | 467361253 | 5.41 | 0.15 | 9.08 | AA/GG | | |
| | WIN_20 | Schr7D_467361253 | 7D | 467361253 | 6.17 | 0.16 | 21.77 | AA/GG | | |
| | WIN_17 | Schr7D_518690781 | 7D | 518690781 | 5.51 | 0.22 | 7.87 | CC/TT | | |
aGenomic loci associated with β-glucan content and SNP present within a confidence interval of 28 Mb clustered in individual loci. Subscript IND. denotes locus detected either in a single environment or across environments in a single year
bCoded as year and location combination, where BRK Brookings, BRF Beresford, SSH South Shore, VLG Volga, PIE Pierre, WIN Winner, ACR Across locations for the specific year and 15, 16, 17, 20, 21, and 22 for the years 2015, 2016, 2017, 2020, 2021, and 2022, respectively
cSignificant markers associated with β-glucan content
dChromosome on which significant SNP is present
ePhysical position of associated SNP in base pair (bp) based on A. sativa reference genome version 2.0
f-log10 of the P-value of the significant SNP marker
gAllelic effect of the significant marker. The allelic effect is a difference in mean β-glucan content between genotypes with major and minor alleles
hPercentage of phenotypic variance explained by the significant SNP marker (R2)
iMajor/minor allele of significant SNP. The favorable allele (increases the β-glucan content) for that specific marker is bolded
jPrevious studies in which the identified locus was also reported
kPreviously reported QTL/CslF gene present in the confidence interval of locus identified in the present study
lLocus also identified in the across all-years analysis
Fig. 4.
Distribution over oat chromosomes of significant SNPs associated with β-glucan content in the GWAS panel identified in individual environments/years (IND, orange highlighted) and in the multi-year GWAS analysis (MULTI, blue highlighted). Rectangles mark the loci identified in both the single- and multi-year GWAS analyses. Stars on the left side of the chromosomes represent QTLs associated with β-glucan content found at the same or nearby positions in previous studies [27, 29, 36, 39–41, 44, 66, 67]
Using genomic predicted values for each genotype in each environment, we also performed an association analysis across all six years (2015, 2016, 2017, 2020, 2021, and 2022) and found a total of 73 loci distributed over all oat chromosomes except 2A (Table 4). For clarity, loci identified in individual environments/years are differentiated from loci identified in the multi-year GWAS by the subscripts “IND” and “MULTI”, respectively, added to the locus number (Tables 3 and 4). A total of 13 loci were detected by both the single-year analyses and the multi-year analysis (Locus 3IND on chr 1D; 5IND on chr 2C; 7IND on chr 2D; 10IND, 11IND, and 12IND on chr 4C; 13IND on chr 5A; 14IND on chr 5D; 15IND on chr 6A; 17IND on chr 6D; 18IND and 19IND on chr 7C; 22IND on chr 7D), with 9 of those loci being previously reported (Fig. 5). The four new loci (Locus 12IND on chr 4C, Locus 17IND on chr 6D, Locus 18IND on chr 7C, and Locus 19IND on chr 7C) identified in more than one single-environment GWAS were also detected in the multi-year analysis. Colocalization of loci identified in the individual environments and in the multi-year analysis increases the confidence that these genomic regions are truly associated with β-glucan content. Of the 73 loci detected in the multi-year analysis, 37 were previously reported but were not detected in the single-environment analyses. The remaining 21 loci detected in the multi-year analysis were not reported previously in the literature and could be novel genomic regions associated with β-glucan content.
Table 4.
GWAS results for β-glucan content when the analysis was conducted across all the years (2015 to 2022) and their co-localization with β-glucan QTLs in previous studies and reported CslF genes/QTLs
| Locusa | SNPb | Chrc | Position (bp)d | -log10(P)e | Coincide Individual env. locusf | QTL Referenceg | Reported CslF geneh |
|---|---|---|---|---|---|---|---|
| 1MULTI | Schr1A_384797276 | 1A | 384797276 | 29.64 | | 36,39,44 | |
| 2MULTI | Schr1A_493308933 | 1A | 493308933 | 8.15 | | 39,44 | |
| 3MULTI | Schr1C_20530564 | 1C | 20530564 | 8.45 | | | |
| | Schr1C_37214405 | | 37214405 | 10.81 | | | |
| 4MULTI | Schr1C_76291638 | 1C | 76291638 | 7.97 | | | |
| 5MULTI | Schr1D_70591432 | 1D | 70591432 | 12.77 | | 39 | |
| 6MULTI | Schr1D_254437491 | 1D | 254437491 | 8.36 | | | |
| 7MULTI | Schr1D_333161751 | 1D | 333161751 | 11.05 | | 39,44 | |
| 8MULTI | Schr1D_367551307 | 1D | 367551307 | 11.79 | 3 | 39 | |
| | Schr1D_372963087 | | 372963087 | 10.32 | | | |
| | Schr1D_377173026 | | 377173026 | 17.10 | | | |
| | Schr1D_382578378 | | 382578378 | 6.86 | | | |
| 9MULTI | Schr1D_420289976 | 1D | 420289976 | 12.54 | 3 | 41 | |
| | Schr1D_448548603 | | 448548603 | 6.70 | | | |
| 10MULTI | Schr1D_481203247 | 1D | 481203247 | 6.31 | | | |
| 11MULTI | Schr2C_67182758 | 2C | 67182758 | 15.84 | 5 | 39 | |
| 12MULTI | Schr2C_346291638 | 2C | 346291638 | 10.82 | | 39 | |
| 13MULTI | Schr2C_367576622 | 2C | 367576622 | 22.46 | | 39 | |
| 14MULTI | Schr2C_476138324 | 2C | 476138324 | 28.58 | | 39 | |
| 15MULTI | Schr2C_547058747 | 2C | 547058747 | 6.92 | | 39 | |
| 16MULTI | Schr2C_555114495 | 2C | 555114495 | 7.47 | | 39 | |
| 17MULTI | Schr2D_45694651 | 2D | 45694651 | 34.26 | | | |
| | Schr2D_56178195 | | 56178195 | 8.06 | | 27,44 | |
| | Schr2D_59094977 | | 59094977 | 9.53 | | | |
| 18MULTI | Schr2D_142397650 | 2D | 142397650 | 25.28 | 7 | 39 | |
| 19MULTI | Schr2D_181009434 | 2D | 181009434 | 6.89 | | 39 | |
| | Schr2D_189050926 | | 189050926 | 7.90 | | | |
| 20MULTI | Schr3A_4984378 | 3A | 4984378 | 17.23 | | | |
| 21MULTI | Schr3A_194568800 | 3A | 194568800 | 11.87 | | 29,39,44 | |
| 22MULTI | Schr3A_377182544 | 3A | 377182544 | 8.60 | | 39,44,66,67 | |
| 23MULTI | Schr3A_421912601 | 3A | 421912601 | 8.25 | | 39 | |
| 24MULTI | Schr3C_294355 | 3C | 294355 | 24.66 | | | |
| 25MULTI | Schr3C_374039611 | 3C | 374039611 | 16.83 | | | |
| 26MULTI | Schr3C_443733273 | 3C | 443733273 | 19.38 | | | |
| 27MULTI | Schr3C_611975641 | 3C | 611975641 | 8.20 | | 39 | |
| 28MULTI | Schr3D_19577733 | 3D | 19577733 | 9.58 | | | |
| 29MULTI | Schr3D_451241845 | 3D | 451241845 | 7.92 | | 39,41 | |
| 30MULTI | Schr4A_203506415 | 4A | 203506415 | 26.39 | | | |
| | Schr4A_229092078 | | 229092078 | 6.60 | | | |
| 31MULTI | Schr4A_280002673 | 4A | 280002673 | 7.40 | | 29,44 | |
| 32MULTI | Schr4A_435265955 | 4A | 435265955 | 8.40 | | | |
| 33MULTI | Schr4C_33900744 | 4C | 33900744 | 12.65 | 10 | 40 | CslF9_chr4C_24 Mb |
| | Schr4C_52140370 | | 52140370 | 16.19 | | | |
| 34MULTI | Schr4C_83002071 | 4C | 83002071 | 22.23 | | 39,44 | |
| 35MULTI | Schr4C_127448389 | 4C | 127448389 | 9.60 | | 29,40,44 | |
| 36MULTI | Schr4C_172148894 | 4C | 172148894 | 17.81 | 11 | 40 | |
| 37MULTI | Schr4C_688571895 | 4C | 688571895 | 20.71 | 12 | | |
| 38MULTI | Schr4D_261944205 | 4D | 261944205 | 8.22 | | | |
| 39MULTI | Schr4D_347891017 | 4D | 347891017 | 18.23 | | | |
| 40MULTI | Schr4D_393489739 | 4D | 393489739 | 9.11 | | 27,41,44 | |
| | Schr4D_415401302 | | 415401302 | 7.18 | | | |
| | Schr4D_426900840 | | 426900840 | 8.35 | | | |
| | Schr4D_432568653 | | 432568653 | 8.22 | | | |
| | Schr4D_433400944 | | 433400944 | 5.53 | | | |
| | Schr4D_435485375 | | 435485375 | 7.34 | | | |
| | Schr4D_443871497 | | 443871497 | 6.49 | | | |
| | Schr4D_444182728 | | 444182728 | 7.16 | | | |
| | Schr4D_444372059 | | 444372059 | 5.54 | | | |
| | Schr4D_444434309 | | 444434309 | 5.61 | | | |
| | Schr4D_444526825 | | 444526825 | 5.58 | | | |
| | Schr4D_444526867 | | 444526867 | 5.72 | | | |
| | Schr4D_444535579 | | 444535579 | 5.61 | | | |
| 41MULTI | Schr5A_3987284 | 5A | 3987284 | 11.94 | | 40 | |
| 42MULTI | Schr5A_94477736 | 5A | 94477736 | 6.15 | | | |
| 43MULTI | Schr5A_335671941 | 5A | 335671941 | 8.25 | | 39 | |
| | Schr5A_402281419 | | 402281419 | 12.01 | | | |
| 44MULTI | Schr5A_436463393 | 5A | 436463393 | 8.27 | 13 | 39 | |
| | Schr5A_436924985 | | 436924985 | 7.05 | | | |
| | Schr5A_437216254 | | 437216254 | 7.33 | | | |
| | Schr5A_437216264 | | 437216264 | 7.71 | | | |
| | Schr5A_437345826 | | 437345826 | 5.34 | | | |
| | Schr5A_437345885 | | 437345885 | 65.00 | | | |
| | Schr5A_437441308 | | 437441308 | 8.05 | | | |
| | Schr5A_438690717 | | 438690717 | 8.17 | | | |
| | Schr5A_439200480 | | 439200480 | 6.95 | | | |
| | Schr5A_439200517 | | 439200517 | 6.43 | | | |
| | Schr5A_439200520 | | 439200520 | 6.43 | | | |
| | Schr5A_439200524 | | 439200524 | 6.43 | | | |
| | Schr5A_442135864 | | 442135864 | 7.36 | | | |
| | Schr5A_442408039 | | 442408039 | 7.88 | | | |
| | Schr5A_442408104 | | 442408104 | 8.22 | | | |
| | Schr5A_442408107 | | 442408107 | 8.22 | | | |
| | Schr5A_442408113 | | 442408113 | 8.22 | | | |
| | Schr5A_444061480 | | 444061480 | 6.83 | | | |
| | Schr5A_444088899 | | 444088899 | 7.54 | | | |
| | Schr5A_444088914 | | 444088914 | 7.54 | | | |
| | Schr5A_444624291 | | 444624291 | 6.67 | | | |
| | Schr5A_444624294 | | 444624294 | 6.67 | | | |
| | Schr5A_444624355 | | 444624355 | 6.67 | | | |
| | Schr5A_444624357 | | 444624357 | 6.20 | | | |
| | Schr5A_444624380 | | 444624380 | 6.23 | | | |
| | Schr5A_444624403 | | 444624403 | 6.20 | | | |
| | Schr5A_444928281 | | 444928281 | 6.58 | | | |
| | Schr5A_445419515 | | 445419515 | 8.86 | | | |
| | Schr5A_450770290 | | 450770290 | 5.51 | | | |
| | Schr5A_455456754 | | 455456754 | 5.44 | | | |
| | Schr5A_455456789 | | 455456789 | 5.44 | | | |
| | Schr5A_455456796 | | 455456796 | 5.44 | | | |
| | Schr5A_455456804 | | 455456804 | 5.44 | | | |
| | Schr5A_455456809 | | 455456809 | 5.44 | | | |
| | Schr5A_455456825 | | 455456825 | 5.44 | | | |
| | Schr5A_455456831 | | 455456831 | 5.44 | | | |
| | Schr5A_455456837 | | 455456837 | 5.44 | | | |
| | Schr5A_455456839 | | 455456839 | 5.44 | | | |
| | Schr5A_455456841 | | 455456841 | 5.44 | | | |
| | Schr5A_455456860 | | 455456860 | 5.44 | | | |
| | Schr5A_455456862 | | 455456862 | 5.44 | | | |
| | Schr5A_459620200 | | 459620200 | 7.10 | | | |
| | Schr5A_479430988 | | 479430988 | 22.38 | | | |
| 45MULTI | Schr5C_20094258 | 5C | 20094258 | 18.62 | | 29,39,44 | |
| | Schr5C_22624452 | | 22624452 | 33.19 | | | |
| 46MULTI | Schr5C_152476214 | 5C | 152476214 | 20.96 | | 39,44 | |
| 47MULTI | Schr5C_372915413 | 5C | 372915413 | 22.83 | | 39,44 | |
| | Schr5C_374966630 | | 374966630 | 5.98 | | | |
| 48MULTI | Schr5C_531636904 | 5C | 531636904 | 5.92 | | 39,41 | |
| | Schr5C_535293220 | | 535293220 | 8.27 | | | |
| | Schr5C_545123612 | | 545123612 | 7.01 | | | |
| | Schr5C_568292149 | | 568292149 | 6.73 | | | |
| | Schr5C_596791583 | | 596791583 | 8.00 | | | |
| 49MULTI | Schr5D_410021112 | 5D | 410021112 | 21.67 | 14 | 39,41 | |
| 50MULTI | Schr5D_442808910 | 5D | 442808910 | 9.70 | | 39 | |
| 51MULTI | Schr5D_475690512 | 5D | 475690512 | 16.58 | | 39 | |
| 52MULTI | Schr6A_234858768 | 6A | 234858768 | 29.37 | | | |
| 53MULTI | Schr6A_267195489 | 6A | 267195489 | 12.63 | | | |
| | Schr6A_292728342 | | 292728342 | 7.66 | | | |
| 54MULTI | Schr6A_382776022 | 6A | 382776022 | 12.72 | 15 | 39,41 | |
| | Schr6A_398040302 | | 398040302 | 23.98 | | | CslF9_chr6A_416 Mb |
| | Schr6A_414716719 | | 414716719 | 18.30 | | | |
| 55MULTI | Schr6C_24190624 | 6C | 24190624 | 8.51 | | 39,44 | |
| | Schr6C_45173202 | | 45173202 | 13.56 | | | |
| 56MULTI | Schr6C_65706004 | 6C | 65706004 | 10.56 | | 39 | |
| | Schr6C_90813862 | | 90813862 | 16.79 | | | |
| 57MULTI | Schr6C_118611070 | 6C | 118611070 | 7.16 | | 41 | |
| 58MULTI | Schr6C_256378545 | 6C | 256378545 | 11.49 | | 39 | |
| 59MULTI | Schr6C_482355608 | 6C | 482355608 | 7.40 | | 29,39,44,67 | |
| 60MULTI | Schr6D_200889441 | 6D | 200889441 | 5.72 | | | |
| 61MULTI | Schr6D_274125462 | 6D | 274125462 | 12.16 | 17 | | CslF9/11_chr6D_254 Mb |
| 62MULTI | Schr7A_51412334 | 7A | 51412334 | 8.92 | | | |
| 63MULTI | Schr7A_487098818 | 7A | 487098818 | 7.11 | | | |
| 64MULTI | Schr7C_17256941 | 7C | 17256941 | 7.59 | 18 | | |
| 65MULTI | Schr7C_45870666 | 7C | 45870666 | 36.31 | 18 | | |
| | Schr7C_70734181 | | 70734181 | 22.33 | | | |
| 66MULTI | Schr7C_102880155 | 7C | 102880155 | 7.60 | 19 | | |
| 67MULTI | Schr7C_186204467 | 7C | 186204467 | 7.54 | | | |
| 68MULTI | Schr7C_521748351 | 7C | 521748351 | 8.07 | | | |
| 69MULTI | Schr7D_7926065 | 7D | 7926065 | 11.10 | | 29,36,39,40,44 | |
| 70MULTI | Schr7D_65830000 | 7D | 65830000 | 11.55 | | 29,36,39,40,44 | |
| 71MULTI | Schr7D_181512834 | 7D | 181512834 | 35.75 | | 29,36,39,40,44 | |
| 72MULTI | Schr7D_329527800 | 7D | 329527800 | 5.35 | | 29,36,39,40 | CslF6_chr7D_301 Mb |
| 73MULTI | Schr7D_418639913 | 7D | 418639913 | 15.84 | 22 | 29,36,39,40,44 | |
| | Schr7D_434959047 | | 434959047 | 11.78 | | | |
| | Schr7D_454257791 | | 454257791 | 88.45 | | | |
| | Schr7D_466875464 | | 466875464 | 41.87 | | | |
aGenomic loci associated with β-glucan content and SNP present within a confidence interval of 28 Mb clustered in individual loci. Subscript MULTI denotes locus detected across all year GWAS analysis
bSignificant markers associated with β-glucan content
cChromosome on which significant SNP is present
dPhysical position of associated SNP in base pair (bp) based on A. sativa reference genome version 2.0
e-log10 of the p-value of the significant SNP marker
fCorresponding locus identified when the analysis was conducted for a single environment or a single year
gPrevious studies in which the identified locus was also reported
hPreviously reported QTL/CslF gene near/present in the confidence interval of locus identified in the present study
Fig. 5.
Venn diagram representing the number of loci detected for β-glucan content in individual environment/year analyses, multi-year GWAS analysis, and previously reported [44]
Co-localization with previous studies and candidate gene identification
The significant genomic regions identified in the present study were compared with previous QTL studies. We searched a ± 10 Mb confidence interval around these regions for previously identified β-glucan QTLs as well as for candidate genes with direct or indirect functions related to β-glucan synthesis. We found that Locus 2IND (at 530 Mb on 1A), Locus 3IND (at 372–394 Mb on 1D), Locus 4IND (at 435 Mb on 2A), Locus 5IND (at 67 Mb on 2C), Locus 7IND (at 143 Mb on 2D), Locus 10IND (at 18 Mb on 4C), Locus 11IND (at 161–171 Mb on 4C), Locus 13IND (at 436–450 Mb on 5A), Locus 14IND (at 397 Mb on 5D), Locus 16IND (at 586 Mb on 6C), Locus 21IND (at 428–454 Mb on 7D), and Locus 22IND (at 466–518 Mb on 7D) coincided with genomic regions previously reported for β-glucan content [27, 29, 39–44] (Table 3; Figs. 4 and 5). Candidate gene analysis found that Locus 5IND (at 67 Mb on 2C), Locus 10IND (at 18 Mb on 4C), and Locus 17IND (at 283 Mb on 6D) are located near the Cellulose synthase-like CslF genes CslF3/4/8_chr2C_87 Mb, CslF9_chr4C_24 Mb, and CslF9/11_chr6D_254 Mb, respectively [43] (Table 3).
GWAS across all years (multi-year analysis) resulted in the identification of 73 loci, among which 37 were not detected in the single environment/year analyses but were previously reported for their association with β-glucan content (Fig. 5). The list of loci co-localized with previous studies is provided in Table 4 [26, 28, 36, 39, 41, 43, 44, 66, 67]. Three loci, i.e., Locus 33MULTI (at 33–52 Mb on 4C), Locus 54MULTI (at 382–414 Mb on 6A), and Locus 72MULTI (at 329 Mb on 7D), were identified in the vicinity of the CslF genes CslF9_chr4C_24 Mb, CslF9_chr6A_416 Mb, and CslF6_chr7D_301 Mb, respectively. These three loci were also reported to be associated with β-glucan content in previous studies [39–41].
Breeding for higher β-glucan content in oat
Pyramiding favorable alleles of significant markers can improve trait performance. Four markers were selected: Schr1D_372963087 (Locus 3IND), which was detected in multiple environments, was reported in previous studies, and explained a large proportion of phenotypic variance; Schr2C_67182758 (Locus 5IND), which was detected in multiple environments, was reported in previous studies, and is located in the vicinity of a β-glucan synthesis gene (CslF); Schr2D_142397650 (Locus 7IND/18MULTI), which was detected in multiple environments and was reported in previous studies; and Schr7D_329527800 (Locus 72MULTI, across-year analysis), which was also detected in multiple environments, was reported in previous studies, and is located in the vicinity of a β-glucan synthesis gene (CslF). The 1,230 lines used in the GWAS were scanned for favorable and unfavorable alleles at those four loci. β-glucan content increased with the number of favorable alleles carried at these loci (Fig. 6). The average predicted value of β-glucan content ranged from 4.86% (zero favorable alleles) to 5.22% (four favorable alleles). These markers could be converted into Kompetitive allele-specific PCR (KASP) markers to screen oat germplasm early in the breeding cycle.
Fig. 6.
Box plots showing the β-glucan content values for each stack of favorable alleles in the GWAS panel. The legend represents the number of favorable alleles stacked, with zero meaning no favorable allele and four meaning all four favorable alleles
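The allele-stacking summary behind Fig. 6 can be sketched as a simple grouping: count the favorable alleles each line carries at the four markers, then average β-glucan content within each count. The function name and the toy values below are hypothetical and are not data from this study.

```python
from collections import defaultdict
from statistics import mean

def mean_trait_by_favorable_count(lines):
    """Average trait value per number of favorable alleles.

    lines: iterable of (favorable_flags, trait_value), where
    favorable_flags holds one boolean per marker (True = the line
    carries the favorable allele at that marker).
    """
    groups = defaultdict(list)
    for flags, value in lines:
        groups[sum(flags)].append(value)
    return {count: mean(values) for count, values in sorted(groups.items())}

# Hypothetical breeding lines scored at four markers.
toy_lines = [
    ((False, False, False, False), 4.8),
    ((True, False, False, False), 5.0),
    ((True, True, False, False), 5.0),
    ((True, True, False, False), 5.2),
    ((True, True, True, True), 5.3),
]
summary = mean_trait_by_favorable_count(toy_lines)
```

Counts with no lines (three favorable alleles in this toy set) are simply absent from the result, so sparse classes are easy to spot before interpreting a trend.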
The results from this GWAS confirm that many loci are involved in controlling β-glucan content in oat. Therefore, GS is a valid approach to increase β-glucan content in oat. Using the same breeding data as for the GWAS, we compared several factors and their effect on genomic prediction accuracy (PA) for β-glucan content: the genotyping platform (GBS vs. 3 K), whether important markers should be included as fixed effects in GS models, and the inclusion of multiple years of data in the training set (TS). The accuracy of each scenario varied depending on the set of lines used as TS. In scenarios GSM1_GBS and GSM2_GBS, we used data from PYT 2015 to 2021 as TS to predict lines from the 2022 PYT. In scenarios GSM3_GBS and GSM4_GBS, however, only data from the 2021 PYT were used as TS to predict PYT 2022 lines. A significant improvement in PA was observed when multiple years of PYT data (2015–2021) were used to build the model (0.61 to 0.72) as compared to when only one year of data (2021) was used as TS (0.38 to 0.48) to predict the genotypes evaluated in 2022 (Table 5). Using markers as fixed effects, on the other hand, did not result in a substantial improvement in PA (no marker as fixed effect: 0.38 to 0.71 versus markers as fixed effects: 0.39 to 0.72). Models built using data from the Illumina 3 K chip platform were based on fewer markers (2,290 SNPs) than the GBS dataset (9,341 SNPs), but the reduction in markers did not significantly impact the accuracy of the prediction models (Table 5).
Table 5.
Accuracies of β-glucan content predictions for four different genomic prediction scenarios (GSM)
| Scenario | BRK_22 | SSH_22 | BRF_22 | VLG_22 | PIE_22 | ACR_22 |
|---|---|---|---|---|---|---|
| GSM1_GBS | 0.62 | 0.62 | 0.66 | 0.61 | 0.63 | 0.71 |
| GSM2_GBS | 0.64 | 0.63 | 0.66 | 0.63 | 0.63 | 0.72 |
| GSM3_GBS | 0.38 | 0.39 | 0.47 | 0.38 | 0.48 | 0.48 |
| GSM4_GBS | 0.39 | 0.41 | 0.46 | 0.40 | 0.48 | 0.48 |
| GSM1_3K | 0.62 | 0.62 | 0.66 | 0.61 | 0.63 | 0.71 |
| GSM2_3K | 0.63 | 0.61 | 0.65 | 0.62 | 0.62 | 0.71 |
| GSM3_3K | 0.38 | 0.38 | 0.44 | 0.34 | 0.44 | 0.45 |
| GSM4_3K | 0.32 | 0.32 | 0.36 | 0.28 | 0.36 | 0.37 |
BRK Brookings, BRF Beresford, SSH South Shore, VLG Volga, PIE Pierre, WIN Winner, ACR Across locations, and 22 for the year 2022
GSM1-GSM4 are four different GS scenarios, GBS: genotyping-by-sequencing data, 3 K: SNP array data. GSM1: GS model with no markers as a fixed effect using PYT data from 2015 to 2021 as the training set (TS) to predict PYT 2022; GSM2: GS model with four markers as a fixed effect (located at loci 3IND, 5IND, 7IND, and 72MULTI) using PYT data from 2015 to 2021 as TS to predict PYT 2022; GSM3: GS model with no markers as a fixed effect using data from PYT 2021 as TS to predict PYT 2022; GSM4: GS model with a specific set of markers as a fixed effect using data from PYT 2021 as TS to predict PYT 2022
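To make the notion of prediction accuracy concrete, the sketch below fits a ridge-regression marker model on a training set and reports the Pearson correlation between predicted and observed values in a held-out set, for a small and a large training set. It assumes a ridge-type marker model, which may differ from the GS model actually used in this study; the simulated data, the shrinkage value `lam`, the set sizes, and all function names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge estimates of marker effects on centered data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ yc), x_mean, y_mean

def predict(X_new, effects, x_mean, y_mean):
    """Predicted trait values for new genotypes."""
    return (X_new - x_mean) @ effects + y_mean

def prediction_accuracy(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])

# Simulated data (hypothetical): 400 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(0)
n_markers = 50
X = rng.integers(0, 3, size=(400, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.1, n_markers)
y = 5.0 + X @ true_effects + rng.normal(0.0, 0.3, 400)

# The last 100 lines stand in for the year to be predicted.
X_test, y_test = X[300:], y[300:]
accuracy = {}
for label, n_train in (("small_ts", 60), ("large_ts", 300)):
    effects, x_mean, y_mean = fit_ridge(X[:n_train], y[:n_train], lam=10.0)
    y_hat = predict(X_test, effects, x_mean, y_mean)
    accuracy[label] = prediction_accuracy(y_test, y_hat)
```

Comparing `accuracy["small_ts"]` with `accuracy["large_ts"]` mirrors the single-year versus multi-year comparison in Table 5, although the simulated values carry no information about the real breeding data.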
Discussion
Although oat production has decreased worldwide, oat is still highly valued for the health benefits conferred by its water-soluble β-glucan. A better understanding of the genetic architecture of β-glucan content will help breeders increase β-glucan content in oat groats and ultimately enhance the nutritional quality of oat-based food products. An association analysis for β-glucan content was conducted using a panel of oat breeding populations from a North American oat breeding program. Broad-sense heritability estimates for β-glucan content were moderate to high in most environments (except SSH_16 and VLG_21) for the breeding populations considered, indicating a relatively strong effect of the genotype relative to the environment (Table 2). Many factors can affect H2 estimates, including genetic variance and error variance. The variation in heritability could be due to the use of more diverse materials as parents of the PYT after 2016. The higher heritability values in later years could also suggest that the new NIRS calibration may be more accurate at estimating β-glucan content. The overall moderate to high heritability of β-glucan content indicates the robustness of the data and confirms the strong genetic control of β-glucan content in our breeding populations. Similar levels of heritability for β-glucan content were reported previously [25, 29]. Even though the breeding germplasm used in this study was derived from a limited number of progenitors (parents), there was sufficient phenotypic variability and high heritability in the present panel to capture reliable associations, including variation controlled by rare alleles [68, 69]. Although phenotypic variation may not be as large as in a diverse population of unrelated lines, the associated genomic regions found here should have fewer problems with undesirable linkage drag and should be more easily deployed.
There was no defined population structure in the GWAS panels, as the breeding lines were derived from parents developed primarily by the South Dakota State University oat breeding program. The sum of genotypic variation explained by the first two principal components (PCs) for individual years ranged from 13.42 to 20.68% (Supplementary Fig. 1), which is low compared with other crops such as barley, for which ~ 90% was reported [70]. The low level of population structure in the GWAS panel reduces the chance of false associations. The extent of LD and the rate of LD decay across the whole genome affect the power and resolution of association mapping. Previous studies reported whole-genome LD decay ranging from 1.4 to 4.1 cM, which is difficult to convert to physical distance (base pairs) [42, 71–74]. However, these studies found that the rate of LD decay is slower in North American oat germplasm; this is consistent with the 28 Mb in the present study. In contrast, a whole-genome LD decay of 2.29 Mb was reported for a diverse panel of worldwide oat accessions [75]. Differences in LD decay among studies might be due to the inclusion of different landraces/accessions in the GWAS panel and differences in selection practices [76]. In predominantly self-pollinating species such as oat and barley, LD is expected to decay over longer map distances, and this influences the power and resolution of GWAS. The extent of LD decay varied among sub-genomes: sub-genome C showed slower LD decay (142 Mb) than sub-genomes A (13 Mb) and D (4 Mb). Sub-genome C also had more marker coverage with fewer gaps than the other two sub-genomes (Supplementary Fig. 3).
Both false positives and false negatives occur in GWAS. These errors can be reduced by using a statistical model that controls false associations, and BLINK outperformed other models in this regard (Fig. 3), as found in previous studies [64, 77–79]. A stringent Bonferroni-corrected threshold of -log10(P) ≥ 5.2 was used to identify significant marker-trait associations. In this study, the association analysis for β-glucan content was conducted using six breeding panels and was performed for a total of 26 environments. Using multiple GWAS panels and evaluating lines in multiple environments can enhance the power and accuracy of GWAS to capture significant associations. This approach differs from previous studies that either evaluated the same population in multiple environments or different populations in fewer environments [27, 29, 36, 39, 40]. Overall, our study confirmed the quantitative inheritance of β-glucan content [36, 41] and resulted in the identification of a high number of genomic regions associated with the trait, providing further insight into the genetic architecture of β-glucan content in oat.
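For readers unfamiliar with the Bonferroni correction, the short sketch below shows how a -log10(P) threshold follows from the family-wise error rate and the number of tests. The significance level of 0.05 and the marker count of 8,000 are hypothetical values chosen only because they give a threshold close to 5.2; the marker count behind the threshold used in this study is not stated in this section.

```python
import math

def bonferroni_neg_log10(alpha, n_tests):
    """-log10 of the Bonferroni-corrected P-value threshold alpha / n_tests."""
    return -math.log10(alpha / n_tests)

def implied_n_tests(alpha, neg_log10_threshold):
    """Number of tests for which alpha / n_tests equals the given threshold."""
    return alpha * 10 ** neg_log10_threshold

# Hypothetical example: alpha = 0.05 and 8,000 tested markers.
threshold = bonferroni_neg_log10(0.05, 8000)

# Conversely, the number of tests implied by a threshold of 5.2 at alpha = 0.05.
n_implied = implied_n_tests(0.05, 5.2)
```

Because the threshold grows only with the logarithm of the number of markers, panels with somewhat different marker counts end up with similar thresholds.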
Conducting GWAS individually for each of the 26 environments and across locations within a specific year, we identified 22 genomic loci associated with β-glucan content on 14 different chromosomes, 12 of which were congruent with previously identified β-glucan QTLs (Table 3) based upon their physical position on the oat consensus map. The use of fewer genotypes (~ 230) and fewer environments in single-year GWAS analyses led to a much smaller number of loci being detected as compared to the multi-year analysis, and only a few loci co-localized with previously detected QTLs. However, single location/year GWAS analyses led to the identification of 10 loci that have not been previously reported, and four of those loci were detected in more than one environment when analyzing each environment separately, suggesting that they are true associations. Furthermore, a search for candidate genes in the vicinity of those loci revealed the presence of CslF9/11 (positioned at 254 Mb) in proximity to the loci 17IND/61MULTI located on 6D.
Simultaneously using the information from all PYTs from 2015 to 2022 to conduct GWAS, we identified 73 loci associated with β-glucan content, of which 15 co-localized with the genomic regions identified in individual environments and 47 were reported in previous studies (Fig. 5). Direct comparison of the QTLs identified in the present study with previous linkage mapping and GWAS studies is facilitated by the recent QTL inventory [44], as the previously reported QTLs are assigned to chromosomes (in base pairs) according to the OT3098 oat genome version 2 [43]. Co-localization of QTLs among individual environments, across the all-year GWAS analysis, and with previously reported QTLs and candidate genes for β-glucan synthesis increases our confidence in their true association with β-glucan content. To the best of our knowledge, we are the first to report a GWAS conducted using genomic predicted values computed for many environments to increase the power of detection. Genomic prediction was used to achieve the evaluation of 1,230 breeding lines in 26 environments. While this approach may violate some assumptions typically required for GWAS, it enabled the detection of a higher number of loci compared to single-environment analyses and confirmed the importance of 37 loci (detected in the multi-environment GWAS and reported previously in the literature) in controlling β-glucan content that would not otherwise have been detected when performing single location/year analyses. New associations were also discovered. Among the 25 new associations, four were also detected in the single-environment analyses, suggesting that they are valid associations. Further evaluation is needed to confirm the validity of the remaining 21 new associations. Overall, the multi-environment analysis was more powerful in detecting associations than the GWAS conducted for a single environment or single year. This study highlights the importance of performing GWAS in multiple environments to obtain robust results.
Several loci merit consideration for marker-assisted breeding. Locus 3IND (Schr1D_372963087- Schr1D_ 394700343)/8MULTI and 9MULTI explained from 15 to 19% of the PV and coincided with markers GMI_ES14_c3227_73 and avgbs_109934 (named as QTL QBG.CORE-BG_1.2) identified previously [39]. Locus 5IND (Schr2C_67182758)/11MULTI and Locus 7IND (Schr2D_143950575)/18MULTI fall in the confidence interval of QTL QBG.CORE-BG_13.2 and QBG.CORE-BG_8.3, respectively, which were identified in the Oat CORE panel [39, 43]. Locus 10IND/33MULTI and Locus 11IND/36MULTI explained 11–13.72% of the PV and are co-localized with Qbgl.UFRGS.L17N-Mrg11.1 and QTL Qbgl.UFRGS.E18N-Mrg11.6 in the UFRGS Oat Panel [41]. Finally, Locus 13IND (Locus 44MULTI) and Locus 22IND (Locus 73MULTI) explained more than 10% of the PV and are mapped in the genomic position of QTL QBG.CORE-BG_24.4 and QBG.CORE-BG_2.2b/Qbgl.UFRGS.L17E-Mrg02.1, respectively, which were detected in the Oat CORE Panel [39, 41].
Four loci, Locus 5IND/11MULTI, Locus 10IND/33MULTI, Locus 17IND/61MULTI, and Locus 72MULTI, are located near the CslF genes CslF3/4/6_chr2C_87 Mb, CslF9_chr4C_24 Mb, CslF9/11_chr6D_254 Mb, and CslF6_chr7D_301 Mb, respectively [41]. Cellulose synthase-like (Csl) genes play a major role in cellulose synthesis and are associated with β-glucan production in small grains, with CslF likely being of major importance [80–82]. In the Poaceae, gene families belonging to the cellulose synthase (CeS) and cellulose synthase-like (Csl) genes are involved in the biosynthesis of β-glucan in the endosperm and include CesA, CslA, CslB, CslC, CslD, CslE, CslF, CslG, and CslH [80, 83]. Of these, the CslF and CslH glucosyltransferase families are present only in grasses/monocotyledons such as oat, rice, wheat, and barley, and CslF family genes are considered potential candidate genes controlling β-glucan synthesis [80, 81, 84]. In wheat, down-regulation of CslF6 resulted in a 30–53% decrease in β-glucan content, while overexpression of this gene led to an increase in β-glucan content [81, 82]. In barley, the bgl (β-glucan-less mutant) trait was mapped to chromosome 7H, where the candidate gene HvCslF6 is located [85]. In oat, the level of expression of AsCslF6_C relative to its homeologs was associated with β-glucan content [39]. Surprisingly, no QTL was previously reported for β-glucan content on chromosome 7C, where AsCslF6_C is located [41, 44]. In this study, five genomic regions were detected on chromosome 7C; however, none coincided with the location of AsCslF6_C, and the closest associated SNP is located approximately 100 Mb from AsCslF6_C.
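The comparison by physical position described above can be illustrated with a short Python sketch. It is only an illustration and not the procedure used in this study: the gene coordinates are rounded to the megabase values embedded in the gene labels quoted above, only Locus 5IND is included because it is the only one of the four loci whose SNP position is given in this section, and the names `Feature`, `CSLF_GENES`, and `nearest_gene` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A named position on a chromosome (locus or gene)."""
    name: str
    chrom: str
    pos_bp: int


# Approximate gene positions, rounded to the Mb value in each gene label
# (e.g. "CslF3/4/6_chr2C_87 Mb"); assumed here for illustration only.
CSLF_GENES = [
    Feature("CslF3/4/6", "2C", 87_000_000),
    Feature("CslF9", "4C", 24_000_000),
    Feature("CslF9/11", "6D", 254_000_000),
    Feature("CslF6", "7D", 301_000_000),
]


def nearest_gene(locus, genes):
    """Return (gene, distance in Mb) for the closest gene on the locus chromosome.

    Returns None when no gene in the list lies on that chromosome.
    """
    same_chrom = [g for g in genes if g.chrom == locus.chrom]
    if not same_chrom:
        return None
    gene = min(same_chrom, key=lambda g: abs(g.pos_bp - locus.pos_bp))
    return gene, abs(gene.pos_bp - locus.pos_bp) / 1e6


# Locus 5IND, SNP Schr2C_67182758 as reported in the text.
locus_5ind = Feature("5IND", "2C", 67_182_758)
gene, dist_mb = nearest_gene(locus_5ind, CSLF_GENES)
print(f"{locus_5ind.name}: nearest gene {gene.name}, about {dist_mb:.2f} Mb away")
```

Because the gene coordinates are rounded, the distance is approximate. What counts as "near" depends on the extent of linkage disequilibrium in the population, so any distance threshold applied to such output would be a separate assumption.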
Despite the important role played by Csl and CeS genes in controlling β-glucan content, the detection of many loci associated with β-glucan content in this study and in previous GWAS studies suggests that even traits such as β-glucan content, which have relatively simple biological control, can be affected by variation at many loci across the genome. Interestingly, in SDSU breeding lines, the five loci that were detected near CslF genes were not among those that had the strongest effect on the PV. Of interest is the presence of loci on 1D (3IND), 5A (13IND), and 7D (22IND), which explained a larger proportion of the phenotypic variation (19–48%) than the loci located near CslF genes (5–11%). QTLs were previously detected in the regions of those three loci on 1D, 5A, and 7D, reinforcing the importance of those three QTLs in controlling β-glucan content in oat. Furthermore, the results indicate that the genomic regions associated with β-glucan content found in this study are shared with those detected in germplasm from different parts of the globe, such as the Oat CORE Panel and the UFRGS Oat Panel. We used advanced breeding lines to capture marker-trait associations (MTAs) instead of a diverse GWAS panel, yet we were able to capture many MTAs that were reported in QTL studies using diverse panels. Our observations demonstrate that the SDSU breeding population contains enough variation to increase β-glucan content in future oat varieties. We expect that this is also the case in other US oat breeding programs, as germplasm exchange is common among programs.
Because the trait is controlled by many loci across the genome, GS is a promising strategy to increase β-glucan content. Leveraging the data collected in the PYTs over years as part of our breeding activities (2015–2021) increased prediction accuracy by 28% when predicting the 2022 PYT as compared to building a model using the data collected for the PYT from the previous year only (2021), even though data from five locations in 2021 were used to build the model. There are two likely reasons for these results. First, our strategy of leveraging breeding data over multiple years allows us to increase the genetic diversity used to build the model by using a much larger set of genotypes (997 genotypes versus 230 genotypes). Second, the multi-year strategy built a model based on marker effects from 26 environments. Despite β-glucan content having high heritability, the effect of each marker on the phenotypic value depends on the environment (as observed when performing the GWAS in single locations, i.e. markers detected and their effect on phenotypic value varied based on the location in each year). It is therefore not surprising that a model that captures each marker effect over a larger number of environments would be a more robust model. On the other hand, the genotyping platforms did not affect prediction accuracy. The lower number of markers on the USDA 3 K array as compared to GBS did not reduce prediction accuracy; the cost effectiveness of the 3 K array makes it a promising genotyping platform for implementing GS in oat breeding programs.
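The effect of training population size on prediction accuracy can be illustrated with a minimal ridge-regression sketch in Python on simulated data. This is not the analysis performed in this study and does not use the study's data: only the training set sizes (997 and 230 genotypes) are taken from the text, while the number of markers (500), the heritability (0.6), the test set size (500), and the use of the known simulated variance ratio as the ridge penalty are assumptions made for illustration. The function names are hypothetical, and in practice an rrBLUP-type model would estimate the penalty from the data.

```python
import numpy as np


def simulate(n_lines, n_markers, h2, rng):
    """Simulate independent markers coded -1/0/1 and a polygenic trait."""
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
    u = rng.normal(0.0, 1.0, n_markers)        # true marker effects, variance 1
    g = X @ u                                  # true genetic values
    var_e = g.var() * (1.0 - h2) / h2          # residual variance giving the target h2
    y = g + rng.normal(0.0, np.sqrt(var_e), n_lines)
    return X, y, var_e


def ridge_marker_effects(X, y, lam):
    """Closed-form ridge estimates of marker effects on centered data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return beta, x_mean, y_mean


def prediction_accuracy(X_train, y_train, X_test, y_test, lam):
    """Pearson correlation between predicted and observed test phenotypes."""
    beta, x_mean, y_mean = ridge_marker_effects(X_train, y_train, lam)
    y_hat = y_mean + (X_test - x_mean) @ beta
    return float(np.corrcoef(y_hat, y_test)[0, 1])


rng = np.random.default_rng(2022)
X, y, var_e = simulate(997 + 500, n_markers=500, h2=0.6, rng=rng)
X_test, y_test = X[997:], y[997:]              # 500 held-out lines

# Penalty = residual variance / marker-effect variance (the latter is 1 here).
acc_small = prediction_accuracy(X[:230], y[:230], X_test, y_test, lam=var_e)
acc_large = prediction_accuracy(X[:997], y[:997], X_test, y_test, lam=var_e)
print(f"accuracy with 230 training lines: {acc_small:.2f}")
print(f"accuracy with 997 training lines: {acc_large:.2f}")
```

The sketch isolates the first of the two reasons given above (a larger set of genotypes). It does not model marker effects that vary among environments, so it says nothing about the second reason.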
Conclusion
In summary, a new approach was proposed to perform GWAS and build genomic prediction models. The approach consisted of leveraging data collected as part of breeding activities to increase our ability to detect loci associated with specific traits and to increase the robustness of genomic prediction models. The study demonstrated that performing GWAS across multiple environments increases the number of significant associations that are detected and is therefore necessary to obtain robust association results. Similarly, building GS models based on large multi-environment trial data leads to more robust models, even for traits with high heritability such as β-glucan content. In addition to providing valuable insights into the genetic basis of β-glucan content in oat, the findings of this study will facilitate the implementation of GS to accelerate genetic progress for increasing β-glucan content in oat and will be of value for improving other traits in oat and other crops.
Acknowledgements
We want to acknowledge Nicholas Hall and part-time employees for conducting the field trials and assisting with the collection of phenotypic data. Thank you to Terrance Peterson and Mary Osenga at the North Central Small Grains Genotyping Laboratory for acquiring genotyping data. The authors are also thankful to Paul Richter from General Mills for helping with the quality data collection in 2015.
Abbreviations
- Csl: Cellulose synthase-like
- GBS: Genotyping-by-sequencing
- GEBV: Genomic estimated breeding value
- GS: Genomic selection
- GWAS: Genome-wide association study
- H2: Broad sense heritability
- KASP: Kompetitive allele-specific PCR
- LD: Linkage disequilibrium
- MAF: Minor allele frequency
- MAS: Marker-assisted selection
- NIRS: Near-infrared spectroscopy
- PA: Prediction accuracy
- PCA: Principal component analysis
- QTL: Quantitative trait loci
- RRBLUP: Ridge regression best linear unbiased prediction
- SNP: Single nucleotide polymorphism
Authors’ contributions
SKB performed the statistical analyses and wrote the first draft. GO assisted with data analyses. JDF and RSD helped with genotyping data processing and imputation. JLJ supported the genotyping in 2015 and assisted with the genotyping in 2016 and 2017. MC oversaw the project, secured funding, and participated in writing the manuscript. All authors were involved with editing the manuscript.
Funding
This study was supported by the South Dakota Agricultural Experiment Station, South Dakota State University Agronomy Horticulture and Plant Science Department, the National Institute of Food and Agriculture under hatch proposal SD00H529-14, the South Dakota Crop Improvement Association, Grain Millers, and General Mills.
Data availability
The genotyping data is available on T3/Oat (triticeaetoolbox.org) under the genotyping projects named SDSU_2015_GBS, SDSU_2016_GBS, SDSU_2017_GBS, SDSU_2020_GBS, SDSU_2021_GBS, SDSU_2022_GBS, and 3K_SDSU_PYT2015_2022. Small seed samples from the breeding lines used in this study are available for research purposes upon request (melanie.caffe@sdstate.edu).
Declarations
Ethics approval and consent to participate
All methods were performed in accordance with the relevant institutional, national, and international guidelines and legislation.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Supplementary Information
Supplementary Material 1.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. FAOSTAT. https://www.fao.org/faostat/en/#home
- 2. Butt MS, Tahir-Nadeem M, Khan MKI, Shabir R, Butt MS. Oat: unique among the cereals. Eur J Nutr. 2008;47(2):68–79.
- 3. Fincher G, Stone B. Cell walls and their components in cereal grain technology. Adv Cereal Sci Technol. 1986;8:207–95.
- 4. Shen RL, He J. Research advances in fine structure of cereal β-glucan. J Henan Univ Technol (Nat Sci Ed). 2009;30:85–7.
- 5. Lazaridou A, Biliaderis CG, Izydorczyk MS. Cereal beta-glucans: structures, physical properties, and physiological functions. In: Biliaderis CG, Izydorczyk MS, editors. Functional food carbohydrates. Boca Raton: CRC Press; 2007. p. 1–72.
- 6. Charalampopoulos D, Wang R, Pandiella SS, Webb C. Application of cereals and cereal components in functional foods: a review. Int J Food Microbiol. 2002;79(1):131–41.
- 7. Hu X, Zhao J, Zhao Q, Zheng J. Structure and characteristic of β-glucan in cereal: a review: β-glucan from barley, oat and wheat. J Food Process Preserv. 2015;39(6):3145–53.
- 8. Whitehead A, Beck EJ, Tosh S, Wolever TM. Cholesterol-lowering effects of oat β-glucan: a meta-analysis of randomized controlled trials. Am J Clin Nutr. 2014;100(6):1413–21.
- 9. Estrada A, Yun CH, Kessel AV, Li B, Hauta S, Laarveld B. Immunomodulatory activities of oat β-glucan in vitro and in vivo. Microbiol Immunol. 1997;41(12):991–8.
- 10. Tiwari U, Cummins E. Factors influencing β-glucan levels and molecular weight in cereal-based products. Cereal Chem. 2009;86(3):290–301.
- 11. Meydani M. Potential health benefits of avenanthramides of oats. Nutr Rev. 2009;67(12):731–5.
- 12. US Food and Drug Administration. Food labeling: health claims; oats and coronary heart disease. Final rule. Fed Regist. 1997;62:3583–601.
- 13. Paudel D, Dhungana B, Caffe M, Krishnan P. A review of health-beneficial properties of oats. Foods. 2021;10(11):2591.
- 14. Welch RW, Leggett JM, Lloyd JD. Variation in the kernel (1 → 3) (1 → 4)-β-D-glucan content of oat cultivars and wild Avena species and its relationship to other characteristics. J Cereal Sci. 1991;13(2):173–8.
- 15. Welch RW, Lloyd JD. Kernel (1 → 3) (1 → 4)-β-D-glucan content of oat genotypes. J Cereal Sci. 1989;9(1):35–40.
- 16. Peterson DM. Genotype and environment effects on oat beta-glucan concentration. Crop Sci. 1991;31(6):1517–20.
- 17. Brunner BR, Freed RD. Oat grain β-glucan content as affected by nitrogen level, location, and year. Crop Sci. 1994;34(2):473–6.
- 18. Humphreys DG, Smith DL, Mather DE. Nitrogen fertilizer and seeding date induced changes in protein, oil and β-glucan contents of four oat cultivars. J Cereal Sci. 1994;20(3):283–90.
- 19. Jackson GD, Berg RK, Kushnak GD, Blake TK, Yarrow GI. Nitrogen effects on yield, beta-glucan content, and other quality factors of oat and waxy hulless barley. Commun Soil Sci Plant Anal. 1994;25(17–18):3047–55.
- 20. Peterson DM, Wesenberg DM, Burrup DE. β-glucan content and its relationship to agronomic characteristics in elite oat germplasm. Crop Sci. 1995;35(4):965–70.
- 21. Ajithkumar A, Andersson R, Åman P. Content and molecular weight of extractable β-glucan in American and Swedish oat samples. J Agric Food Chem. 2005;53(4):1205–9.
- 22. Doehlert DC, McMullen MS, Hammond JJ. Genotypic and environmental effects on grain yield and quality of oat grown in North Dakota. Crop Sci. 2001;41(4):1066–72.
- 23. Zhou MX, Glennie-Holmes M, Robards K, Roberts GL, Helliwell S. Effects of sowing date, nitrogen application, and sowing rate on oat quality. Aust J Agric Res. 1998;49(5):845–52.
- 24. Peterson DM. Composition and nutritional characteristics of oat grain and products. Oat Sci Technol. 1992;33:265–92.
- 25. Holthaus JF, Holland JB, White PJ, Frey KJ. Inheritance of β-glucan content of oat grain. Crop Sci. 1996;36(3):567–73.
- 26. Humphreys DG, Mather DE. Heritability of β-glucan, groat percentage, and crown rust resistance in two oat crosses. Euphytica. 1996;91(3):359–64.
- 27. De Koeyer DL, Tinker NA, Wight CP, Deyl J, Burrows VD, O’Donoughue LS, et al. A molecular linkage map with associated QTLs from a hulless × covered spring oat population. Theor Appl Genet. 2004;108(7):1285–98.
- 28. Herrmann MH, Yu J, Beuch S, Weber WE. Quantitative trait loci for quality and agronomic traits in two advanced backcross populations in oat (Avena sativa L). Plant Breed. 2014;133(5):588–601.
- 29. Asoro FG, Newell MA, Scott MP, Beavis WD, Jannink JL. Genome-wide association study for beta-glucan concentration in elite North American oat. Crop Sci. 2013;53(2):542–53.
- 30. Cervantes-Martinez CT, Frey KJ, White PJ, Wesenberg DM, Holland JB. Selection for greater β-glucan content in oat grain. Crop Sci. 2001;41(4):1085–91.
- 31. Saastamoinen M. Effects of environmental factors on the β-glucan content of two oat varieties. Acta Agric Scand Sect B Soil Plant Sci. 1995;45(3):181–7.
- 32. Güler M. Nitrogen and irrigation effects on β-glucan content of wheat grain. Acta Agric Scand Sect B Soil Plant Sci. 2003;53(3):156–60.
- 33. Saastamoinen M, Plaami S, Kumpulainen J. Genetic and environmental variation in β-glucan content of oats cultivated or tested in Finland. J Cereal Sci. 1992;16(3):279–90.
- 34. Baur SK, Geisler G. β-Glucan content in caryopses of oat varieties with regard to cultivation year and nitrogen level. J Agron Crop Sci. 1996;176:5–14.
- 35. Paudel D, Caffe-Treml M, Krishnan P. A single analytical platform for the rapid and simultaneous measurement of protein, oil, and β-glucan contents of oats using near-infrared reflectance spectroscopy. Cereal Foods World. 2018;63(1):17–25.
- 36. Kianian SF, Phillips RL, Rines HW, Fulcher RG, Webster FH, Stuthman DD. Quantitative trait loci influencing β-glucan content in oat (Avena sativa, 2n = 6x = 42). Theor Appl Genet. 2000;101(7):1039–48.
- 37. O’Donoughue LS, Sorrells ME, Tanksley SD, Autrique E, Deynze AV, Kianian SF, et al. A molecular linkage map of cultivated oat. Genome. 1995;38(2):368–80.
- 38. Newell MA, Asoro FG, Scott MP, White PJ, Beavis WD, Jannink JL. Genome-wide association study for oat (Avena sativa L.) beta-glucan concentration using germplasm of worldwide origin. Theor Appl Genet. 2012;125(8):1687–96.
- 39. Fogarty MC, Smith SM, Sheridan JL, Hu G, Islamovic E, Reid R, et al. Identification of mixed linkage β-glucan quantitative trait loci and evaluation of AsCslF6 homoeologs in hexaploid oat. Crop Sci. 2020;60(2):914–33.
- 40. Zimmer CM, McNish IG, Klos KE, Oro T, Arruda KMA, Gutkoski LC, et al. Genome-wide association for β-glucan content, population structure, and linkage disequilibrium in elite oat germplasm adapted to subtropical environments. Mol Breed. 2020;40(11):103.
- 41. Tinker NA, Wight CP, Bekele WA, Yan W, Jellen EN, Renhuldt NT, et al. Genome analysis in Avena sativa reveals hidden breeding barriers and opportunities for oat improvement. Commun Biol. 2022;5(1):1–11.
- 42. Chaffin AS, Huang Y, Smith S, Bekele WA, Babiker E, Gnanesh BN, et al. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial subgenome rearrangement. Plant Genome. 2016;9(2):0102. doi:10.3835/plantgenome2015.10.0102.
- 43. Avena sativa - OT3098 v2, PepsiCo. https://wheat.pw.usda.gov/jb?data=/ggds/oat-ot3098v2-pepsico
- 44. Wight CP, Blake VC, Jellen EN, Yao E, Sen TZ, Tinker NA. One hundred years of comparative genetic and physical mapping in cultivated oat (Avena sativa). Crop Pasture Sci. 2024;75(2):CP23246.
- 45. Asoro FG, Newell MA, Beavis WD, Scott MP, Tinker NA, Jannink JL. Genomic, marker-assisted, and pedigree-BLUP selection methods for β-glucan concentration in elite oat. Crop Sci. 2013;53:1894–906.
- 46. Goddard M. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009;136(2):245–57.
- 47. Kim GW, Hong JP, Lee HY, Kwon JK, Kim DA, Kang BC. Genomic selection with fixed-effect markers improves the prediction accuracy for capsaicinoid contents in Capsicum annuum. Hortic Res. 2022;9:uhac204.
- 48. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.
- 49. Sarinelli JM, Murphy JP, Tyagi P, Holland JB, Johnson JW, Mergoum M, et al. Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel. Theor Appl Genet. 2019;132(4):1247–61.
- 50. Spindel JE, Begum H, Akdemir D, Collard B, Redoña E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395–408.
- 51. Burgueño J, Crossa J, Cotes JM, Vicente FS, Das B. Prediction assessment of linear mixed models for multienvironment trials. Crop Sci. 2011;51(3):944–54.
- 52. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:1–48.
- 53. R: The R Project for Statistical Computing. https://www.r-project.org/
- 54. Carlson CH, Fiedler JD, Naraghi SM, Nazareno ES, Ardayfio NK, McMullen MS, et al. Archetypes of inflorescence: genome-wide association networks of panicle morphometric, growth, and disease variables in a multiparent oat population. Genetics. 2023;223(2):iyac128.
- 55. Bushnell B. BBTools. 38.79 edn. https://sourceforge.net/projects/bbmap/
- 56. Poland JA, Brown PJ, Sorrells ME, Jannink JL. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE. 2012;7(2):e32253.
- 57. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008.
- 58. Li H. Statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–93.
- 59. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.
- 60. Hill WG, Weir BS. Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol. 1988;33(1):54–78.
- 61. Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4(3):205–55.
- 62. Wang J, Zhang Z. GAPIT Version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteom Bioinf. 2021;19(4):629–40.
- 63. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLOS Genet. 2016;12(2):e1005767.
- 64. Huang M, Liu X, Zhou Y, Summers RM, Zhang Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. GigaScience. 2019;8(2):giy154.
- 65. Kamal N, Tsardakas Renhuldt N, Bentzer J, Gundlach H, Haberer G, Juhász A, et al. The mosaic oat genome gives insights into a uniquely healthy cereal crop. Nature. 2022;606(7912):113–9.
- 66. Tanhuanpää P, Manninen O, Kiviharju E. QTLs for important breeding characteristics in the doubled haploid oat progeny. Genome. 2010;53(6):482–93.
- 67. Tanhuanpää P, Manninen O, Beattie A, Eckstein P, Scoles G, Rossnagel B, et al. An updated doubled haploid oat linkage map and QTL mapping of agronomic and grain quality traits from Canadian field trials. Genome. 2012;55(4):289–301.
- 68. Alqudah AM, Sallam A, Baenziger PS, Börner A. GWAS: fast-forwarding gene identification and characterization in temperate cereals: lessons from barley – a review. J Adv Res. 2020;22:119–35.
- 69. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9(1):29.
- 70. Hamblin MT, Close TJ, Bhat PR, Chao S, Kling JG, Abraham KJ, et al. Population structure and linkage disequilibrium in U.S. barley germplasm: implications for association mapping. Crop Sci. 2010;50(2):556–66.
- 71. Wang L, Xu J, Wang H, Chen T, You E, Bian H, et al. Population structure analysis and genome-wide association study of a hexaploid oat landrace and cultivar collection. Front Plant Sci. 2023;14:1131751.
- 72. Newell MA, Cook D, Tinker NA, Jannink JL. Population structure and linkage disequilibrium in oat (Avena sativa L.): implications for genome-wide association studies. Theor Appl Genet. 2011;122(3):623–32.
- 73. Esvelt Klos K, Huang YF, Bekele WA, Obert DE, Babiker E, Beattie AD, et al. Population genomics related to adaptation in elite oat germplasm. Plant Genome. 2016;9(2):0103. doi:10.3835/plantgenome2015.10.0103.
- 74. Sunstrum FG, Bekele WA, Wight CP, Yan W, Chen Y, Tinker NA. A genetic linkage map in southern-by-spring oat identifies multiple quantitative trait loci for adaptation and rust resistance. Plant Breed. 2019;138(1):82–94.
- 75. Peng Y, Yan H, Guo L, Deng C, Wang C, Wang Y, et al. Reference genome assemblies reveal the origin and evolution of allohexaploid oat. Nat Genet. 2022;54(8):1248–58.
- 76. Flint-Garcia SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annu Rev Plant Biol. 2003;54(1):357–74.
- 77. Gill HS, Halder J, Zhang J, Rana A, Kleinjan J, St. Amand P, et al. Whole-genome analysis of hard winter wheat germplasm identifies genomic regions associated with spike and kernel traits. Theor Appl Genet. 2022;135(9):2953–67.
- 78. Liu L, Wang M, Zhang Z, See DR, Chen X. Identification of stripe rust resistance loci in U.S. spring wheat cultivars and breeding lines using genome-wide association mapping and Yr gene markers. Plant Dis. 2020;104(8):2181–92.
- 79. Juliana P, Singh RP, Poland J, Shrestha S, Huerta-Espino J, Govindan V, et al. Elucidating the genetics of grain yield and stress-resilience in bread wheat using a large-scale genome-wide association mapping study with 55,568 lines. Sci Rep. 2021;11(1):5254.
- 80. Hazen SP, Scott-Craig JS, Walton JD. Cellulose synthase-like genes of rice. Plant Physiol. 2002;128(2):336–40.
- 81. Burton RA, Collins HM, Kibble NAJ, Smith JA, Shirley NJ, Jobling SA, et al. Over-expression of specific HvCslF cellulose synthase-like genes in transgenic barley increases the levels of cell wall (1,3;1,4)-β-D-glucans and alters their fine structure. Plant Biotechnol J. 2011;9(2):117–35.
- 82. Nemeth C, Freeman J, Jones HD, Sparks C, Pellny TK, Wilkinson MD, et al. Down-regulation of the CSLF6 gene results in decreased (1,3;1,4)-β-D-glucan in endosperm of wheat. Plant Physiol. 2010;152(3):1209–18.
- 83. Buckeridge MS, Vergara CE, Carpita NC. The mechanism of synthesis of a mixed-linkage (1→3),(1→4)β-D-glucan in maize. Evidence for multiple sites of glucosyl transfer in the synthase complex. Plant Physiol. 1999;120(4):1105–16.
- 84. Burton RA, Wilson SM, Hrmova M, Harvey AJ, Shirley NJ, Medhurst A, et al. Cellulose synthase-like CslF genes mediate the synthesis of cell wall (1,3;1,4)-β-D-glucans. Science. 2006;311(5769):1940–2.
- 85. Tonooka T, Aoki E, Yoshioka T, Taketa S. A novel mutant gene for (1–3, 1–4)-β-D-glucanless grain on barley (Hordeum vulgare L.) chromosome 7H. Breed Sci. 2009;59(1):47–54.