Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping

Braulio J Soto-Cerda; Scott Duguid; Helen Booker; Gordon Rowland; Axel Diederichsen; Sylvie Cloutier

J Integr Plant Biol. 2014 Jan 15;56(1):75–87. doi: 10.1111/jipb.12118

PMCID: PMC4253320 PMID: 24138336

Abstract

The extreme climate of the Canadian Prairies poses a major challenge to improving yield. Although it is possible to breed for yield per se, focusing on yield-related traits could be advantageous because of their simpler genetic architecture. The Canadian flax core collection of 390 accessions was genotyped with 464 simple sequence repeat markers, and phenotypic data for nine agronomic traits (yield, bolls per area, 1,000 seed weight, seeds per boll, start of flowering, end of flowering, plant height, plant branching, and lodging) collected from up to eight environments were used for association mapping. Based on a mixed model (principal component analysis (PCA) + kinship matrix (K)), 12 significant marker-trait associations for six agronomic traits were identified. Most of the associations were stable across environments as revealed by multivariate analyses. Statistical simulation for five markers associated with 1,000 seed weight indicated that the favorable alleles have additive effects. None of the modern cultivars carried all five favorable alleles, and the maximum of four observed in any accession was found mostly in breeding lines. Our results confirmed the complex genetic architecture of yield-related traits and the inherent difficulties associated with their identification while illustrating the potential for improvement through marker-assisted selection.

Keywords: Linum usitatissimum, marker-assisted selection, quantitative trait loci mapping, yield-related traits, Favorable alleles

Introduction

Linseed (Linum usitatissimum L.) is important for the oil and nutraceutical industries (Green et al. 2008). Its oil, characterized by a high concentration of omega-3 alpha linolenic acid (∼55%), is widely recognized for its health benefits (Simopoulos 2000). A unique feature of linseed resides in the prospect of also commercializing its stems because they produce good quality fibers that have many end-uses (Czemplik et al. 2011) including paper, technical fiber, and biofuels (Diederichsen and Ulrich 2009; Cullis 2011). In 2011, the total world production of linseed reached approximately 1.6 million tons, with Canada (∼23%), China (∼21%), and the Russian Federation (∼14%) being the main producers (FAOSTAT 2013). Although Canada is the world's largest linseed producer and exporter (FAOSTAT 2013), linseed remains a minor crop, in part because its yield has been stagnating over the last decade, averaging 1.2 T/Ha compared to other oilseeds such as canola (rapeseed) that now reach 1.9 T/Ha (Statistics Canada; http://www.statcan.gc.ca).

Conventional breeding methods have been the cornerstone for linseed genetic improvement releasing new cultivars with durable resistance to diseases, agronomic fitness, and greater yield stability (Green et al. 2008). However, the narrow genetic base used for the development of Canadian linseed cultivars (Fu et al. 2002, 2003; Cloutier et al. 2009), the scarce availability of related species to incorporate new variation, the lack of hybrid production systems (Green et al. 2008), and the limited genomic tools for molecular breeding (Cloutier et al. 2011, 2012a) have hampered yield and quality improvements, limiting linseed competitiveness.

Yield is the most important and complex trait in crops and is correlated with other traits (Li et al. 2011). In linseed, yield and its components such as 1,000 seed weight (TSW), seeds per boll (SPB), and bolls per area (BPA) are quantitatively inherited and controlled by many genes affected by multiple interactions with other genes and the environment (Shi et al. 2009; Parry and Hawkesford 2012; Cadic et al. 2013). An understanding of the genetic basis of yield-related traits is of practical value to breeders because such information assists in the design of efficient breeding strategies. This approach, focused on the inheritance and improvement of yield-related traits to achieve greater yield, has been embraced in oilseeds such as Brassica napus (Shi et al. 2009), soybean (Panthee et al. 2007; Liu et al. 2011), and maize (Huang et al. 2010; Peng et al. 2011). Other important agronomic traits such as flowering time (FL), plant height (PH), plant branching (PB), and lodging resistance (LDG) may also indirectly affect yield through various physiological mechanisms (Huang et al. 2010; Li et al. 2011), allowing crop phenology and plant architecture to be adapted to regional growing conditions, thus avoiding yield and quality losses (Duguid 2009). The estimation of the positions of quantitative trait loci (QTL) with consistent effects across environments for yield and its components and other agronomic traits is of central importance for marker-assisted selection (MAS) and, ultimately, for enhancing linseed competitiveness.

In oilseed breeding, most of the QTL contributing to yield and other agronomic traits have been identified through classical linkage mapping (Panthee et al. 2007; Shi et al. 2009; Huang et al. 2010; Liu et al. 2011; Peng et al. 2011). Despite the proven usefulness of this technique for identifying QTL involved in complex traits, the limited genetic diversity and the few recombination events accumulated in biparental populations impede both the simultaneous identification of the favorable alleles available to breeding programs and the precise location of QTL, thus weakening MAS applications (Würschum 2012). Often presented as an alternative approach, association mapping (AM) makes use of all recombination events that have occurred during the history of a germplasm collection representing a broader genetic diversity, leading to a higher mapping resolution and the simultaneous survey of a larger number of alleles (Flint-Garcia et al. 2003; Würschum 2012). In the last decade, AM has been successfully applied to crops (reviewed in Gupta et al. 2005; Soto-Cerda and Cloutier 2012), showing that faster breeding progress can be achieved (Myles et al. 2009; Cadic et al. 2013; Huang et al. 2013).

In 2009, the Total Utilization Flax GENomics (TUFGEN; http://www.tufgen.ca) project was initiated in Canada, generating a wealth of genomic resources with one of the main goals being applications to flax breeding (Cloutier et al. 2009, 2011, 2012a, 2012b; Ragupathy et al. 2011; Venglat et al. 2011; Kumar et al. 2012; Wang et al. 2012a). The comprehensive characterization of the Canadian flax world collection preserved by Plant Gene Resources Canada permitted the assembly of the Canadian flax core collection of 390 accessions representing the diversity from 76 countries (Diederichsen et al. 2013). This valuable genetic resource ensures cost-effective access to the diversity harbored in the whole collection of approximately 3,500 accessions (Diederichsen et al. 2013). Further molecular characterization of the Canadian flax core collection revealed its abundant genetic diversity and weak population and family structure, and quantified its relatively fast genome-wide linkage disequilibrium (LD) decay, all positive attributes for AM studies (Soto-Cerda et al. 2013). In the present study, we carried out AM for yield, TSW, SPB, BPA, start of flowering (FL 5%), end of flowering (FL 95%), PH, PB, and LDG on the Canadian flax core collection assessed in Western Canada over 4 years. The objective of this research was to identify QTL contributing to these agronomic traits that could be capitalized upon to assist in breeding superior linseed cultivars with improved yield and, consequently, market competitiveness.

Results

Agronomic traits

All agronomic traits showed significant genotype (G), location (L), and year (Y) effects (P < 0.001; Table S1). Most of the genotype-by-environment (GE) interactions (G × L, G × Y, L × Y, and G × L × Y) were significant, except for yield where only L × Y was significant. The overall means, ranges, broad sense heritabilities (H), and coefficients of variation are summarized in Table 1. In Manitoba (MB), H ranged from 0.15 to 0.83, while in Saskatchewan (SK) it ranged from 0.37 to 0.78, indicating that the repeatability was highly variable among the agronomic traits at both locations. Among the 36 possible correlations, 25 were significant at P < 0.01 (Table 2). Yield and its components were positively correlated with one another but they were negatively correlated with the phenological traits FL 5% and FL 95%, the morphological traits PH and PB, and the LDG agronomic trait.

Table 1.

Number of environments, descriptive statistics, and broad sense heritability (H) for the nine agronomic traits assessed in the Canadian flax core collection

Trait	Environments	Mean	Range	C.V. (%)	H (MB)	H (SK)
Yield (kg/ha)	6	1312.10	565.2–2468.8	36.2	0.59	0.59
Bolls per area (bolls/m²)	6	4134.80	1653.6–6482.8	22.8	0.41	0.49
1,000 seed weight (g)	6	5.10	2.7–8.4	3.9	0.75	0.76
Seeds/boll	6	6.20	3.5–8.1	11.5	0.63	0.63
Flowering 5% (d)	7	45.10	40.0–61.9	3.3	0.83	0.47
Flowering 95% (d)	7	51.20	45.9–71.4	3.3	0.80	0.49
Plant height (cm)	6	51.30	28–92.9	11.8	0.63	0.76
Plant branching	4	3.40	1.7–5.3	23.1	0.15	0.78
Lodging	8	1.34	1.0–3.3	19.1	0.20	0.37

Table 2.

Pearson correlation coefficients amongst the nine agronomic traits in the Canadian flax core collection

Trait	Yield	BPA	TSW	SPB	FL 5%	FL 95%	PH	PB	LDG
Yield	—
BPA	0.528^**	—
TSW	0.173^**	−0.285^**	—
SPB	0.541^**	0.272^**	−0.123^*	—
FL 5%	−0.111^*	0.029	−0.361^**	−0.323^**	—
FL 95%	−0.108^*	0.036	−0.352^**	−0.347^**	0.964^**	—
PH	−0.140^**	−0.046	−0.361^**	0.026	0.506^**	0.497^**	—
PB	−0.073	0.007	−0.265^**	−0.049	0.429^**	0.416^**	0.633^**	—
LDG	−0.134^**	−0.005	0.094	−0.354^**	0.005	0.007	−0.261^**	−0.238^**	—

^*P < 0.01 and ^**P < 0.001.

Association between population structure and agronomic traits

Due to different population sizes (G1 = 153; G3 = 211) and unequal variances within the two major groups for the agronomic traits, the Kruskal–Wallis test was applied as suggested by Lin et al. (2008). Only PH showed significant differences (P = 0.03) with G1 accessions being 3 cm taller than G3 accessions (Figure S1).
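The group comparison described above can be illustrated with a minimal sketch of the Kruskal–Wallis H statistic. This is not the authors' analysis script: the function names and the plant height values below are hypothetical, and the P-value helper is valid only for two groups (one degree of freedom).

```python
import math
from itertools import chain

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction.

    Assumes the pooled values are not all identical.
    """
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))

def p_value_two_groups(h):
    """Chi-square upper tail with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(h / 2))

# Hypothetical plant heights (cm) for two groups; not data from the study.
g1_heights = [55.0, 58.5, 61.0, 53.5, 60.0]
g3_heights = [50.0, 52.5, 54.0, 49.5]
h_stat = kruskal_wallis_h(g1_heights, g3_heights)
p_val = p_value_two_groups(h_stat)
```

Because the test works on ranks rather than raw values, it does not require equal group sizes or equal variances, which is the reason given above for choosing it.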

Of the 92 fiber flax accessions of the core collection, 48 (36% of G1) clustered within G1 while 23 (12.8% of G3) belonged to G3, suggesting that although the coefficient of population differentiation (F_ST) was weak (0.09), the fiber morphotype could be the main factor responsible for the population structure of the flax core collection. We investigated the pattern of population structure within G1 and G3 separately and showed that both major groups were organized into two subpopulations (Q ≥ 0.7) and one admixed subpopulation (Q < 0.7) (Figure S2). Within G1, the two subpopulations largely corresponded to the oil and fiber morphotypes, with 91% of the fiber accessions initially clustering within G1 (Figure S2). Within G3, however, the two subpopulation clusters reflected their geographic distribution with no clear sub-clustering of the 23 fiber accessions (Figure S2). Thus, flax morphotype and geographic distribution constituted the main factors responsible for the population structure patterns observed in the Canadian flax core collection, with the Q matrix and the first three principal components of the principal component analysis (PCA) explaining 11.3% and 39% of PH variation, respectively.

AM analysis in the core collection and subgroups

As depicted by the cumulative probability–probability (P–P) plots generated using the 390 accessions (Figure 1), the Q general linear model (GLM) produced an excess of small P-values for all traits, indicating numerous spurious associations. On the other hand, the PCA GLM overcorrected the majority of the small P-values, with only a few P-values departing from the expected distribution at its very end. The mixed linear models (MLMs) K and Q + K performed similarly for the nine agronomic traits, with their observed P-values deviating the most from the expected ones for TSW, SPB, PH, PB, and LDG, indicating that inclusion of the Q matrix brought little or no improvement to the AM model. Nevertheless, they displayed a better distribution of P-values for BPA and FL 95% (Figure 1). The PCA + K MLM had the smallest deviation from the expected distribution for all agronomic traits. The first three principal components in combination with the K matrix were sufficient to control the majority of the potential false-positive associations created by population and family structure. Therefore, the PCA + K model was selected to conduct AM for the nine agronomic traits in the core collection.

Figure 1. Probability–probability (P–P) plots of observed versus expected −log₁₀ (P) values for nine agronomic traits evaluated with five association mapping models

Q, general linear model using the Q matrix; PCA, general linear model using the principal component analysis matrix; K, mixed linear model using the kinship matrix; Q + K, mixed linear model using the Q and K matrices; PCA + K, mixed linear model using the PCA and K matrices.
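The model comparison above rests on how closely observed P-values follow the uniform distribution expected under the null hypothesis. The sketch below shows one way to build the points of such a plot and to summarize the departure from the diagonal. The mean squared deviation is an assumed summary chosen for illustration, not necessarily the criterion used in the study, and all P-values are hypothetical.

```python
import math

def pp_points(p_values):
    """Pair expected and observed -log10(P) values for a P-P plot.

    Under the null hypothesis P-values are uniform on (0, 1), so the i-th
    smallest of n values is expected near i / (n + 1).
    """
    n = len(p_values)
    points = []
    for i, p in enumerate(sorted(p_values), start=1):
        expected = i / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points

def mean_squared_deviation(points):
    """Average squared gap between observed and expected -log10(P)."""
    return sum((obs - exp) ** 2 for exp, obs in points) / len(points)

# Hypothetical P-values for two models; not results from the study.
hypothetical = {
    "Q GLM": [0.0001, 0.001, 0.01, 0.2, 0.6],        # excess of small P-values
    "PCA + K MLM": [0.15, 0.3, 0.5, 0.7, 0.9],       # close to uniform
}
best = min(
    hypothetical,
    key=lambda model: mean_squared_deviation(pp_points(hypothetical[model])),
)
```

A model with an excess of small P-values lies above the diagonal, while an overcorrecting model falls below it; both increase the deviation.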

Mixed linear models may overcompensate when traits are correlated with population structure, leading to false negatives (Zhao et al. 2011). Because up to 39% of the variation for PH was explained by population structure, we conducted AM for this trait within G1 and G3 separately. The P–P plot of G1 showed an improvement for the K and Q + K models, with the latter performing as well as the PCA + K model (Figure S2). In contrast, the P–P plot of G3 showed improved performance for the Q + K model only, with the PCA + K model remaining the most suitable. Thus, AM model comparisons indicated that conducting subpopulation-independent AM analyses partially alleviated the effect of population structure within G1 but did not correct it for G3, making it necessary to consider population structure as a fixed covariate. Hence, AM analyses for PH were conducted using the Q + K and PCA + K models.

Marker-trait associations

After removing alleles with a minor allele frequency (MAF) of less than 0.05, 37 simple sequence repeat (SSR) markers became monomorphic, leaving 427 polymorphic loci for the AM analyses. Using the PCA + K model, a total of 12 marker-trait associations (estimated false discovery rate (qFDR) < 0.01) were significant in at least half of the environments tested. They corresponded to 10 different markers distributed across six linkage groups (LGs). The majority of these associations remained significant even after Bonferroni correction (0.05/427 = 1.17E − 4) (Table 3). Numerous other significant associations were detected, but they were not consistent in at least half of the environments. This was the case for yield, SPB, and BPA: six markers were associated with these traits in one or more environments, but none met the consistency criterion.
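The arithmetic behind the two thresholds in this paragraph can be sketched as follows. The marker counts and the Lu2532 P-values come from the text and Table 3; the function names are hypothetical, and the consistency rule is a plain reading of "significant in at least half of the environments tested".

```python
N_GENOTYPED = 464     # SSR markers genotyped
N_MONOMORPHIC = 37    # markers left monomorphic after the MAF < 0.05 filter
ALPHA = 0.05

n_tests = N_GENOTYPED - N_MONOMORPHIC          # polymorphic loci tested
bonferroni_threshold = ALPHA / n_tests         # about 1.17E-4

def passes_bonferroni(p_value, threshold=bonferroni_threshold):
    """True when a P-value survives the Bonferroni correction."""
    return p_value < threshold

def consistent_across_environments(calls):
    """True when a marker is significant in at least half of the
    environments in which the trait was evaluated.

    `calls` maps each evaluated environment to True (significant) or
    False (non-significant); unevaluated environments are left out.
    """
    return 2 * sum(calls.values()) >= len(calls)

# Lu2532 and TSW, P-values as reported in Table 3.
lu2532_p = {"MB12": 1.53e-5, "SK10": 9.60e-5, "SK11": 2.36e-6, "SK12": 7.90e-5}
# Significance calls for the six environments in which TSW was evaluated
# (n.s. entries in Table 3 are recorded as False).
lu2532_calls = {"MB10": False, "MB11": False, "MB12": True,
                "SK10": True, "SK11": True, "SK12": True}

all_pass = all(passes_bonferroni(p) for p in lu2532_p.values())
is_consistent = consistent_across_environments(lu2532_calls)
```

For Lu2532, all four reported P-values fall below the Bonferroni threshold and the marker is significant in four of six evaluated environments.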

Table 3.

Marker loci significantly associated with 1,000 seed weight (TSW), start of flowering (FL5%), end of flowering (FL95%), plant height (PH), plant branching (PB) and lodging (LDG), and their explained phenotypic variance (R²)

Trait	Marker	LG (cM)¹	MB09 (P-value)	MB10 (P-value)	MB11 (P-value)	MB12 (P-value)	SK09 (P-value)	SK10 (P-value)	SK11 (P-value)	SK12 (P-value)	R² (%)
TSW	Lu2164	3 (76.5)	N.E.	n.s.	n.s.	1.61E − 4	N.E.	7.50E − 5	1.10E − 8	1.10E − 4	0.50
	Lu2555	6 (72.0)	N.E.	n.s.	n.s.	1.78E − 4	N.E.	7.10E − 4	1.24E − 4	6.51E − 4	0.72
	Lu2532	7 (2.7)	N.E.	n.s.	n.s.	1.53E − 5	N.E.	9.60E − 5	2.36E − 6	7.90E − 5	8.0
	Lu58a	7 (104.3)	N.E.	n.s.	n.s.	3.92E − 4	N.E.	n.s.	2.38E − 6	1.90E − 4	5.5
	Lu526	9 (32.6)	N.E.	4.20E − 5	n.s.	6.81E − 6	N.E.	2.27E − 4	1.10E − 4	n.s.	15.2
FL 5%	Lu943	1 (149.9)	n.s.	4.42E − 7	7.88E − 5	n.s.	N.E.	n.s.	4.34E − 5	7.35E − 7	7.1
FL 95%	Lu943	1 (149.9)	n.s.	2.60E − 5	8.94E − 5	n.s.	N.E.	n.s.	8.74E − 5	4.90E − 6	7.6
PH	Lu943	1 (149.9)	N.E.	N.E.	1.31E − 4	n.s.	1.01E − 4	n.s.	n.s.	2.31E − 4	4.6
	Lu316	Unknown	N.E.	N.E.	1.15E − 5	9.23E − 5	n.s.	n.s.	n.s.	1.62E − 5	18.5
PB	Lu2067a	2 (59.7)	n.s.	N.E.	n.s.	N.E.	N.E.	9.08E − 5	3.35E − 5	N.E.	12.9
LDG	Lu2560	6 (63.4)	n.s.	4.95E − 4	n.s.	N.V.	N.V.	5.73E − 5	1.38E − 18	n.s.	8.9
	Lu2564	6 (64.1)	1.53E − 4	8.74E − 4	9.05E − 11	N.V.	N.V.	n.s.	1.20E − 4	n.s.	7.1

¹Linkage group and, in brackets, locus position in centiMorgans according to Cloutier et al. (2012b). N.E., trait not evaluated; N.V., trait not phenotypically variable; n.s., non-significant. Values in bold script are significant at qFDR < 0.01 and after Bonferroni correction (0.05/427 = 1.17E − 4); those in normal script are significant at qFDR < 0.01.

Five significant markers were associated with TSW, together explaining approximately 30% of the phenotypic variation for the trait. Marker Lu943 was associated with FL 5%, FL 95%, and PH, in agreement with the positive and significant correlations among these traits (Table 2). The LG6 markers Lu2560 and Lu2564, located 0.7 cM apart, formed a candidate QTL for LDG. The PH AM analyses conducted within G1 and G3 identified no additional associations. However, within G1, marker Lu2067a, which was associated with PB, a trait correlated with PH (r = 0.633), showed associations in two of the six environments evaluated.

Allelic effects of significant markers

Some of the alleles significantly improved TSW. For example, the 289 bp allele of Lu526 increased TSW by an average of 1.02 g (P = 8.5E − 13) across the six environments tested (Figure 2A). For Lu2532, the 270 bp allele had the largest effect, increasing TSW by 1.91 g (P = 1.7E − 6) over the 280 bp allele and by 1.3 g (P = 0.003) over the 282 bp allele (Figure 2B). The 271 bp allele of Lu943 significantly shortened FL 5% by 2.13 d (P = 1.64E − 9) compared to the other two alleles (Figure 2C). These allelic differences carried through to FL 95% (Table 4). A reduction of up to 23.7 cm (P = 2.2E − 13) in PH was associated with the 241 bp allele of Lu316 compared with the 223 bp allele (Figure 2D). However, this large allelic effect may be inflated by the greater PH of the fiber accessions: the 223 bp allele was present in 33% of the fiber morphotype and only 6% of the linseed morphotype, whereas the 241 bp allele was present in 31% of the linseed morphotype but only 7% of the fiber morphotype. The 205 bp allele of marker Lu2067a decreased PB by up to 0.76 units compared with the 211 bp allele (P = 2.03E − 8) (Figure 2E). The null allele of Lu2560 decreased LDG by 0.34 units (P = 3.14E − 6) (Figure 2F).

Figure 2. Comparisons of allelic effects of six associated markers with agronomic traits in linseed

(A) Lu526 and (B) Lu2532 associated with 1,000 seed weight. (C) Lu943 associated with start of flowering. (D) Lu316 associated with plant height. (E) Lu2067a associated with plant branching. (F) Lu2560 associated with lodging. Box plots followed by the same letter do not differ statistically according to the Kruskal–Wallis test (α = 0.01).

Table 4.

Favorable alleles at the ten SSR loci associated with agronomic traits, their frequencies, phenotypic effects, and stability

Trait	Marker	Favorable allele (bp)	Frequency (%)	Effect^a	K–W test^b	IPCA1^c	ASV^d
TSW	Lu2164	377	44.9	0.68 g	1.9E − 3^*	0.907	3.222
	Lu2555	202	47.9	0.85 g	2.1E − 12^*	−0.411	1.446
	Lu2532	270	8.0	1.91 g	5.6E − 7^*	−0.729	1.537
	Lu58a	209	72.5	0.72 g	3.1E − 3^*	0.209	1.441
	Lu526	289	15.8	1.02 g	8.4E − 13^*	0.023	1.178
FL 5%	Lu943	271	60.8	−2.13 d	5.5E − 5^*	−0.215	0.215
FL 95%	Lu943	271	60.8	−2.15 d	1.2E − 9^*	−0.181	0.181
PH	Lu943	271	60.8	−9.25 cm	8.4E − 9^*	2.532	2.532
	Lu316	241	17.3	−23.7 cm	1.6E − 14^*	−2.532	2.532
PB	Lu2067a	205	27.6	−0.76 u	1.5E − 9^*	0.265	0.321
LDG	Lu2560	null	47.5	−0.34 u	4.7E − 8^*	−0.557	0.558
	Lu2564	257	11.7	−0.28 u	6.4E − 4^*	0.557	0.558

^aEffect of favorable alleles represented in grams (g) for TSW, days (d) for FL 5% and FL 95%, centimeters (cm) for PH, and units (u) of the respective scales for PB and LDG.

^bP-value of the Kruskal–Wallis test for the allelic effect between favorable alleles and others; ^*P < 0.01.

^cFirst interaction principal component.

^dAMMI stability value.

Marker effect and stability

The additive main effect and multiplicative interaction (AMMI) analysis established that one third of the marker-trait associations were highly stable, with first interaction principal component (IPCA1) values close to ± 0.2, and that another third were moderately stable, with values ranging from ± 0.25 to ± 0.6 (Table 4). The AMMI stability value (ASV) parameter indicated that six marker-trait associations were highly stable, with values ranging from 0.18 to 1.17. The QTL main effect and QTL-by-environment interaction (QQE) biplot displays the average environment defined by the average IPCA1 and IPCA2 scores across environments (indicated by an open circle) (Figure 3A). The arrow passing through the biplot origin is called the average environment coordination (AEC) abscissa and points towards increasing marker/QTL main effect. The AEC ordinate line, perpendicular to the abscissa, indicates stability/instability. Highly unstable markers have longer projections on the AEC ordinate irrespective of their direction. The markers associated with TSW varied in stability. For example, Lu2532 and Lu526 were more stable than Lu2555, Lu2164, and Lu58a (Figure 3A). The intersection of the two axes defines the average marker/QTL main effect; hence, the latter three markers had effects below average, whereas Lu2532 and Lu526 had the largest main effects on TSW across the six environments in which TSW was evaluated (Figure 3A, Table 4). Considering that approximately 300 accessions of the core collection are of the linseed type, the low frequency of the favorable alleles of Lu2532 and Lu526, present in only 31 and 62 accessions, respectively, indicates that they have not been the target of intensive selection by linseed breeders to date.
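The text does not state how the ASV was computed. A commonly used definition weights the IPCA1 score by the ratio of the IPCA1 to IPCA2 sums of squares before combining it with the IPCA2 score. The sketch below assumes that definition; the function name and all input values are hypothetical.

```python
import math

def ammi_stability_value(ipca1, ipca2, ss_ipca1, ss_ipca2):
    """AMMI stability value under the commonly used weighted definition.

    ASV = sqrt(((SS_IPCA1 / SS_IPCA2) * IPCA1)^2 + IPCA2^2)

    ipca1, ipca2       : interaction principal component scores of a marker
    ss_ipca1, ss_ipca2 : sums of squares explained by IPCA1 and IPCA2
    """
    weight = ss_ipca1 / ss_ipca2
    return math.sqrt((weight * ipca1) ** 2 + ipca2 ** 2)

# Hypothetical scores and sums of squares; not values from the study.
asv_example = ammi_stability_value(ipca1=0.3, ipca2=0.4,
                                   ss_ipca1=2.0, ss_ipca2=2.0)
```

Under this definition, smaller values indicate markers whose effects change less across environments, and the sign of the IPCA scores does not matter.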

Linear regression of TSW on the number of favorable alleles at the associated markers showed a linear relationship, suggesting additive effects (Figure 3B). No accession had all five favorable alleles, but 10 accessions had four of them. Among these, only one was a modern cultivar (Maritime, from the US; mean TSW = 7.3 g), while the remaining nine were breeding lines, including three belonging to the convar. mediterraneum, which is characterized by large seeds and high TSW (Figure 4). The high-yielding and broadly adapted Canadian cultivar CDC Bethune (mean TSW = 5.2 g) possesses only two of the five TSW favorable alleles.
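The allele-counting and regression step can be sketched as follows. The favorable allele sizes are those listed in Table 4; the function names, the accession genotypes, and the TSW values are hypothetical. Each accession is assumed to carry a single allele per marker, a simplification for a largely self-pollinating crop.

```python
# Favorable TSW allele sizes (bp) from Table 4.
FAVORABLE_TSW_ALLELES = {
    "Lu2164": 377, "Lu2555": 202, "Lu2532": 270, "Lu58a": 209, "Lu526": 289,
}

def count_favorable(genotype):
    """Count markers at which an accession carries the favorable allele.

    `genotype` maps marker name to allele size in bp; missing markers
    are treated as not carrying the favorable allele.
    """
    return sum(
        1 for marker, allele in FAVORABLE_TSW_ALLELES.items()
        if genotype.get(marker) == allele
    )

def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical accessions: (genotype, mean TSW in g); not data from the study.
accessions = [
    ({"Lu2164": 370, "Lu2555": 198, "Lu2532": 280, "Lu58a": 205, "Lu526": 285}, 4.1),
    ({"Lu2164": 377, "Lu2555": 198, "Lu2532": 280, "Lu58a": 209, "Lu526": 285}, 5.0),
    ({"Lu2164": 377, "Lu2555": 202, "Lu2532": 280, "Lu58a": 209, "Lu526": 289}, 6.8),
]
counts = [count_favorable(genotype) for genotype, _ in accessions]
tsw = [value for _, value in accessions]
slope, intercept = least_squares(counts, tsw)
```

A positive and roughly constant slope, meaning a similar TSW gain for each additional favorable allele, is the pattern that would be read as additive.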

Figure 4. Linseed accessions with different numbers of favorable alleles associated with 1,000 seed weight

(A) Accessions with zero favorable alleles. (B) Canadian cultivars with two favorable alleles. (C) Accessions with four favorable alleles. Values in brackets are the 1,000 seed weights for each accession. *Indicates the accessions that belong to the convar. mediterraneum.

Discussion

Yield is a complex trait that can be broken down into its components, which are in turn affected by other traits involving diverse pathways (Shi et al. 2009). For example, seed number, seed weight, flowering time, plant height, and plant branching have all been identified as affecting yield in other crops (Ishimaru 2003; Salamini 2003; Ashikari et al. 2005; Clark et al. 2006; Cockram et al. 2007). Phenotypic correlations and QTL analyses suggest that yield-associated traits tend to be clustered in the genome and have pleiotropic effects (Shi et al. 2009; Li et al. 2011; Liu et al. 2011). Hence, understanding the genetic bases and relationships of yield-associated traits and agronomic traits in linseed through AM can provide the scientific background needed to devise breeding strategies that would permit and/or accelerate yield improvements beyond the 1.2 T/Ha achieved to date.