Summary
Flax has been cultivated for its oil and fiber for thousands of years. However, it remains unclear how the modifications of agronomic traits occurred on the genetic level during flax cultivation. In this study, we conducted genome-wide variation analyses on multiple accessions of oil-use, fiber-use, landraces, and pale flax to identify the genomic variations during flax cultivation. Our findings indicate that, during flax domestication, genes relevant to flowering, dehiscence, oil production, and plant architecture were preferentially selected. Furthermore, regardless of origins, the improvement of the modern oil-use flax preceded that of the fiber-use flax, although the dual selection on oil-use and fiber-use characteristics might have occurred in the early flax domestication. We also found that the expansion of MYB46/MYB83 genes may have contributed to the unique secondary cell wall biosynthesis in flax and the directional selections on MYB46/MYB83 may have shaped the morphological profile of the current oil-use and fiber-use flax.
Subject Areas: Biological Sciences, Evolutionary Biology, Genomics, Plant Evolution, Plant Genetics
Highlights
- Assemblies of genomes, including oil-use flax, fiber-use flax, and pale flax
- Comparative genomic analysis between pale flax and cultivated flax
- A dual-selection mode on oil-use and fiber-use characteristics might have existed
- Expansion and selection of MYB46/MYB83 may have shaped the morphological profile of flax
Introduction
Flax (Linum usitatissimum L.) is one of the earliest domesticated crops, with records spanning more than 8,000 years, and provides a source of oil and fiber for humans (Fu, 2011, van Zeist and Bakker-Heeres, 1975). There are two primary morphotypes of cultivated flax, oil-use flax and fiber-use flax, which display remarkable differences in morphology and agronomic performance: oil-use flax is shorter, has more branches, and produces larger seeds that contain ∼40% oil, whereas fiber-use flax is comparatively taller, less branched, and produces fewer seeds. The primitive cultivated flax is deemed to be descended from a wild flax species, pale flax (L. bienne Mill.), which is a winter annual or perennial that possesses narrow leaves, dehiscent capsules, and lodging-prone stems (Zohary and Hopf, 2000, Allaby et al., 2005). Since then, multiple domestication processes gave rise to the cultivated flax, in which traits such as indehiscence, winter hardiness, oil content, and fiber content were improved. Owing to the inconsistent use of genetic markers and sampling strategies, previous flax population analyses often drew inconsistent conclusions regarding which trait-specific group was first established (Fu, 2011, Fu, 2012, Fu et al., 2012). Although molecular evidence suggests that the domestication of modern oil-use flax occurred before that of fiber-use flax, the studies of early flax domestication were probably complicated by the fact that flax was domesticated as an oil-fiber dual-use crop from prehistoric times, as revealed by archaeological records (Helback, 1959, van Zeist and Bakker-Heeres, 1975). Notably, pale flax has a very wide biogeographic range spanning Europe, Africa, and Asia (Helback, 1959, Diederichsen and Hammer, 1995), unlike many relic wild progenitors of crops that were confined to a single geographic location. Therefore, multiple independent domestication events might have occurred in the flax domestication history (Fu, 2012, Fu and Peterson, 2012).
Artificial selection during crop domestication and improvement often substantially reduces genetic variation. For many conventional crops such as rice (Zhang et al., 2014, Stein et al., 2018), soybean (Li et al., 2014, Xie et al., 2019), maize (Yang et al., 2017), cassava (Bredeson et al., 2016), sunflower (Hübner et al., 2019), pepper (Qin et al., 2014), tomato (Bolger et al., 2014, Gao et al., 2019), Brassica (Golicz et al., 2016), and citrus (Wang et al., 2018), both the selection targeted at desirable traits in the domesticated crops and the genomic diversity in their wild progenitors have been extensively studied. For example, sequencing of 725 representative tomato samples showed that selection on the TomLoxC promoter affected tomato flavor during domestication (Gao et al., 2019); analysis of wild and landrace mandarin showed that the aconitate hydratase (ACO) gene regulating citrate content was under selection during domestication (Wang et al., 2018); genes related to biotic stress response were introgressed from wild species into cultivated sunflower (Hübner et al., 2019); and the progenitor Malus sylvestris contributed alleles for fruit quality and production traits to dessert apple cultivars (Duan et al., 2017). However, similar studies for flax are still lacking. In previous studies, a variety of molecular markers were used to investigate the genetic diversity and lineage relationships in cultivated and pale flax (Allaby et al., 2005, Fu et al., 2002a, Fu et al., 2002b, Soto-Cerda et al., 2012, Smykal et al., 2011, Xie et al., 2018). Some selective loci responsible for the agricultural improvement of flax were identified through genetic mapping and genome-wide association studies (Cloutier et al., 2011, Kumar et al., 2015, Xie et al., 2018). However, in these studies, the low coverage of the flax genome potentially clouded the conclusions. For example, by analyzing the sad2 locus, Fu et al. (2012) deduced that the increase in oil content occurred prior to capsular indehiscence, whereas an analysis based on another set of 49 EST-SSRs identified capsular dehiscence as the earliest domesticated trait (Fu, 2011). In addition to the low genome coverage, the lack of a pale flax genome sequence prevented the inference of genome-wide variations during flax cultivation. In this study, we de novo assembled three flax genomes and resequenced 83 cultivated flax accessions. Through this, we sought to identify and understand the genetic variations that resulted from flax domestication and improvement at the global genome level.
Results
De Novo Assembly of Three Flax Genomes
Whole-genome shotgun sequencing was performed on the oil-use flax variety “Longya-10,” the fiber-use flax variety “Heiya-14,” and pale flax (Table S1 and Figure S1). A total of 68.2, 73.5, and 49.1 billion high-quality base pairs (133-, 142-, and 93-fold genome coverage, respectively) were assembled into 306.0-, 303.7-, and 293.5-Mb genomes for Longya-10, Heiya-14, and pale flax, with contig N50/scaffold N50 lengths of 131 Kb/1,235 Kb, 156 Kb/700 Kb, and 59 Kb/384 Kb, respectively (Tables S2–S6 and 1). The total gap length in the Longya-10, Heiya-14, and pale flax genomes was 5.8, 2.8, and 5.6 Mb, respectively (Table 1). To further improve the assembly quality, we used Hi-C technology and a genetic map to refine the Longya-10 genome, resulting in 434 scaffolds (295.7 Mb in total) for the chromosome-level assembly (Tables S7 and S8 and Figures S2 and S3). Approximately 43,500 protein-coding genes and ∼2,600–2,800 non-coding RNAs were identified in each genome. In addition, 288,633 (∼122.2 Mb), 275,796 (∼115.4 Mb), and 244,460 (∼109.4 Mb) repetitive sequences were found in the Longya-10, Heiya-14, and pale flax genomes, respectively (Figure 1 and Tables S9–S12). Phylogenetic analysis revealed that the cultivated flax and pale flax diverged about 2.32 million years ago (Figure S4). Two whole-genome duplication events (WGDs) (Ks = 0.13 and Ks = 0.77, respectively) were identified as having occurred since the ancient hexaploidization during angiosperm evolution (Table S13 and Figure S5).
Table 1.
Assembly Statistics for Longya-10, Heiya-14, and Pale Flax
| Accession | Scaffold Number | Total Scaffold Length (bp) | Scaffold N50 (bp) | Scaffold N90 (bp) | Longest Scaffold (bp) | Total Gap Length (bp) |
|---|---|---|---|---|---|---|
| Longya-10 | 1,865 | 305,975,888 | 1,235,007 | 270,149 | 4,613,305 | 5,817,576 |
| Heiya-14 | 2,748 | 303,668,802 | 699,937 | 156,528 | 3,040,329 | 2,841,264 |
| Pale flax | 2,609 | 293,538,124 | 383,912 | 88,775 | 3,507,611 | 5,635,035 |

| Accession | Contig Number | Total Contig Length (bp) | Contig N50 (bp) | Contig N90 (bp) | Longest Contig (bp) | GC Content (%) |
|---|---|---|---|---|---|---|
| Longya-10 | 6,319 | 300,092,509 | 130,916 | 29,719 | 926,781 | 38.30 |
| Heiya-14 | 6,191 | 300,827,538 | 156,153 | 35,568 | 1,138,976 | 38.94 |
| Pale flax | 10,198 | 287,903,089 | 59,226 | 15,158 | 654,413 | 38.94 |
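As a reading aid for the N50 and N90 columns of Table 1: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover x% of the total assembly length. The Python sketch below only illustrates that definition; it is not the software used in this study, and the function name `nx` and any input lengths are hypothetical.

```python
def nx(lengths, percent=50):
    """Return the Nx statistic (N50 by default) for a list of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until the
    running sum reaches `percent` % of the total length; the length of the
    sequence that crosses that threshold is the Nx value.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the interval (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
```

Calling `nx(scaffold_lengths)` gives N50 and `nx(scaffold_lengths, percent=90)` gives N90, which is why N90 is always less than or equal to N50 for the same assembly.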
Figure 1.
Characterization of the Three Flax Genomes
The outermost to innermost tracks indicate GC content, repeat sequence density, gene density, noncoding RNA distribution, and colinear gene pairs (a set of quadruplicate collinear regions were highlighted). The outer to inner layers of each track indicate pale flax, Longya-10, and Heiya-14 data. See also Tables S11 and S12.
Genomic Comparison of Two Cultivars and Wild Pale Flax
We generated a phylogenetic tree combining our four sequenced genomes (an additional L. grandiflorum individual was also shotgun sequenced) and the available GenBank data of another ten Linum species, which supports the hypothesis that the modern cultivated flax might have originated from pale flax (Allaby et al., 2005, Diederichsen and Hammer, 1995, Fu et al., 2002a, Fu et al., 2002b, Gill, 1966, Gill, 1987, Tammes, 1928) (Figure S6). Then, we explored the genomic variations between the two cultivars and pale flax to understand the molecular mechanism for the selection of key agronomic traits in flax domestication. In the Longya-10 genome, a total of 3,623,057 single nucleotide variations (SNVs) and 555,580 insertions and deletions (InDels) were identified, and 3,686,366 SNVs and 557,691 InDels were identified in the Heiya-14 genome (Figure 2A and Table S14). Approximately 13.7% of SNVs in Longya-10 and 14.2% of SNVs in Heiya-14 fell into coding regions, more than half of which were nonsynonymous variations (covering more than 31,000 protein-coding genes in each genome; Figure 2B). In addition, 482 genes containing these nonsynonymous SNVs were positively selected in the two cultivars compared with pale flax (Table S15), and 23 of these genes are homologs of genes involved in oil and fiber biosynthesis (Table S16). Only 4.26% and 4.51% of InDels were located in CDS regions of the Longya-10 and Heiya-14 genomes (covering ∼11,000 genes in each genome), respectively (Figure 2B and Table S14).
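The distinction between synonymous and nonsynonymous coding SNVs used above can be illustrated with a short Python sketch. It assumes the standard genetic code, a forward-strand coding sequence whose length is a multiple of three, and a 0-based SNV position within that sequence; the function name `classify_coding_snv` is hypothetical, and this is not the annotation tool used in this study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(cds, position, alt_base):
    """Classify a single-base change in a coding sequence.

    cds      -- coding sequence (forward strand, length divisible by 3)
    position -- 0-based offset of the variant within `cds`
    alt_base -- the alternative nucleotide
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not 0 <= position < len(cds):
        raise IndexError("position lies outside the CDS")
    if alt_base not in BASES or cds[position] == alt_base:
        raise ValueError("alt_base must be a nucleotide that differs from the reference")

    offset = position % 3                      # position of the SNV inside its codon
    start = position - offset                  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]

    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "nonsynonymous"
```

A change in the third codon position is often synonymous because of codon degeneracy, whereas most first- and second-position changes alter the encoded residue.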
Figure 2.
Genomic Variations between Longya-10, Heiya-14, and Pale Flax
(A) Distribution and density of genomic variations across the flax genomes. The outer to inner circles of each track show SNVs and InDels. The outer to inner layers of each track indicate variations between pale flax and Longya-10 and variations between Heiya-14 and Longya-10. See also Table S14.
(B) Distribution of SNVs and InDels in intergenic, intron, and CDS regions between pale flax and Longya-10 and pale flax and Heiya-14. In CDS, SNVs were classified into synonymous and nonsynonymous SNVs. See also Table S14.
(C) KEGG enrichment of genes carrying nonsynonymous SNVs between cultivars (Longya-10 and Heiya-14) and pale flax. An asterisk indicates a significantly enriched pathway. See also Table S20.
(D) KEGG enrichment of genes carrying InDels between cultivars (Longya-10 and Heiya-14) and pale flax. An asterisk indicates a significantly enriched pathway. See also Table S20.
(E–H) InDels in LuFCA, LuMYB83-1, LuALC, and LuLEC1. Gene structures of LuFCA, LuMYB83-1, LuALC, and LuLEC1 in Longya-10 are shown at the top (The exons are shown in orange, introns are shown in black lines); nucleotide and amino acid sequences are shown at the bottom. Red indicates InDels in Longya-10 and Heiya-14 compared with pale flax. At the bottom, the upper layers to the lower layers indicate pale flax, Longya-10, and Heiya-14. See also Table S18 and Figure S7.
To identify genomic variants that are likely important in flax domestication, we annotated the genes harboring the common nonsynonymous SNVs and InDels in the two cultivars. The results show that InDel variations occurred in the homologs of the flowering time-related gene FCA, the fruit dehiscence-related gene ALCATRAZ (ALC), the secondary cell wall biosynthesis-related gene MYB83, and the seed oil biosynthesis-related gene leafy cotyledon 1 (LEC1) during flax domestication (Figures 2E–2H and S7 and Tables S17 and S18) (Simpson et al., 2010, Rajani and Sundaresan, 2001, Zhong et al., 2007, Tang et al., 2018). Importantly, LuALC, a gene related to the MYC/bHLH family of transcription factors, carries a frameshift variation caused by a 4-bp insertion in the two cultivars compared with pale flax; LuMYB83-1, a homolog of the lodging-related gene AtMYB83, has a 21-bp insertion in the C-terminal domain in the two cultivars. These large-effect variations (nonsynonymous SNVs, frameshifts, premature stop codons, etc.) were possibly maintained from the original selection for favorable agronomic traits in flax domestication. Additionally, the expression levels of LuFCA, LuMYB83-1, and LuLEC1, but not LuALC, were remarkably elevated in the two cultivars (Table S19 and Figure S8). In Arabidopsis, AtALC expression can promote cell separation in fruit dehiscence (Rajani and Sundaresan, 2001), whereas in cultivated flax, a low level of LuALC expression is maintained until fruit harvest. This reduced expression of LuALC may indicate selection for indehiscent flax lineages during flax cultivation. Functional enrichment analysis of genes carrying SNVs and InDels shows that genes involved in plant hormone signal transduction (ko04075, ko00905), pentose and glucuronate interconversions (ko00040), starch and sucrose metabolism (ko00500), and glycosphingolipid biosynthesis (ko00603) are significantly overrepresented (Figures 2C and 2D and Table S20), indicating that plant architecture (plant height, leaf shape, branching pattern, upright/prostrate, etc.), seed yield, and/or nutritional quality were the primary domestication objectives.
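The contrast between the 4-bp insertion in LuALC (a frameshift) and the 21-bp insertion in LuMYB83-1 follows from simple codon arithmetic, sketched below in Python. The sketch assumes the InDel lies entirely within the coding sequence and ignores any stop codon the inserted bases might themselves introduce; the function name `indel_reading_frame_effect` is hypothetical.

```python
def indel_reading_frame_effect(length_bp):
    """Describe how a coding InDel of `length_bp` bases affects the reading frame.

    Returns a tuple (effect, codons), where `codons` is the number of whole
    codons gained or lost for an in-frame InDel and None for a frameshift.
    """
    if length_bp <= 0:
        raise ValueError("InDel length must be a positive number of base pairs")
    if length_bp % 3 == 0:
        # A multiple of three adds or removes whole codons and keeps the frame.
        return ("in-frame", length_bp // 3)
    # Any other length shifts every downstream codon.
    return ("frameshift", None)
```

Under these assumptions, a 4-bp insertion shifts the frame, whereas a 21-bp insertion corresponds to seven whole codons and leaves the downstream reading frame intact.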
Divergence of the Cultivated Flax Population
The cultivated flax is divided into two major morphotypes: oil-use flax and fiber-use flax. To understand the genomic basis of the divergence of oil-use and fiber-use flax during improvement, we performed a population analysis using 83 flax accessions (including 24 landraces, 47 oil-use, and 12 fiber-use cultivars, Table S21 and Figure S9). Re-sequencing of these 83 accessions generated a total of 4.88 billion paired-end reads (∼615 Gb) with an average depth of 11.2× and coverage of 97.4%. By aligning all sequencing reads against the Longya-10 genome, a total of 2,245,463 SNPs and 394,658 InDels were detected in the 83 accessions (Tables S22 and S23). We constructed a phylogenetic tree and conducted a population structure analysis using whole-genome SNPs, both of which resolved the 83 accessions into three large groups corresponding to landrace, oil-use, and fiber-use flax (Figures 3A and S10). These three groups were further validated by the principal component analysis (Figure 3B). A closer relationship between the oil-use group and the landrace group was resolved through the phylogenetic tree and population structure analyses. Additionally, the lowest population diversity (π = 9.80×10−4) and the longest linkage disequilibrium (LD) decay distance (66.7 Kb) were observed in the fiber flax group (Figures 3C and 3D). Climate oscillations and artificial directional selection on crop traits can dramatically diminish genetic diversity and in turn influence the effective population size (Ne). Using SMC++ (Terhorst et al., 2017), we inferred that all three flax populations experienced sharp bottlenecks, mirrored by continual Ne declines over the most recent 20,000 years, coinciding with the period of the Last Glacial Maximum (about 20,000 years ago) and the onset of flax cultivation (about 10,000 years ago, Figure S11, Kleman and Hättestrand, 1999, Hillman, 1975, van Zeist and Bakker-Heeres, 1975, Zohary and Hopf, 2000).
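For readers less familiar with the two summary statistics reported above, the following Python sketch shows textbook definitions of nucleotide diversity (π, the mean number of pairwise differences per site) and the LD coefficient r² between two biallelic loci. It assumes phased, equal-length haplotypes with no missing data, which is a simplification of real resequencing data; the function names are hypothetical and this is not the software used in this study.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site across all haplotype pairs."""
    if len(haplotypes) < 2:
        raise ValueError("at least two haplotypes are required")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")

    pairs = list(combinations(haplotypes, 2))
    differences = sum(
        sum(1 for a, b in zip(h1, h2) if a != b) for h1, h2 in pairs
    )
    return differences / (len(pairs) * length)


def ld_r2(locus_a, locus_b):
    """r^2 between two biallelic loci given 0/1 alleles on the same haplotypes."""
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("loci must be scored on the same non-empty haplotype set")
    n = len(locus_a)
    p_a = sum(locus_a) / n                       # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n                       # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b                         # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator
```

A lower π indicates less standing variation within a group, and a slower decline of r² with physical distance indicates longer-range LD; both patterns are what the fiber flax group shows in Figures 3C and 3D.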
Figure 3.
Flax Populations
(A) A neighbor-joining tree of 83 flax accessions (24 landraces, 47 oil-use flax, and 12 fiber-use flax) using SNPs detected in whole-genome resequencing data.
(B) Principal component analysis plots of the first two components of 83 accessions.
(C) Nucleotide diversity (π) within groups and population divergence (FST) across groups.
(D) Decay of LD measured by r2 for each of the three groups.
Selective Sweeps during Flax Improvement
Crop improvement frequently causes a drastic loss of diversity in genomic regions (named selective sweep) that contain genes conferring favorable agronomic traits. To illuminate the different molecular mechanisms underlying the divergence of traits in flax improvement, we identified potential selective sweeps by comparing the oil-use and fiber-use groups with the landrace group separately (designated as landrace-to-oil and landrace-to-fiber, respectively). A total of 108 putative selective sweeps (15.5 Mb in length, 1,958 genes) and 60 potential selective sweeps (8.2 Mb in length, 1,018 genes) were detected in the landrace-to-oil and landrace-to-fiber comparison, respectively, among which 27 selective sweeps overlapped with each other (Tables S24, S25, and S26 and Figure S12).
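Selection signals in this study were defined by the top 5% of πratio and FST values (Figure 4). A minimal Python sketch of such a window scan is given below. It rests on several assumptions that are not taken from the study: that a window must exceed both cutoffs, that πratio is computed as π in the reference group divided by π in the improved group, and that per-window statistics are already available. All names are hypothetical.

```python
def top_fraction_cutoff(values, fraction=0.05):
    """Return the smallest value that still belongs to the top `fraction` of values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))     # keep at least one value
    return ordered[k - 1]


def candidate_sweep_windows(windows, fraction=0.05):
    """Return names of windows in the top `fraction` for both pi ratio and FST.

    Each window is a dict with keys:
      name      -- window identifier
      pi_ref    -- nucleotide diversity in the reference group (e.g. landrace)
      pi_target -- nucleotide diversity in the improved group (must be > 0)
      fst       -- FST between the two groups
    """
    ratios = []
    for window in windows:
        if window["pi_target"] <= 0:
            raise ValueError("pi_target must be positive to form a ratio")
        ratios.append(window["pi_ref"] / window["pi_target"])

    ratio_cutoff = top_fraction_cutoff(ratios, fraction)
    fst_cutoff = top_fraction_cutoff([w["fst"] for w in windows], fraction)

    return [
        window["name"]
        for window, ratio in zip(windows, ratios)
        if ratio >= ratio_cutoff and window["fst"] >= fst_cutoff
    ]
```

Requiring both statistics to be extreme is one common way to reduce false positives, because a high πratio alone can also arise from a local drop in diversity that is unrelated to group differentiation.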
Variations of genes in the selective sweeps unique to either the oil-use or the fiber-use flax might be specifically required for the improvement of the oil or fiber properties. Therefore, we investigated the 1,547 and 780 genes in the unique sweeps of the landrace-to-oil and landrace-to-fiber comparisons, respectively. Annotations of the genes carrying large-effect variations show that oil-related genes encoding alpha biotin carboxyl carrier protein (LuBCCP), lipoxygenase (LuLOX), fatty acyl-ACP thioesterase A (LuFatA), lipid transfer protein (LuLTP), and the E2 component of the pyruvate dehydrogenase complex (LuPDH-E2), as well as the seed size-related genes brassinosteroid insensitive 2 (LuBIN2) and LuGW5, were detected in the landrace-to-oil comparison, whereas homologs of the secondary cell wall biosynthesis-related genes (LuMYB46-1, LuXTH, and LuROPGAP3) and the plant stem length-related genes (LuGA3ox, LuGA20ox, and LuGID1; Figures 4A, 4C–4E, and S13, Tables S27 and S28) were found in the landrace-to-fiber comparison. Along with the differential gene expression patterns associated with fatty acid and secondary cell wall biosynthesis during stem and seed development (Figure S14), these results illustrate that the direction and strength of artificial selection on oil-use and fiber-use flax diverged during modern flax breeding.
Figure 4.
Detection and Functional Annotation of Selective Sweeps
(A and B) Selection signals in the landrace-to-oil comparison and the oil-to-fiber comparison were defined by the top 5% πratio and FST values (the genomic regions below and above the horizontal lines, respectively). The arrows indicate the genes associated with several important agronomic traits. (A) Landrace-to-oil comparison; (B) oil-to-fiber comparison.
(C–G) The πratio and FST values for candidate genes are shown at the top; the amino acid substitutions resulting from the large-effect SNP mutations for those candidate genes are shown at the bottom. Red indicates amino acid substitutions between landrace, oil, and fiber flax. Landrace, oil, and fiber flax groups are indicated from the top to the lower layers.
Considering that the modern fiber-use flax cultivars were often bred from oil-use flax (Allaby et al., 2005, Fu et al., 2012), we also identified 47 potential selective sweeps (6.5 Mb in length, 867 genes) in the oil-to-fiber comparison, of which 50.9% (441/867 genes) are also in the selective sweeps found in the landrace-to-fiber comparison, suggesting that these genomic regions were continuously subjected to strong selective pressure during the improvement of fiber-use flax (Figures 4B, 4F, 4G, and S12, Tables S24, S25, and S26). The remaining genes, approximately half of the total (426/867 genes), were found only in the oil-to-fiber comparison. Annotations of these unique genes carrying large-effect variations identified the homologs of genes encoding endo-β-1,4-glucanase (LuKorrigan), pectin methyl esterase (LuPME), and copalyl pyrophosphate synthase (LuCPS) (Tables S27 and S28). These divergent selections in fiber-use flax, corroborated by the transcriptome analysis results (Figure S14), imply that multiple rounds of selection on diverse genomic loci contributed to the improvement of flax fiber properties.
To further investigate the contributions of selective sweeps to the flax improvement, we compared our selective sweeps with the previously reported quantitative trait/genome-wide association study (QTL/GWAS) loci (Soto-Cerda et al., 2014, Kumar et al., 2015; Xie et al., 2018). We found two oil-use selective sweeps that overlap with two QTLs of stearic acid and one fiber-use selective sweep that overlaps with a GWAS locus of stem length. Interestingly, we also found another three fiber-use selective sweeps that intersect with three oil biosynthesis QTL/GWAS loci. This phenomenon, in conjunction with the common selective sweeps found in the landrace-to-oil and landrace-to-fiber comparisons, implies a dual selection for oil-use and fiber-use flax, also called “syndrome” traits domestication/improvement (Table S29 and Figure S15).
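The comparison of selective sweeps with published QTL/GWAS loci amounts to an interval-overlap test on genomic coordinates. The Python sketch below shows one simple way to express it, assuming closed intervals given as (chromosome, start, end) tuples in the same coordinate system; the function names are hypothetical, and the sketch does not reproduce the procedure or the coordinates used in this study.

```python
def intervals_overlap(first, second):
    """True if two (chromosome, start, end) closed intervals share at least one base."""
    chrom_a, start_a, end_a = first
    chrom_b, start_b, end_b = second
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


def sweeps_overlapping_loci(sweeps, loci):
    """Return every (sweep, locus) pair whose genomic intervals overlap."""
    return [
        (sweep, locus)
        for sweep in sweeps
        for locus in loci
        if intervals_overlap(sweep, locus)
    ]
```

The nested comparison is quadratic in the number of intervals, which is acceptable for a few hundred sweeps and loci; larger sets would normally be sorted or indexed first.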
Evolution of MYB46/MYB83 Genes and Their Roles in the Secondary Cell Wall Biosynthesis in Flax
Fibers are a type of specialized cell with a thickened secondary cell wall in plants. It is well known that AtMYB83 and AtMYB46 are two master regulators of secondary cell wall biosynthesis in Arabidopsis (Zhong et al., 2007). Phylogenetic analysis of MYB46/MYB83 genes from eleven species uncovered that at least two copies of MYB46/MYB83 existed within the ancestral lineages of eudicots, belonging to the MYB46 and MYB83 gene lineages, respectively (Figure S16). In the subsequent evolutionary trajectory, species-specific duplications of MYB46/MYB83 genes occurred in flax, poplar, apple, alfalfa, and cassava. In our study, four of the eight identified LuMYB46/LuMYB83 homologs displayed elevated expression in Longya-10 or Heiya-14 in comparison with pale flax (Table S19 and Figure S8). Additionally, many genomic variations of LuMYB46-1, LuMYB46-2, and LuMYB83-1 were found in cultivated flax. A 21-bp insertion was detected in LuMYB83-1 in the two cultivars in comparison to pale flax (Figure 2F), and LuMYB46-1 underwent strong selection during flax improvement (Figures 4B and 4G). LuMYB46-2 also has divergent insertion/deletion variations in Longya-10 and/or Heiya-14 (Figure S17). Because MYB46/MYB83 genes are important for secondary cell wall biosynthesis (Zhong et al., 2007, Zhong and Ye, 2012), the evolution of LuMYB46/LuMYB83 was likely essential in reshaping the biosynthesis of the secondary cell wall during flax domestication and improvement.
In flax, four pairs of MYB46/MYB83 sister genes are situated in collinear genomic regions, and the latest split happened around the time of the most recent WGD (Ks = 0.13, Table S30), implying that this WGD event led to the latest expansion of MYB46/MYB83 genes in flax. A comparison of the collinear blocks between flax and grape supports the hypothesis that two additional block duplications caused the expansions of MYB46/MYB83 genes (Table S31 and Figure S18). The deteriorated collinearity between the non-sister blocks and the high Ks values of MYB46/MYB83 gene pairs (all Ks > 1 except for the sister MYB46/MYB83 gene pairs) seemingly exclude the possibility that the expansion of MYB46/MYB83 genes stemmed from the early WGD event (Ks = 0.77) or other block duplications that happened in that period (Table S32). Admittedly, the divergence status of MYB46/MYB83 genes might be blurred by dynamic changes in evolutionary rate and by genome fractionation during repeated polyploidization and diploidization. Regardless of how and when the duplications occurred, the expansion of MYB46/MYB83 genes provided potential activators of secondary cell wall biosynthesis. These MYB46/MYB83 homologs, also observed in several other plants, might be specifically required for secondary cell wall biosynthesis by regulating the expression of downstream genes (Zhao and Dixon, 2011, Zhong et al., 2007, Zhong and Ye, 2015). To test this hypothesis, we examined the expression of 49 genes associated with secondary cell wall biosynthesis in Longya-10, Heiya-14, and pale flax (Table S33). Of the 40 differentially expressed genes identified, eight showed more than a 10-fold increase in at least one cultivar, and the expression levels of three genes encoding xyloglucan endotransglycosylases/hydrolases, which participate in fiber elongation, increased by more than 100-fold in Heiya-14 compared with Longya-10 and pale flax (Figure S19). A more comprehensive expression profile of 1,199 genes associated with secondary cell wall biosynthesis in Tianshuixian (a landrace accession), Longya-10, and Heiya-14 was further investigated using RNA sequencing (Figure S20). The result reveals that highly expressed genes tend to be enriched in Heiya-14, indicating that artificial selection for fiber properties was intensified in fiber-use flax.
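The Ks-based reasoning above, in which a gene pair is attributed to a duplication event when its Ks lies near that event's Ks peak, can be sketched in Python as follows. The peak values 0.13 and 0.77 are the two WGD peaks reported in this study; the `tolerance` of 0.2, the function name, and the nearest-peak rule itself are illustrative assumptions rather than the authors' method, and the sketch ignores rate variation among lineages and genes.

```python
# Ks peaks of the two WGD events reported for flax in this study.
WGD_KS_PEAKS = {"recent WGD": 0.13, "early WGD": 0.77}


def assign_duplication_event(ks, peaks=WGD_KS_PEAKS, tolerance=0.2):
    """Return the name of the WGD whose Ks peak is closest to `ks`.

    Returns None when the closest peak is further away than `tolerance`
    (a hypothetical cutoff), i.e. when the pair is not readily explained
    by either event.
    """
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    name, peak = min(peaks.items(), key=lambda item: abs(ks - item[1]))
    if abs(ks - peak) <= tolerance:
        return name
    return None
```

Under these assumptions, sister pairs with Ks close to 0.13 are attributed to the most recent WGD, whereas pairs with Ks > 1 fall outside the window around either peak, which matches the argument that the non-sister duplications are unlikely to stem from the Ks = 0.77 event.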
Discussion
A previous study produced a fragmented genome assembly for an oil-use cultivar, CDC Bethune, consisting of 88,384 scaffolds (116,602 contigs) (Wang et al., 2012). Recently, a chromosome-level assembly of the CDC Bethune genome has been constructed using BioNano genome optical map technology (You et al., 2018). However, a large number of discontinuous contigs remained in the flax genome assembly. In this study, we de novo assembled the genome of another oil-use cultivar, Longya-10, reducing the number of contigs and scaffolds to 6,521 and 2,006, respectively (You et al., 2018), among which 96.7% of the assembly could be further scaffolded into 15 pseudochromosomes by combining Hi-C interaction signals and a genetic map. This improved flax reference genome can deepen evolutionary genomics analyses. Under long-term artificial selection for beneficial agronomic traits, the cultivated flax has acquired distinct phenotypes compared with pale flax: a decreased growing period (60 versus 300 days), indehiscent capsules, increased yield (∼5 versus ∼1 g/1,000 seeds), and modifications in plant architecture (upright versus prostrate; 70 versus 40 cm in plant height; ∼5 versus ∼70 in branching number). The genetic changes underlying these phenotypic shifts from pale flax to cultivated flax had not previously been examined by a genome-wide comparative analysis. With the aid of the assemblies of two flax cultivars and a pale flax in our study, we found 804 flax genes with large-effect variations whose homologs are considered to regulate domestication-related traits in plants (Badouin et al., 2017, Fang et al., 2017, Li et al., 2014, Varshney et al., 2017). Importantly, homologs of the FCA, LEC1, MYB83-1, and ALC genes are important for flowering, oil synthesis, secondary cell wall biosynthesis, and indehiscence, respectively. Published studies revealed that activated FCA promotes early flowering by repressing the mRNA accumulation of the floral repressor FLOWERING LOCUS C (FLC); overexpression of LEC1 in Arabidopsis and Arachis hypogaea can enhance the production of fatty acids; overexpression of MYB83 is capable of thickening secondary cell walls in the xylem vessels; and wild-type siliques in Arabidopsis form a nonlignified cell layer at the site of separation, whereas the alc mutant fails to differentiate such a cell layer, leading to the production of indehiscent fruits (Simpson et al., 2010, Tang et al., 2018, Zhu et al., 2018, McCarthy et al., 2009, Rajani and Sundaresan, 2001). The novel variations found in these genes in cultivated flax may help to reveal the early footprints of flax domestication. Additionally, based on the functional enrichment of genes with large-effect variations in the two cultivars compared with pale flax, we speculate that modified regulation of plant hormones (gibberellin and brassinosteroid) profoundly affected flax plant architecture during domestication.
The Ne analysis implies that the ancestors of flax experienced strong bottlenecks owing to prehistoric climatic oscillations and subsequent human selections. Furthermore, in agreement with previous studies, our population analysis confirmed that the domestication of oil-use flax preceded the fiber-use flax, although the scarcity of fiber-use flax (12 accessions) probably caused a loss of information on the pedigree relationships. It is noteworthy that most flax cultivars investigated up to now have been representatives of modern flax breeding programs since the 1900s, whereas landrace and oil-fiber dual-purpose flax are supposed to be more closely related to the primitive domesticated flax lineages (Fu et al., 2012). As a consequence, the selective sweeps explored in our study can provide hints of modern oil-use and fiber-use flax improvement. As expected, oil-use and fiber-use flax have undergone divergent selections owing to their respective application preference. Similar to previous studies of oil-use flax domestication history, unique selective sweeps found in landrace-to-fiber comparison and oil-to-fiber comparison imply divergent geographic origins or multiple rounds of selection for fiber-use characteristic, despite their monophyletic clustering in our population phylogeny. Unlike other crop progenitors, the pale flax has a worldwide biogeographical distribution. Furthermore, as a principal source of oil and fiber, its domestication started from prehistoric times (Zohary and Hopf, 2000). Therefore, it is likely that a suite of landrace flax populations independently formed in situ, from which oil-use and fiber-use flax were gradually domesticated/improved. Moreover, the repeated selections on the same genomic region imperative for both oil and fiber characteristics signified that a series of syndrome traits collectively evolved during the cultivation in flax.
The MYB transcription factor family participates in a wide range of biological processes in plants (Cominelli and Tonelli, 2009, Xie et al., 2010). The MYB46/MYB83, as master switch genes, can activate secondary cell wall biosynthesis in fibers and vessels (Zhong and Ye, 2012). In flax, the number of MYB46/MYB83 genes expanded 4-fold since the divergence from the ancestral eudicots lineages, and the latest expansion of MYB46/MYB83 genes resulted from the most recent WGD event. The continual duplication and functional divergence of MYB46/MYB83 genes potentially shaped the unique regulation in the secondary cell wall biosynthesis in flax. During the domestication and improvement, the agronomically beneficial variations of MYB46/MYB83 genes were retained by the artificial selections in the oil-use and fiber-use flax populations, making the flax a popular crop worldwide. Our data that uncovered genes with major effects on flax domestication and improvement will facilitate molecular breeding in the future.
Limitations of the Study
Owing to the absence of wild flax populations (pale flax populations), the domestication history from pale flax to landrace flax was studied by genomic comparison between pale and two cultivated flax assemblies. Although the fiber flax accessions were gathered over four countries (Belgium, France, Holland, and China), genetic diversity within the fiber-use flax population might be largely underestimated when only twelve individuals were investigated.
Methods
All methods can be found in the accompanying Transparent Methods supplemental file.
Acknowledgments
This project was supported by China Agricultural Research System of 577 Construct Special on Characteristics Oil (CARS-14-1-05), Major Science and Technology Projects of Gansu (17ZD2NA016-3), Technology Innovation of Gansu Academy of Agricultural Sciences (2017GAAS22), and the National Natural Science Foundation of China (31560401, 31760426, and 31460388). X.Y. also provided financial support for this project.
Author Contributions
Z. Dang and J.Z. conceived the project. J.Z., Z. Dang, Y.Q., and L. Wang directed and managed the research. L. Wang and Y.Q. performed the data analyses. X.Y., X.L., M.L., J.W., X.Z., and H.Z. contributed to the data analysis. J.Z., T.L., and Y.Q. interpreted the data and wrote the manuscript. J.Z., L. Wang, and Z. Dang conducted the fieldwork and performed phenotyping. Y.Q. designed and performed the verification experiment. W.L. and W.Z. provided support for the experiment. L. Wang, X.Y., X.P., M.T., L. Wang, and Y.L. contributed to sample collection. Y.Q. submitted the data to the databases.
Declaration of Interests
The authors declare no competing interests.
Published: April 24, 2020
Footnotes
Supplemental Information can be found online at https://doi.org/10.1016/j.isci.2020.100967.
Contributor Information
Jianping Zhang, Email: zhangjpzw3@gsagr.ac.cn.
Zhanhai Dang, Email: 13669338239@163.com.
Hongkun Zheng, Email: zhenghk@biomarker.com.cn.
Touming Liu, Email: liutouming@caas.cn.
Data and Code Availability
Genome assemblies of Longya-10, Heiya-14 and pale flax have been deposited at DDBJ/ENA/GenBank: QMEI00000000, QMEH00000000 and QMEG00000000. The re-sequencing raw data and transcriptome sequence reads have been deposited in SRA: SRP160418 and PRJNA505721.
References
- Allaby R.G., Peterson G.W., Merriwether D.A., Fu Y.B. Evidence of the domestication history of flax (Linum usitatissimum L.) from genetic diversity of the sad2 locus. Theor. Appl. Genet. 2005;112:58–65. doi: 10.1007/s00122-005-0103-3.
- Badouin H., Gouzy J., Grassa C.J., Murat F., Staton S.E., Cottret L., Lelandais-Briere C., Owens G.L., Carrere S., Mayjonade B. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature. 2017;546:148–152. doi: 10.1038/nature22380.
- Bolger A., Scossa F., Bolger M.E., Lanz C., Maumus F., Tohge T., Quesneville H., Alseekh S., Sorensen I., Lichtenstein G. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat. Genet. 2014;46:1034–1038. doi: 10.1038/ng.3046.
- Bredeson J.V., Lyons J.B., Prochnik S.E., Wu G.A., Ha C.M., Edsinger-Gonzales E., Grimwood J., Schmutz J., Rabbi I.Y., Egesi C. Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity. Nat. Biotechnol. 2016;34:562–570. doi: 10.1038/nbt.3535.
- Cloutier S., Ragupathy R., Niu Z., Duguid S. SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol. Breed. 2011;28:437–451.
- Cominelli E., Tonelli C. A new role for plant R2R3-MYB transcription factors in cell cycle regulation. Cell Res. 2009;19:1231–1232. doi: 10.1038/cr.2009.123.
- Diederichsen A., Hammer K. Variation of cultivated flax (Linum usitatissimum L. subsp. usitatissimum) and its wild progenitor pale flax (subsp. angustifolium (Huds.) Thell.) Genet. Resour. Crop Evol. 1995;42:263–272.
- Duan N., Bai Y., Sun H., Wang N., Ma Y., Li M., Wang X., Jiao C., Legall N., Mao L. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat. Commun. 2017;8:249. doi: 10.1038/s41467-017-00336-7.
- Fang L., Gong H., Hu Y., Liu C., Zhou B., Huang T., Wang Y., Chen S., Fang D.D., Du X. Genomic insights into divergence and dual domestication of cultivated allotetraploid cottons. Genome Biol. 2017;18:33. doi: 10.1186/s13059-017-1167-5.
- Fu Y.B. Genetic evidence for early flax domestication with capsular dehiscence. Genet. Resour. Crop Evol. 2011;58:1119–1128.
- Fu Y.B. Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes. Ecol. Evol. 2012;2:622–635. doi: 10.1002/ece3.101.
- Fu Y.B., Peterson G.W. Developing genomic resources in two Linum species via 454 pyrosequencing and genomic reduction. Mol. Ecol. Resour. 2012;12:492–500. doi: 10.1111/j.1755-0998.2011.03100.x.
- Fu Y.B., Diederichsen A., Allaby R.G. Locus-specific view of flax domestication history. Ecol. Evol. 2012;2:139–152. doi: 10.1002/ece3.57.
- Fu Y.B., Diederichsen A., Richards K.W., Peterson G. Genetic diversity within a range of cultivars and landraces of flax (Linum usitatissimum L.) as revealed by RAPDs. Genet. Resour. Crop Evol. 2002;49:167–174.
- Fu Y.B., Peterson G., Diederichsen A., Richards K.W. RAPD analysis of genetic relationships of seven flax species in the genus Linum L. Genet. Resour. Crop Evol. 2002;49:253–259.
- Gao L., Gonda I., Sun H., Ma Q., Bao K., Tieman D.M., Burzynski-Chang E.A., Fish T.L., Stromberg K.A., Sacks G.L. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 2019;51:1044–1051. doi: 10.1038/s41588-019-0410-2.
- Gill K. 1966. Evolutionary Relationships Among Linum Species. Ph.D. Thesis.
- Gill K. Indian Council of Agricultural Research; 1987. Linseed.
- Golicz A.A., Bayer P.E., Barker G.C., Edger P.P., Kim H., Martinez P.A., Chan C.K., Severn-Ellis A., McCombie W.R., Parkin I.A. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 2016;7:13390. doi: 10.1038/ncomms13390.
- Helback H. Domestication of Food Plants in the Old World: joint efforts by botanists and archeologists illuminate the obscure history of plant domestication. Science. 1959;130:365–372. doi: 10.1126/science.130.3372.365.
- Hillman G. Proceedings of the Prehistoric Society. 1975. The plant remains from Tell Abu Hureyra: a preliminary report; pp. 70–73.
- Hübner S., Bercovich N., Todesco M., Mandel J.R., Odenheimer J., Ziegler E., Lee J.S., Baute G.J., Owens G.L., Grassa C.J. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants. 2019;5:54–62. doi: 10.1038/s41477-018-0329-0.
- Kleman J., Hättestrand C. Frozen-bed Fennoscandian and Laurentide ice sheets during the Last Glacial Maximum. Nature. 1999;402:63.
- Kumar S., You F.M., Duguid S., Booker H., Rowland G., Cloutier S. QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.) Theor. Appl. Genet. 2015;128:965–984. doi: 10.1007/s00122-015-2483-3.
- Li Y.H., Zhou G., Ma J., Jiang W., Jin L.G., Zhang Z., Guo Y., Zhang J., Sui Y., Zheng L. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 2014;32:1045–1052. doi: 10.1038/nbt.2979.
- McCarthy R.L., Zhong R., Ye Z.H. MYB83 is a direct target of SND1 and acts redundantly with MYB46 in the regulation of secondary cell wall biosynthesis in Arabidopsis. Plant Cell Physiol. 2009;50:1950–1964. doi: 10.1093/pcp/pcp139.
- Qin C., Yu C., Shen Y., Fang X., Chen L., Min J., Cheng J., Zhao S., Xu M., Luo Y. Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. U S A. 2014;111:5135–5140. doi: 10.1073/pnas.1400975111.
- Rajani S., Sundaresan V. The Arabidopsis myc/bHLH gene ALCATRAZ enables cell separation in fruit dehiscence. Curr. Biol. 2001;11:1914–1922. doi: 10.1016/s0960-9822(01)00593-0.
- Simpson G.G., Laurie R.E., Dijkwel P.P., Quesada V., Stockwell P.A., Dean C., Macknight R.C. Noncanonical Translation Initiation of the Arabidopsis flowering time and alternative polyadenylation regulator FCA. Plant Cell. 2010;22:3764–3777. doi: 10.1105/tpc.110.077990.
- Smykal P., Bacova-Kerteszova N., Kalendar R., Corander J., Schulman A.H., Pavelek M. Genetic diversity of cultivated flax (Linum usitatissimum L.) germplasm assessed by retrotransposon-based markers. Theor. Appl. Genet. 2011;122:1385–1397. doi: 10.1007/s00122-011-1539-2.
- Soto-Cerda B.J., Duguid S., Booker H., Rowland G., Diederichsen A., Cloutier S. Association mapping of seed quality traits using the Canadian flax (Linum usitatissimum L.) core collection. Theor. Appl. Genet. 2014;127:881–896. doi: 10.1007/s00122-014-2264-4.
- Soto-Cerda B.J., Maureira-Butler I., Muñoz G., Rupayan A., Cloutier S. SSR-based population structure, molecular diversity and linkage disequilibrium analysis of a collection of flax (Linum usitatissimum L.) varying for mucilage seed-coat content. Mol. Breed. 2012;30:875–888.
- Stein J.C., Yu Y., Copetti D., Zwickl D.J., Zhang L., Zhang C., Chougule K., Gao D., Iwata A., Goicoechea J.L. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat. Genet. 2018;50:285–296. doi: 10.1038/s41588-018-0040-0.
- Tammes T. The genetics of the genus Linum. Bibliogr. Genet. 1928;4:1–36.
- Tang G., Xu P., Ma W., Wang F., Liu Z., Wan S., Shan L. Seed-specific expression of AtLEC1 increased oil content and altered fatty acid composition in seeds of peanut (Arachis hypogaea L.) Front. Plant Sci. 2018;9:260. doi: 10.3389/fpls.2018.00260.
- Terhorst J., Kamm J.A., Song Y.S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 2017;49:303–309. doi: 10.1038/ng.3748.
- van Zeist W., Bakker-Heeres J.A.H. Evidence for linseed cultivation before 6000 BC. J. Archaeol. Sci. 1975;2:215–219.
- Varshney R.K., Saxena R.K., Upadhyaya H.D., Khan A.W., Yu Y., Kim C., Rathore A., Kim D., Kim J., An S. Whole-genome resequencing of 292 pigeon pea accessions identifies genomic regions associated with domestication and agronomic traits. Nat. Genet. 2017;49:1082–1088. doi: 10.1038/ng.3872.
- Wang L., He F., Huang Y., He J., Yang S., Zeng J., Deng C., Jiang X., Fang Y., Wen S. Genome of wild Mandarin and domestication history of Mandarin. Mol. Plant. 2018;11:1024–1037. doi: 10.1016/j.molp.2018.06.001.
- Wang Z., Hobson N., Galindo L., Zhu S., Shi D., McDill J., Yang L., Hawkins S., Neutelings G., Datla R. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 2012;72:461–473. doi: 10.1111/j.1365-313X.2012.05093.x.
- Xie D., Dai Z., Yang Z., Tang Q., Sun J., Yang X., Song X., Lu Y., Zhao D., Zhang L. Genomic variations and association study of agronomic traits in flax. BMC Genomics. 2018;19:512. doi: 10.1186/s12864-018-4899-z.
- Xie M., Chung C.Y., Li M.W., Wong F.L., Wang X., Liu A., Wang Z., Leung A.K., Wong T.H., Tong S.W. A reference-grade wild soybean genome. Nat. Commun. 2019;10:1216. doi: 10.1038/s41467-019-09142-9.
- Xie Z., Lee E., Lucas J.R., Morohashi K., Li D., Murray J.A., Sack F.D., Grotewold E. Regulation of cell proliferation in the stomatal lineage by the Arabidopsis MYB FOUR LIPS via direct targeting of core cell cycle genes. Plant Cell. 2010;22:2306–2321. doi: 10.1105/tpc.110.074609.
- Yang N., Xu X.W., Wang R.R., Peng W.L., Cai L., Song J.M., Li W., Luo X., Niu L., Wang Y. Contributions of Zea mays subspecies mexicana haplotypes to modern maize. Nat. Commun. 2017;8:1874. doi: 10.1038/s41467-017-02063-5.
- You F.M., Xiao J., Li P., Yao Z., Jia G., He L., Zhu T., Luo M.C., Wang X., Deyholos M.K. Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J. 2018;95:371–384. doi: 10.1111/tpj.13944.
- Zhang Q.J., Zhu T., Xia E.H., Shi C., Liu Y.L., Zhang Y., Liu Y., Jiang W.K., Zhao Y.J., Mao S.Y. Rapid diversification of five Oryza AA genomes associated with rice adaptation. Proc. Natl. Acad. Sci. U S A. 2014;111:E4954–E4962. doi: 10.1073/pnas.1418307111.
- Zhao Q., Dixon R.A. Transcriptional networks for lignin biosynthesis: more complex than we thought. Trends Plant Sci. 2011;16:227–233. doi: 10.1016/j.tplants.2010.12.005.
- Zhong R., Ye Z.H. MYB46 and MYB83 bind to the SMRE sites and directly activate a suite of transcription factors and secondary wall biosynthetic genes. Plant Cell Physiol. 2012;53:368–380. doi: 10.1093/pcp/pcr185.
- Zhong R., Ye Z.H. Secondary cell walls: biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol. 2015;56:195–214. doi: 10.1093/pcp/pcu140.
- Zhong R., Richardson E.A., Ye Z.H. The MYB46 transcription factor is a direct target of SND1 and regulates secondary wall biosynthesis in Arabidopsis. Plant Cell. 2007;19:2776–2792. doi: 10.1105/tpc.107.053678.
- Zhu Y., Xie L., Chen G.Q., Lee M.Y., Logue D., Scheller H.V. A transgene design for enhancing oil content in Arabidopsis and Camelina seeds. Biotechnol. Biofuels. 2018;11:46. doi: 10.1186/s13068-018-1049-4.
- Zohary D., Hopf M. Oxford University Press; 2000. Domestication of Plants in the Old World: The Origin and Spread of Cultivated Plants in West Asia, Europe and the Nile Valley.




