A chromosome-scale genome assembly of Artemisia argyi reveals unbiased subgenome evolution and key contributions of gene duplication to volatile terpenoid diversity

Hongyu Chen; Miaoxian Guo; Shuting Dong; Xinling Wu; Guobin Zhang; Liu He; Yuannian Jiao; Shilin Chen; Li Li; Hongmei Luo

doi:10.1016/j.xplc.2023.100516

Plant Commun. 2023 Jan 2;4(3):100516.



PMCID: PMC10203441 PMID: 36597358

Abstract

Artemisia argyi Lévl. et Vant., a perennial Artemisia herb with an intense fragrance, is widely used in traditional medicine in China and many other Asian countries. Here, we present a chromosome-scale genome assembly of A. argyi comprising 3.89 Gb assembled into 17 pseudochromosomes. Phylogenetic and comparative genomic analyses revealed that A. argyi underwent a recent lineage-specific whole-genome duplication (WGD) event after divergence from Artemisia annua, resulting in two subgenomes. We deciphered the diploid ancestral genome of A. argyi, and unbiased subgenome evolution was observed. The recent WGD led to a large number of duplicated genes in the A. argyi genome. Expansion of the terpene synthase (TPS) gene family through various types of gene duplication may have greatly contributed to the diversity of volatile terpenoids in A. argyi. In particular, we identified a typical germacrene D synthase gene cluster within the expanded TPS gene family. The entire biosynthetic pathways of germacrenes, (+)-borneol, and (+)-camphor were elucidated in A. argyi. In addition, partial deletion of the amorpha-4,11-diene synthase (ADS) gene and loss of function of ADS homologs may have resulted in the lack of artemisinin production in A. argyi. Our study provides new insights into the genome evolution of Artemisia and lays a foundation for further improvement of the quality of this important medicinal plant.

Keywords: Artemisia argyi, subgenome evolution, gene duplication, terpene synthase, germacrene synthase, non-artemisinin production

This study reports a chromosome-scale genome of Artemisia argyi. It reveals unbiased subgenome evolution, shows that gene duplication contributed to the diversity of volatile terpenes, and clarifies the biosynthetic pathways of germacrenes, (+)-borneol, and (+)-camphor in A. argyi. The absence of an intact amorpha-4,11-diene synthase (ADS) gene from the A. argyi genome may be the major reason why the species does not produce artemisinin.

Introduction

Artemisia argyi Lévl. et Vant. is a perennial herb in the genus Artemisia with bisexual flowers, underground horizontal rhizomes, and an intense fragrance (Figure 1A–1C). The dried leaf of A. argyi is a well-known traditional Chinese medicine, “Chinese mugwort.” In China and many other Asian countries it is widely used to treat eczema, diarrhea, inflammation, and menstruation-related symptoms and to stop bleeding (Shin et al., 2017; Liu et al., 2021). Modern pharmacological studies indicate that A. argyi has broad-spectrum antibacterial and antiviral activity, which is attributed to its abundant volatile terpenoids (Wang et al., 2006; Jiang et al., 2019a). The terpenoids isolated from A. argyi are mainly monoterpenes, sesquiterpenes, and their derivatives, such as 1,8-cineole, thujone, β-pinene, camphor, borneol, germacrene D, caryophyllene, and caryophyllene oxide (Zhang et al., 2014; Guan et al., 2019; Song et al., 2019). Notably, the Chinese Pharmacopoeia designates 1,8-cineole and borneol as quality control markers (Commission of Chinese Pharmacopoeia, 2020). A. argyi is also used in many health products, and these support a large-scale Chinese mugwort industry of considerable economic value (Liu et al., 2021).

Figure 1. Landscape of *A. argyi* morphology, genome features, and synteny.

**(A)** *A. argyi* in the field.

**(B)** Morphology of *A. argyi* flowers and seeds. 1, multiple inflorescences; 2, inflorescence; 3, hermaphrodite flower; 4, hermaphrodite flower top; 5, pistil of hermaphrodite flower; 6, synantherous stamen; 7, pistillate flower; 8, pollen; 9, seeds.

**(C)** Morphology of *A. argyi* underground horizontal rhizomes.

**(D)** Distribution of *A. argyi* genomic features. The linking lines in the circle represent synteny of paralogous sequences in the genome.

Polyploidization, also referred to as whole-genome duplication (WGD), occurs frequently in angiosperms. It is one of the strongest drivers of their genome evolution, contributing to speciation and to the emergence of valuable traits in plants (Eric Schranz et al., 2012; Soltis and Soltis, 2016). Ancient polyploidization can produce dissimilar subgenomes. Genes are commonly lost unequally from the subgenomes, a phenomenon known as biased fractionation. The subgenome that has lost fewer genes also tends to be more highly expressed, a phenomenon known as genome dominance (Liang and Schnable, 2018). In paleo-autopolyploids, fractionation and dominance are typically unbiased between the two subgenomes, whereas biased gene loss and genome dominance have been detected in most ancient allotetraploids (Garsmeur et al., 2014). In addition, because a WGD initially doubles the chromosome number, it produces a large number of duplicated genes (Van de Peer et al., 2009; Panchy et al., 2016). Previous studies have linked many genes derived from gene duplication to plant specialized metabolic pathways (Aubourg et al., 2002; Ober, 2005).

Artemisia is one of the most diverse genera in the Asteraceae family, comprising over 500 species widely distributed across Asia, Europe, and North America (Bora and Sharma, 2011). The genus has been the focus of numerous investigations over the last several decades because its abundant and diverse bioactive components give it potential ecological and economic value (Ivănescu et al., 2021; Kshirsagar and Rao, 2021). Although genome sizes and ploidy levels vary within Artemisia, the basic chromosome number is usually eight or nine (Inceer and Hayirlioglu-Ayaz, 2007; Pellicer et al., 2007). Our karyotype analysis (Supplemental Figure 1) and previous studies (Pellicer et al., 2010) revealed an A. argyi karyotype of 2n = 34. This number is not a multiple of either usual basic number. The unusual karyotype suggests that A. argyi probably underwent a species-specific polyploidization event that gave rise to subgenomes. However, data on the evolutionary patterns of A. argyi are limited, and the molecular basis for the biosynthesis of its diverse volatile terpenoids remains unclear.

Here we assembled a high-quality chromosome-scale genome of A. argyi and combined comparative genomics, transcriptomics, metabolomics, and functional assays to understand its genome evolution and the diversification of biosynthetic pathways that produce major terpenes in A. argyi. This study establishes a novel and valuable foundation that will contribute to unravelling the genetic diversity and medicinal applications of Artemisia species.

Results

Genome sequencing, assembly, and annotation of the A. argyi genome

The A. argyi genome was assembled using 276.20 Gb of Illumina sequencing data, 497.09 Gb of PacBio Sequel II long-read data, and 426.46 Gb of Illumina Hi-C data (Supplemental Table 1). K-mer analysis (k = 21) of the Illumina data estimated the genome size at approximately 3.87 Gb, with a high level of heterozygosity (6.8%) (Supplemental Figure 2). Flow cytometry gave a genome size of 3.98 Gb (Supplemental Figure 3), reasonably close to the k-mer estimate. The PacBio long reads were initially assembled into 7.87 Gb of sequence comprising 14 638 contigs, with a contig N50 of 1.45 Mb (Supplemental Table 2). This initial assembly is roughly twice the estimated genome size, which suggests that it contained both haplotypes. The contigs were then linked using the Hi-C paired-end reads and anchored onto 34 pseudochromosomes, which account for 97.86% of the assembled sequence (Supplemental Table 3). Following a strategy for constructing a chimeric monoploid genome, we separated the 34 pseudochromosomes into two haplotypes based on self-comparison. The longer and more complete member of each homologous chromosome pair was assigned to haplotype group A (pseudochromosomes 1–17; Supplemental Table 4), and the other member was assigned to haplotype group B (pseudochromosomes 18–34; Supplemental Table 5). The final A. argyi genome consists of the 17 pseudochromosomes of haplotype A. It has a total length of 3.89 Gb (scaffold N50 of 214 Mb) and was used for all subsequent genomic analyses (Figure 1D and Supplemental Figure 4; Supplemental Table 3). Benchmarking Universal Single-Copy Orthologs (BUSCO, v.4.0) analysis estimated genome completeness at 95.38% (Supplemental Table 6), and 94.97% of the Illumina short reads mapped to the assembly (Supplemental Table 7).
Core Eukaryotic Genes Mapping Approach (CEGMA) analysis showed that 97.98% of the 248 highly conserved core eukaryotic genes were present in the assembly (Supplemental Table 8). The long terminal repeat (LTR) assembly index (LAI) was 18.65 (Supplemental Figure 5). Collectively, these results demonstrate the high contiguity and quality of the A. argyi genome assembly, which is comparable to the quality achieved in previous research using HiFi technology (Supplemental Table 9; Miao et al., 2022).
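The size arithmetic in this paragraph can be illustrated with a minimal Python sketch. It assumes the textbook k-mer estimate, in which total k-mers are divided by the depth of the homozygous coverage peak, and the standard N50 definition. The function names and the toy histogram are hypothetical. This is not the software used in the study, which is not named here. Dedicated tools model heterozygous and error peaks explicitly, and that matters for a genome as heterozygous as A. argyi.

```python
from typing import Dict, Sequence


def kmer_genome_size(histogram: Dict[int, int], peak_depth: int, error_cutoff: int = 2) -> float:
    """Estimate genome size (bp) as total k-mers divided by the homozygous peak depth.

    histogram maps k-mer depth -> number of distinct k-mers observed at that depth.
    Depths <= error_cutoff are treated as sequencing errors and excluded.
    peak_depth must be the homozygous (not heterozygous) peak; in a highly
    heterozygous genome the heterozygous peak sits at about half this depth.
    """
    total_kmers = sum(depth * count for depth, count in histogram.items() if depth > error_cutoff)
    return total_kmers / peak_depth


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy inputs (hypothetical, not the A. argyi data).
toy_histogram = {1: 1000, 2: 500, 10: 100, 20: 300, 21: 50}
print(kmer_genome_size(toy_histogram, peak_depth=20))  # 402.5
print(n50([2, 3, 4, 5, 6]))  # 5
```

The same kind of ratio underlies the haplotype argument. Using the reported totals, 7.87 Gb / 3.87 Gb ≈ 2.03, which is close to the factor of two expected if both haplotypes were assembled.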

A total of 62 844 protein-coding genes were predicted in the A. argyi genome by combining homology-based, ab initio, and transcriptome-based prediction (Supplemental Table 10), and most of them received functional annotations (Supplemental Table 11). BUSCO analysis showed that 96.00% of the conserved core genes in the eudicot database were present among the predicted genes, confirming that the gene prediction was highly complete (Supplemental Table 12). Annotation of noncoding RNA genes identified 4079 transfer RNA, 202 ribosomal RNA, 584 small nuclear RNA, and 309 microRNA genes (Supplemental Table 13). Combined homology-based and structure-based analyses annotated 3.16 Gb of repetitive elements, representing 81.03% of the A. argyi genome (Supplemental Table 14). LTR retrotransposons (LTR-RTs) were the major transposable elements (TEs), comprising 70.78% of the assembled genome (Supplemental Table 14). Among the LTR-RTs, Gypsy and Copia elements were the most abundant, constituting 25.17% and 19.27% of the A. argyi genome, respectively. These elements may have contributed to genome expansion (Supplemental Figure 6).

The phylogenetic placement of A. argyi

We constructed a phylogenetic tree from 1057 single-copy orthologous genes. These were identified by clustering orthologous protein sequences from A. argyi and 11 other plant species (Vitis vinifera, Solanum lycopersicum, Sesamum indicum, Lonicera japonica, Panax notoginseng, Carthamus tinctorius, Lactuca sativa, Helianthus annuus, Erigeron breviscapus, Chrysanthemum nankingense, and Artemisia annua) (Supplemental Table 15). Except for V. vinifera, these species are typical representatives of asterids I and II, which provides a suitable framework for resolving the evolutionary position of A. argyi.

In the resulting phylogenetic tree, A. argyi clustered with A. annua, and the two were most closely related to C. nankingense (Figure 2A). Using divergence-time data from TimeTree (http://www.timetree.org/) for calibration, we estimated that A. argyi and A. annua diverged ∼5.4 million years ago (mya) (Figure 2A). Phylogenetic analysis of the complete chloroplast genome sequences of 15 Artemisia species showed that A. argyi, Artemisia montana, and Artemisia lactiflora formed a single clade, although the three species have different chromosome numbers (Supplemental Figure 7).

Figure 2. A recent lineage-specific whole-genome duplication (WGD) and the unbiased subgenome evolution of *A. argyi*.

**(A)** Phylogenetic tree comprising 12 species, including *A. argyi*. The numbers near the nodes indicate the estimated divergence times. Pink bars show the 95% confidence interval of the divergence times in millions of years ago (mya). Expansions and contractions of gene families are denoted as numbers with green plus and red minus signs, respectively.

**(B)** Density plot of synonymous substitution rate (K_s) values for orthologous and paralogous genes among *A. argyi*, *A. annua*, *H. annuus*, and *C. canephora*. WGT-1 represents the whole-genome triplication (WGT) event detected in Asteraceae.

**(C)** Dot plot of syntenic genes between the genomes of *A. argyi* and *A. annua*. The dotted red lines indicate the breakpoints in syntenic fragments of the *A. argyi* genome compared with that of *A. annua*; rectangles of the same color indicate fragments that are associated with each other in the diploid ancestral genome of *A. argyi*.

**(D)** Genome evolution trajectory of *A. argyi*.

**(E)** Fractionation pattern of the two deduced subgenomes of *A. argyi* relative to the *A. annua* genome, shown for each pair of homologous ancestral chromosomes. The x axis indicates gene positions on each ancestral chromosome of *A. argyi* (ACA). The y axis indicates the percentage of *A. annua* orthologs retained on the ACA, calculated in 1000-gene sliding windows. Dark and light lines represent the two members of each homologous ACA pair.

Clustering based on homologous sequence identity grouped the genes of the 12 plant species into 49 410 gene families. The A. argyi genes were distributed across 3251 gene families shared with other species and 585 gene families specific to A. argyi (Supplemental Figure 8). A total of 951 significantly expanded and 220 significantly contracted gene families were identified in the A. argyi genome (Figure 2A; Supplemental Data 1). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis showed that the species-specific and expanded gene families were significantly enriched in terpenoid biosynthesis (Supplemental Figure 9). This enrichment may explain traits specific to A. argyi, such as its strong fragrance, and may provide the molecular basis for the variety of volatile terpenoids in this medicinal plant.
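This section does not specify the enrichment statistic. A common choice for KEGG enrichment, assumed here, is a one-sided hypergeometric test for each pathway followed by Benjamini-Hochberg correction. The sketch below uses only the Python standard library, and its function and argument names are illustrative.

```python
from math import comb
from typing import List


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k) for one pathway.

    N: background genes with a KEGG annotation
    K: background genes annotated to the pathway
    n: genes in the test set (e.g., genes from expanded families)
    k: test-set genes annotated to the pathway
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: all 5 test genes fall in a 5-gene pathway out of 10 background genes.
print(enrichment_p(k=5, n=5, K=5, N=10))  # 1/252
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```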

A recent lineage-specific WGD event of A. argyi

The ancient whole-genome triplication (WGT) event WGT-γ (approximately 122–164 mya) was shared by core eudicots. A later WGT event, WGT-1 (approximately 53–62 mya), was shared by Asteraceae species such as H. annuus (Badouin et al., 2017), L. sativa (Reyes-Chin-Wo et al., 2017), and A. annua (Liao et al., 2022). H. annuus subsequently experienced a lineage-specific WGD (WGD-2, approximately 29 mya) (Badouin et al., 2017), whereas no lineage-specific WGD was observed in A. annua (Liao et al., 2022). The syntenic relationships among A. argyi chromosomes suggested that this species might have undergone polyploidization (Figure 1D). To test for an A. argyi-specific WGD event, we performed collinearity analysis among A. argyi, A. annua, and H. annuus. We identified 30 195 pairs of collinear genes between A. argyi and A. annua and 17 916 pairs between A. argyi and H. annuus (Supplemental Figure 10). The syntenic depth ratio between A. argyi and A. annua was 2:1, meaning that each A. annua genomic region matched two regions in the A. argyi genome. The ratio between A. argyi and H. annuus was 2:2, consistent with each of these two species having undergone its own WGD (Supplemental Figure 10). These analyses provide clear structural evidence of a WGD event in A. argyi. To place the A. argyi-specific WGD relative to speciation events, we compared distributions of synonymous substitution rate (K_s) values. The K_s distribution of A. argyi paralogs showed a clear peak at ∼0.0486 (Figure 2B). This is slightly lower than the peak for orthologs shared by A. argyi and A. annua (K_s = 0.0802), indicating that the duplication postdated the divergence of the two species. The K_s peaks for paralogs in H. annuus and Coffea canephora were higher than that in A. argyi, reflecting older duplication events. Based on these K_s values and the divergence time between A. argyi and A. annua, we estimated that the WGD event in A. argyi occurred ∼3.3 mya (Figure 2B). K_s analysis further indicated that this WGD event was not shared with A. annua, C. nankingense, C. tinctorius, or H. annuus (Supplemental Figure 11). This suggests that the recent WGD was specific to the A. argyi lineage and could have resulted from either auto- or allotetraploidization.
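The ∼3.3 mya estimate can be reproduced with a minimal molecular-clock sketch. It assumes a constant synonymous rate r shared by both lineages and the relation K_s = 2rT. This section does not give the exact calibration procedure, so treat the sketch as an illustration of the arithmetic rather than the method used in the study.

```python
def synonymous_rate(ks_orthologs: float, divergence_mya: float) -> float:
    """Per-site, per-year synonymous rate r from K_s = 2 r T (assumed constant clock)."""
    return ks_orthologs / (2.0 * divergence_mya * 1e6)


def ks_to_age_mya(ks: float, rate: float) -> float:
    """Age (mya) of a K_s peak under the same clock."""
    return ks / (2.0 * rate) / 1e6


# Values reported in the text.
KS_ORTHOLOG_PEAK = 0.0802  # A. argyi vs A. annua orthologs
DIVERGENCE_MYA = 5.4       # A. argyi / A. annua split
KS_PARALOG_PEAK = 0.0486   # A. argyi paralogs (recent WGD)

rate = synonymous_rate(KS_ORTHOLOG_PEAK, DIVERGENCE_MYA)
wgd_age = ks_to_age_mya(KS_PARALOG_PEAK, rate)
print(f"r = {rate:.3g} substitutions/site/year; WGD ≈ {wgd_age:.1f} mya")
```

Under this model the rate cancels out. The result reduces to T_WGD = T_div × K_s(paralogs) / K_s(orthologs) = 5.4 × 0.0486 / 0.0802 ≈ 3.27 mya.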

Deciphering the ancestral diploid genome and unbiased subgenome evolution in A. argyi

To investigate the evolutionary trajectory of the A. argyi subgenomes derived from the recent lineage-specific WGD, we reconstructed the diploid ancestral genome of A. argyi following methods used in previous studies (Xu et al., 2020; Wu et al., 2022). An unduplicated outgroup is crucial for identifying duplicated segment pairs and homologous genes that have been lost from one segment but retained in the other (Schnable et al., 2011). A. annua is a diploid Artemisia species without a lineage-specific WGD, and a chromosome-scale genome assembly is available for it (Liao et al., 2022). Genome structure comparison showed extensive synteny between A. annua and A. argyi (Figure 2C). We therefore used the A. annua genome as a diploid reference to identify duplicated segment pairs and homologous genes in A. argyi. Based on the syntenic relationships between the two species, we defined an Artemisia-specific genomic block (GB) system to identify breakpoints and block associations in the ancestral diploid genome of A. argyi. Breakpoints present in two copies were considered to predate the WGD and to have been inherited by both subgenomes. A total of 16 breakpoints were identified on the 9 chromosomes of A. annua. These split the A. annua genome into 25 GBs (A–Y), because each breakpoint adds one block to the 9 chromosomes (Figure 2C; Supplemental Table 16). The 25 GBs were then mapped onto the 17 chromosomes of A. argyi (Figure 2C), and we screened for GB associations present in both subgenomes. We identified 16 such associations (Supplemental Table 17). Each association joins two previously separate groups, so the 25 GBs were fused into 25 − 16 = 9 groups. Hence, the diploid ancestral genome of A. argyi was inferred to contain 9 chromosomes before the recent WGD event (Figure 2D). The GB system revealed two sets of duplicated ancestral chromosomes (Figure 2C), strongly supporting the presence of two subgenomes in A. argyi.
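Fusing genomic blocks through shared associations is a connected-components problem. A minimal union-find sketch shows why 16 associations, each joining two previously separate groups, reduce 25 blocks to 9. It assumes that each association is an unordered pair of block labels. The sketch only groups blocks; it does not order or orient them along the ancestral chromosomes. The associations used below are hypothetical toy pairs, not those in Supplemental Table 17.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    def __init__(self, items: Iterable[str]) -> None:
        self.parent: Dict[str, str] = {x: x for x in items}

    def find(self, x: str) -> str:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> bool:
        """Merge the groups of a and b; return False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def fuse_blocks(blocks: List[str], associations: List[Tuple[str, str]]) -> List[List[str]]:
    """Return groups of blocks connected by associations, in input block order."""
    uf = UnionFind(blocks)
    for a, b in associations:
        uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for block in blocks:
        groups.setdefault(uf.find(block), []).append(block)
    return list(groups.values())


# Hypothetical: 25 blocks A-Y and 16 chain-like associations that never form a cycle.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXY"
toy_assoc = [(letters[i], letters[i + 1]) for i in range(16)]
print(len(fuse_blocks(list(letters), toy_assoc)))  # 9 groups
```

Because no toy association closes a cycle, every union succeeds, and the group count falls by exactly one per association.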
We next analyzed subgenome evolution using the ancestral chromosomes of A. argyi (ACAs) as units. Using a sliding window of 1000 genes along each ACA, we calculated the percentage of A. annua orthologs retained. No ACA had a significantly higher or lower ortholog retention rate than its homologous ACA, indicating that subgenome fractionation was unbiased (Figure 2E; Supplemental Table 18). We also calculated K_a and K_s values for the A. annua orthologs retained on each ACA and found no significant differences in K_a/K_s among ACAs. The K_s peaks of orthologs shared by each ACA and A. annua overlapped (Supplemental Figures 12 and 13), suggesting that the two subgenomes followed similar evolutionary patterns. We then tested for expression dominance between the two subgenomes. We compared the expression of homologous genes between the two members of each homologous ACA pair, and neither member showed higher overall expression (Supplemental Figure 14). Because TEs evolve rapidly, two diverging organisms or subgenomes usually accumulate their own specific TEs (Renny-Byfield et al., 2015). We therefore examined whether specific TEs had accumulated on either member of each homologous ACA pair. A matrix of copy numbers of 32 TE families across the 18 ACAs of the two subgenomes (Supplemental Data 2) was subjected to principal-component analysis (PCA). The 18 ACAs did not segregate into distinct clusters by subgenome (Supplemental Figure 15), indicating that neither subgenome had accumulated specific TEs. These results collectively demonstrate unbiased subgenome evolution in A. argyi. Unbiased fractionation and the absence of genome dominance are characteristic of paleo-autopolyploids (Garsmeur et al., 2014). We therefore propose that an autopolyploidization event occurred in the ancestor of A. argyi ∼3.3 mya, about 2 million years after its divergence from A. annua (∼5.4 mya).
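The 1000-gene sliding-window retention measure can be sketched as follows. Only the window size comes from the text. The one-gene step and the single whole-chromosome value for ACAs shorter than one window are assumptions. The input encoding, with one boolean per A. annua ortholog in ACA order, is also hypothetical.

```python
from typing import List, Sequence


def sliding_retention(retained: Sequence[bool], window: int = 1000, step: int = 1) -> List[float]:
    """Percent of ancestral orthologs retained in each gene-count window along one ACA.

    retained[i] is True if the i-th A. annua ortholog (in ACA order) still has a
    copy on this ACA. Windows are counted in genes, not base pairs.
    """
    if not retained:
        return []
    if window >= len(retained):
        # ACA shorter than one window (assumed handling): report one overall value.
        return [100.0 * sum(retained) / len(retained)]
    # Prefix sums make each window count O(1).
    prefix = [0]
    for flag in retained:
        prefix.append(prefix[-1] + int(flag))
    return [100.0 * (prefix[i + window] - prefix[i]) / window
            for i in range(0, len(retained) - window + 1, step)]


# Toy ACA with 4 orthologs and a 2-gene window.
print(sliding_retention([True, False, True, True], window=2))  # [50.0, 50.0, 100.0]
```

The two profiles from a homologous ACA pair can then be plotted together, as in Figure 2E, or compared statistically.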
In addition, chromosome 10 (chr10) of A. argyi appears to have formed by the fusion of two chromosomes (Supplemental Figure 16), and this fusion was not duplicated by the WGD. We therefore speculate that the fusion most likely occurred after the autopolyploidization event. It would have reduced the chromosome number per haploid set from 18 to 17, giving rise to the distinctive 2n = 34 karyotype of A. argyi.
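The karyotype argument reduces to simple bookkeeping. The sketch below assumes the nine-chromosome ancestral genome inferred above, one round of WGD, and the single chr10 fusion fixed in each haploid set. The function name is illustrative.

```python
def chromosome_number(x_ancestral: int, wgd_rounds: int, fusions: int) -> int:
    """Somatic number 2n after WGD(s) followed by fusions fixed in each haploid set."""
    # Chromosomes per haploid set (17 for A. argyi under these assumptions).
    haploid = x_ancestral * 2 ** wgd_rounds - fusions
    return 2 * haploid


two_n = chromosome_number(x_ancestral=9, wgd_rounds=1, fusions=1)
print(two_n)  # 34
# Neither usual Artemisia basic number (8 or 9) divides 34 evenly.
print([x for x in (8, 9) if two_n % x == 0])  # []
```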

Contribution of gene duplication and terpene synthase (TPS) family expansion to volatile terpenoid diversity

Gene duplication is an important source of genetic material for the evolution of new functions and has been considered a driving force of plant evolution and innovation (Moore and Purugganan, 2005; Flagel and Wendel, 2009). The recent WGD event in A. argyi produced many duplicated genes. Using DupGen_finder (Qiao et al., 2019), we identified a total of 59 030 duplicated genes in the A. argyi genome. These were classified as WGD duplicates, tandem duplicates (TDs), proximal duplicates, transposed duplicates, and dispersed duplicates (Supplemental Figure 17). We then performed KEGG enrichment analysis on the duplicated genes within the expanded gene families of A. argyi. The abundance of duplicated genes in secondary metabolism pathways appears to have arisen primarily from WGD events. In contrast, the expanded gene families in metabolic pathways and monoterpenoid biosynthesis arose mainly through tandem, proximal, and transposed duplication (Figure 3A). Volatile terpenoids are the main aromatic and pharmacologically active constituents of A. argyi. Using gas chromatography-mass spectrometry (GC-MS), we detected a variety of terpenoids in the roots, stems, leaves, and flowers of A. argyi, including 1,8-cineole, terpinene, thujanol, β-pinene, camphor, isoborneol, farnesene, and caryophyllene (Figure 3B). To better understand how duplicated genes contribute to volatile terpenoid diversity, we examined the copy numbers of genes in the mevalonic acid (MVA) pathway and the methylerythritol phosphate (MEP, or non-mevalonate) pathway. Unlike in A. annua, the copy numbers of pathway genes in A. argyi, such as AACT, HMGR, DXS, and DXR, were increased by various types of gene duplication. The WGD event contributed more additional copies to the MVA and MEP pathways than any other duplication type (Figure 3C). Transcriptome analysis of A. argyi roots, stems, leaves, and flowers suggested that some duplicated gene copies (e.g., of MCT, CMK, and MDS) had similar expression patterns across tissues (Figure 3C). These results indicate that WGD and other duplication events substantially changed the copy numbers of genes involved in terpene biosynthesis.
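DupGen_finder (Qiao et al., 2019) classifies duplicate pairs by genomic context. When a gene belongs to several pairs, it keeps a single type chosen by priority. The sketch below is a simplified illustration, not DupGen_finder's code. It assumes the priority order WGD > tandem > proximal > transposed > dispersed and a proximal threshold of 10 genes; check both against the tool's documentation. It also takes collinearity and transposition status as precomputed inputs.

```python
from enum import IntEnum
from typing import Iterable


class DupType(IntEnum):
    """Duplication types; a lower value means higher priority (assumed order)."""
    WGD = 1
    TANDEM = 2
    PROXIMAL = 3
    TRANSPOSED = 4
    DISPERSED = 5


def classify_pair(collinear: bool, same_chrom: bool, gene_distance: int,
                  transposed: bool, proximal_max: int = 10) -> DupType:
    """Classify one duplicate gene pair (simplified, hypothetical rules).

    collinear: the pair lies in an intragenomic collinear (syntenic) block.
    gene_distance: difference in gene rank on the chromosome (1 = adjacent).
    transposed: the pair was flagged as transposed by an upstream analysis.
    """
    if collinear:
        return DupType.WGD
    if same_chrom and gene_distance == 1:
        return DupType.TANDEM
    if same_chrom and 1 < gene_distance <= proximal_max:
        return DupType.PROXIMAL
    if transposed:
        return DupType.TRANSPOSED
    return DupType.DISPERSED


def assign_gene(pair_types: Iterable[DupType]) -> DupType:
    """A gene found in several pairs keeps its highest-priority type."""
    return min(pair_types)


print(classify_pair(False, True, 1, False).name)  # TANDEM
print(assign_gene([DupType.DISPERSED, DupType.PROXIMAL]).name)  # PROXIMAL
```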

Figure 3. Gene duplication and TPS family expansion contributed to diverse volatile terpenoid biosynthesis.

**(A)** Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of genes in expanded gene families derived from different types of duplication events.

**(B)** Relative contents of volatile terpenoids in multiple *A. argyi* tissues. Gas chromatography-mass spectrometry (GC-MS) peak areas were used for relative quantification.

**(C)** Tissue-specific relative expression profiles of genes in the mevalonic acid (MVA) and methylerythritol phosphate (MEP) pathways of terpenoid biosynthesis in *A. argyi*. Intermediates are shown in black, and the enzymes catalyzing each step are shown in orange or green. Colored dots indicate the duplication types of the genes encoding each enzyme. The copy numbers of the corresponding genes in *A. argyi* and *A. annua* (left and right, respectively) are shown next to each enzyme.

**(D)** Phylogenetic tree of TPSs from *A. argyi*, *A. annua*, and *H. annuus*. Genes from the TPS-a, TPS-b, TPS-g, TPS-e/f, and TPS-c subfamilies are indicated with bands of different colors. The clades were defined by identified nodes representing divergence events and are indicated with black numbers.

**(E)** Copy number changes in TPS genes of the three species. The numbers in red circles and hexagons represent the copy numbers of TPS in ancestral and extant species, respectively. The numbers on the branch with plus and minus signs represent the numbers of genes gained and lost, respectively. The numbers in the right box represent gene counts derived from five types of duplication in each TPS subfamily in *A. argyi*.

TPSs are the key enzymes in terpene biosynthesis (Jiang et al., 2019b). One of the most striking features of the A. argyi genome is its large number of TPS genes (AarTPSs). A total of 122 TPS genes were annotated in the A. argyi genome. This is more than in other Asteraceae species, such as Stevia rebaudiana (83 TPSs), H. annuus (79), L. sativa (67), A. annua (63), C. tinctorius (55), E. breviscapus (55), Arctium lappa (49), and Cynara cardunculus (34) (Supplemental Figure 18). To explore the lineage-specific expansion of the TPS gene family in A. argyi, we constructed a phylogenetic tree of the TPS genes from A. argyi, H. annuus, and A. annua. We included the latter two species because they represent gene retention after and without a lineage-specific WGD, respectively.

From the phylogenetic relationships, we identified the nodes leading to A. argyi-, A. annua-, and H. annuus-specific clades (Figure 3D). We inferred that each such node represents a divergence point, that is, an ancestral gene present in the most recent common ancestor of the three species. A total of 97 such gene nodes were identified, 90 with high confidence values (≥50%) and 7 with low confidence values (<50%) (Figure 3D). Of these 97 nodes, 48 belonged to the TPS-a subfamily and 27 to the TPS-b subfamily (Figure 3E). Compared with the other two species, A. argyi showed more expansion and less loss of these ancestral genes. This was especially true of the TPS-a and TPS-b subfamilies, which contribute the largest numbers of TPS genes in A. argyi. In A. annua, by contrast, only 28 genes remained in the TPS-a subfamily because of substantial gene loss (Figure 3E). In the TPS-c, TPS-e/f, and TPS-g subfamilies of A. argyi, gene gain and loss after species divergence were not significant relative to the ancestral gene complement. Analysis of TPS duplication types in A. argyi showed that the TPS-a subfamily lost 16 ancestral orthologs and gained 37 paralogs, mainly through WGD or TD (Figure 3E). Notably, almost all TPS-a and TPS-b genes originated from gene duplication events. These events were therefore possibly the most important contributors to the diversity of volatile terpenoids in A. argyi.
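Once ancestral gene nodes have been identified on the tree, gains and losses such as those in Figure 3E can be tallied per species. The sketch below assumes that each extant gene descends from exactly one ancestral node. Under this assumed rule, a node with no descendant in a species counts as one loss, and a node with n > 1 descendants counts as n − 1 gains. The species labels and counts are toy values, and this bookkeeping rule is not the stated procedure of the study.

```python
from typing import Dict, List


def tally_gain_loss(nodes: List[Dict[str, int]], species: List[str]) -> Dict[str, Dict[str, int]]:
    """Per-species counts of ancestral, lost, gained, and extant genes.

    Each element of nodes maps species -> number of descendant copies of one
    ancestral gene node; a missing species means zero copies.
    """
    summary = {sp: {"ancestral": len(nodes), "lost": 0, "gained": 0, "extant": 0}
               for sp in species}
    for copies_by_species in nodes:
        for sp in species:
            n = copies_by_species.get(sp, 0)
            record = summary[sp]
            record["extant"] += n
            if n == 0:
                record["lost"] += 1
            else:
                record["gained"] += n - 1
    return summary


# Three hypothetical ancestral nodes.
toy_nodes = [{"argyi": 2, "annua": 1}, {"argyi": 1}, {"argyi": 3, "annua": 1}]
result = tally_gain_loss(toy_nodes, ["argyi", "annua"])
for sp, rec in result.items():
    # Bookkeeping identity: extant = ancestral - lost + gained.
    assert rec["extant"] == rec["ancestral"] - rec["lost"] + rec["gained"]
    print(sp, rec)
```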

Characterization of key genes for biosynthesis of important volatile terpenoids

Germacrenes are common sesquiterpenoids in Asteraceae plants and play important roles in defense and signal transduction during plant environmental adaptation (Li et al., 2021). A total of 11 homologous genes annotated as germacrene synthases were identified in the A. argyi genome: 10 germacrene D synthases (GDSs) and a single-copy germacrene A synthase (GAS). Interestingly, the A. argyi genome contains a gene cluster comprising two GDS clades (A and B) and other genes. These genes were evenly distributed between two modules (1 and 2) located within the 142.54- to 143.11-Mb region of chr17 (Figure 4A). Specifically, AarTPS111, AarTPS112, AarTPS114, and AarTPS115 grouped into clade A, whereas AarTPS113 and AarTPS116 grouped into clade B (Figure 4A). The two modules showed similar relative expression patterns (Figure 4A). On the basis of their phylogenetic relationships, chromosomal locations, and expression profiles, we inferred that the two modules probably arose from a direct tandem duplication event. In addition, non-TPS genes (MIP1 and RING/U-box) were also located within the two modules and had similar expression levels, further supporting tandem duplication of the whole module (Figure 4A). AarTPS114 had a relatively high expression level among the GDS homologs in the duplicated cluster, so we examined its enzymatic activity in vitro (Figure 4B). AarTPS114 produced a broad range of sesquiterpenes from farnesyl pyrophosphate (FPP) (Figure 4B). The primary product was germacrene D, followed by γ-elemene, β-ylangene, β-copaene, ε-muurolene, bicyclogermacrene, γ-muurolene, and germacrene D-4-ol.
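The physical-proximity part of the gene-cluster inference can be sketched as grouping same-chromosome genes whose consecutive start positions lie within a distance threshold. This covers only one of the three criteria used above, which are phylogeny, location, and expression. The gap threshold, minimum cluster size, chromosome names, and coordinates below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str, int]  # (gene_id, chromosome, start position in bp)


def physical_clusters(genes: List[Gene], max_gap: int, min_size: int = 2) -> List[List[str]]:
    """Group genes on the same chromosome whose consecutive starts are <= max_gap apart."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for gene_id, chrom, start in genes:
        by_chrom.setdefault(chrom, []).append((start, gene_id))
    clusters: List[List[str]] = []
    for chrom in sorted(by_chrom):
        current: List[str] = []
        last_start: Optional[int] = None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current run.
            if last_start is not None and start - last_start > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            last_start = start
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


toy = [("g1", "chrA", 100), ("g2", "chrA", 150), ("g3", "chrA", 1000), ("g4", "chrB", 50)]
print(physical_clusters(toy, max_gap=100))  # [['g1', 'g2']]
```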

By contrast, the ortholog of the single-copy A. annua GAS was found only on chr06 of A. argyi (AarTPS34), with no duplicate on the homologous chromosome chr14 (Figure 4C). AarTPS34 converted FPP into a single sesquiterpene product, β-elemene, but did not convert geranyl pyrophosphate (GPP) or geranylgeranyl diphosphate (Figure 4D and Supplemental Figure 19). β-Elemene has prominent anti-tumor activities (Bai et al., 2021). At the high GC injection port temperature used here (250°C), germacrene A undergoes Cope rearrangement to β-elemene. The detected β-elemene therefore most likely reflects germacrene A produced by AarTPS34 (Rinkel and Dickschat, 2019). Subcellular localization assays showed that AarTPS34 and AarTPS114 localized to chloroplasts (Figure 4E).

Borneol and camphor are valuable monoterpenoids of A. argyi with anti-inflammatory, analgesic, and antimicrobial effects (Chinese Pharmacopoeia Commission, 2020; Sokolova et al., 2021). To elucidate the (+)-borneol and (+)-camphor biosynthetic pathway in A. argyi, we identified eight bornyl diphosphate synthase (BPPS) and eight borneol dehydrogenase (BDH) genes in the A. argyi genome by sequence similarity and phylogenetic analyses (Figure 4F and Supplemental Figures 20 and 21). One BPPS (AarTPS89) produced (+)-borneol as the sole product from GPP in vitro (Figure 4G and Supplemental Figure 22). Two TD gene pairs of BDH homologs (AarBDH4/AarBDH8 and AarBDH5/AarBDH7) were located on the homologous chromosomes chr04 and chr12 of the A. argyi genome (Figure 4H). Only AarBDH4 and AarBDH5 were expressed in A. argyi, whereas AarBDH8 and AarBDH7 showed almost no expression (Figure 4F). Functional characterization of AarBDH4 and AarBDH5 showed that both enzymes converted (+)-borneol to (+)-camphor using NAD⁺ as a cofactor (Figure 4I and Supplemental Figure 23). This indicates that they have the same function in A. argyi. Thus, the entire biosynthetic pathway of (+)-borneol and (+)-camphor was characterized in A. argyi.

Absence of ADS in the A. argyi genome

Amorpha-4,11-diene synthase (ADS) is a key enzyme in artemisinin biosynthesis in A. annua. In the genomes of two A. annua strains (HAN1 and LQ-9), the ADS genes form a tandemly duplicated gene cluster with six and four copies (Liao et al., 2022). Although A. argyi and A. annua belong to the same genus, artemisinin-related compounds have never been detected in A. argyi. To determine why A. argyi lacks artemisinin, we first searched for the ADS gene in A. argyi. Collinearity analysis identified the regions of the A. argyi genome that are syntenic with the A. annua ADS gene cluster (Figure 5A). However, no genes corresponding to the A. annua ADS cluster were present in the syntenic regions of the homologous A. argyi chromosomes chr05 and chr13 (Figure 5A). Interestingly, an ADS gene fragment consisting of only a single exon and a 3′ UTR was found in the syntenic region of chr05. This provides evidence for partial deletion of the ancestral ADS gene in the A. argyi genome (Figure 5B and Supplemental Figure 24).

**Figure 5.** Partial deletion of the ADS gene and functional loss of ADS homologs in *A. argyi*.

**(A)** Intergenomic syntenic blocks of the homologous chromosome pairs (chr05 and chr13) in *A. argyi* and chromosome regions containing the ADS gene cluster in two *A. annua* strains. The ADS fragment (indicated with a red dashed box) was identified only on chr05 but not on chr13.

**(B)** Gene structure of the ADS fragment on AarChr05 and of 10 functionally identified ADS genes in two *A. annua* strains.

**(C)** Phylogenetic tree of β-caryophyllene synthase (QHS), α-bisabolol synthase (BOS), koidzumiol synthase (KOS), ADS, and TPS genes from *A. absinthium*, *A. kurramensis*, *A. maritima*, *A. annua*, and *A. argyi*. The heatmap on the right shows gene expression levels (fragments per kilobase of transcript per million mapped fragments, FPKM) in the roots, stems, leaves, and flowers of *A. argyi*. Red stars indicate genes functionally characterized in this study.

**(D)** GC-MS traces showing representative product peaks of *A. argyi* AarTPS58 and AarTPS76.

ADS homologs were identified in the A. argyi TPS gene family on the basis of sequence similarity to A. annua ADS and to ADS homologs functionally characterized in other Artemisia species (Artemisia absinthium, Artemisia kurramensis, and Artemisia maritima) (Muangphrom et al., 2016). Among the 12 genes phylogenetically adjacent to the ADS homologs (Figure 5C), only AarTPS3, AarTPS22, AarTPS23, and AarTPS58 were highly expressed in the roots of A. argyi; the other genes showed negligible expression. AarTPS3 (with an incomplete open reading frame), AarTPS58, and AarTPS76 were cloned for catalytic assays. In an Escherichia coli heterologous expression system, only AarTPS76 produced a trace amount of α-elemol from endogenous FPP (Figure 5D). Taken together, partial deletion of the ADS gene and loss of function of ADS homologs may explain the lack of artemisinin production in A. argyi.

Discussion

A. argyi, also known as Chinese mugwort, is one of the most widely used Chinese medicinal plants in the genus Artemisia (Liu et al., 2021). We generated a chromosome-scale assembly of the A. argyi genome using PacBio long reads and Hi-C technology (Figure 1D; Supplemental Tables 2–5; Supplemental Figure 4). From the assembly and analyses of the A. argyi genome, we deduced its diploid ancestral genome before the lineage-specific WGD event and inferred unbiased evolution of the subgenomes, using ACAs as units (Figure 2). The AarTPS gene family was markedly expanded by WGD and tandem duplication (Figure 3D and 3E; Supplemental Figures 17 and 18), which likely contributed substantially to the formation of abundant volatile compounds (Figure 3B). Furthermore, the lack of artemisinin production in A. argyi appears to result from the absence of a functional ADS gene in its genome (Figure 5).

Polyploidy is very common in the genus Artemisia. Two basic chromosome numbers have been reported in Artemisia, with ploidy levels ranging from diploid to dodecaploid for x = 9 and from diploid to hexaploid for x = 8 (Wang, 2004; Pellicer et al., 2007, 2010). Because genomic information on ancestral Artemisia genomes is lacking, the origin of polyploidy has been directly determined in very few Artemisia species. Characterizing subgenome evolution helps reveal the origin of polyploidy: subgenome dominance has often been reported in the genomes of allopolyploids (Schnable et al., 2011; Xu et al., 2020; Zhang et al., 2021), whereas autopolyploids tend to undergo unbiased subgenome evolution (Liu et al., 2017; Zhao et al., 2017; Li et al., 2019). From the GB analysis, we deduced that the genome of the diploid ancestor of A. argyi consisted of nine chromosomes, consistent with the prevalent basic chromosome number (x = 9) in Artemisia (Figure 2D; Supplemental Tables 16 and 17). Using the ancestral chromosomes as units, we found that the two A. argyi subgenomes evolved without bias in terms of gene loss, gene expression level, gene mutation rate, and subgenome-specific TE accumulation. Together with the k-mer analysis, these results support an autopolyploid origin of the A. argyi genome (Figure 2 and Supplemental Figures 12–15). In addition, a chromosomal fusion was identified in the assembled A. argyi genome (Supplemental Figure 16), consistent with the result reported by Miao et al. (2022). This fusion was not duplicated in the two subgenomes (Figure 2D), suggesting that it occurred after the WGD event.

WGD events and a high retention rate of duplicated gene pairs have contributed to the abundance of duplicated genes in plant genomes (Van de Peer et al., 2009; Tank et al., 2015). Duplicated genes may have several fates, including silencing of one copy (nonfunctionalization), divergence leading to new functions (neofunctionalization), or partitioning of ancestral functions or expression domains, such as tissue specificity, between the copies (subfunctionalization) (Lynch and Conery, 2000). Novel functions gained by recently duplicated TPS genes might be correlated with the evolution of terpenoid diversity (Wang et al., 2021). By analyzing the evolutionary history of the TPS gene family, we found that the A. argyi genome retained the largest numbers of genes in the TPS-a and TPS-b subfamilies (Figure 3 and Supplemental Figures 17 and 18), possibly contributing to the abundant production of mixed volatile monoterpenes and sesquiterpenes (Figure 3B). WGD and tandem duplication were the main mechanisms of TPS family expansion in A. argyi (Figure 3E and Supplemental Figure 17). In particular, a GDS gene cluster on chr17 consists of two repeated modules derived from tandem duplication (Figure 4A). Within this cluster, AarTPS114, which is highly expressed relative to the other five TPS genes, encodes a functional GDS (Figure 4A). Expression divergence is generally considered an initial step in the functional divergence of duplicated genes and increases the probability that duplicate copies are retained in the genome (Li et al., 2005). The differing expression profiles of the TPS genes in the GDS cluster suggest that the catalytic functions of these A. argyi enzymes may also differ (Figure 4A). Of the two paralogous BDH gene pairs generated by the WGD event (AarBDH4/AarBDH8 and AarBDH5/AarBDH7), only AarBDH4 and AarBDH5 were expressed (Figure 4H and Supplemental Figure 21). AarBDH4 and AarBDH5 are tandem-duplicated genes with different expression patterns but the same catalytic function, suggesting that they have undergone subfunctionalization (Figure 4H and Supplemental Figure 23).

Previous studies have shown that gene retention after WGD is not random but is biased toward genes encoding proteins that play key roles in gene networks and signaling cascades (Jiang et al., 2013). Single-copy genes are generally more highly expressed than multi-copy genes and show higher sequence conservation across species (De Smet et al., 2013). In the syntenic blocks between the homoeologous chromosome pair chr06 and chr14, the single-copy gene AarTPS34, which encodes GAS, was located only on chr06 and shows high sequence similarity to GASs identified in other Asteraceae plants (Figure 4C). This GAS specifically catalyzed the conversion of FPP to germacrene A in vivo and in vitro (Figure 4D; Supplemental Figure 19), making it a promising single-product enzyme. In A. annua, the ADS gene, which encodes the rate-limiting enzyme for artemisinin biosynthesis, forms a tandemly duplicated gene cluster and affects artemisinin concentration in a copy number–dependent manner (Liao et al., 2022). In contrast, only an ADS gene fragment consisting of one exon and a 3′ UTR remains in the A. argyi genome (Figure 5B and Supplemental Figure 24). Two highly expressed ADS homologs (AarTPS76 and AarTPS58) did not produce the artemisinin precursor amorpha-4,11-diene (Figure 5D), similar to ADS homologs in other Artemisia species (Muangphrom et al., 2016). Loss of ADS function may have led to the lack of artemisinin production in A. argyi and may also be related to species-specific requirements of this medicinal plant.

In summary, our study elucidates the evolutionary history of the A. argyi genome and supports the use of A. argyi as a suitable Artemisia model for studying lineage-specific WGD events and subsequent subgenome divergence. Our findings improve the current understanding of various types of gene duplication and their important roles in plant secondary metabolite biosynthesis, laying a solid foundation for further improving the medicinal quality of A. argyi.

Methods

Plant materials

The sequenced individual of A. argyi was grown in the Beijing Medicinal Plant Garden of the Institute of Medicinal Plant Development (latitude 40°N, longitude 116°E) at the Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. All of the samples were collected from an individual A. argyi plant whose Institute of Medicinal Plant Development germplasm registration number is 10107436 (http://www.cumplag.cn). Young leaves were taken for genomic DNA extraction and genome sequencing library construction. Fresh samples were collected, immediately frozen in liquid nitrogen, and then used for RNA sequencing (Supplemental Methods), gene cloning, and examination of terpenoid content. Replicates were obtained from separate clonal plants.

Genome sequencing

High-molecular-weight DNA for genome sequencing was extracted from young leaves of A. argyi using a modified cetyltrimethylammonium bromide (CTAB) method (Allen et al., 2006). For Illumina sequencing, seven paired-end libraries with a 350-bp insert size were constructed and sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA). Three SMRTbell libraries of approximately 20 kb were constructed for PacBio sequencing on the PacBio Sequel II platform (Pacific Biosciences of California, Menlo Park, CA, USA). Three Hi-C libraries were constructed by chromatin extraction and digestion followed by DNA ligation, purification, and fragmentation (Belton et al., 2012) and were sequenced on the Illumina NovaSeq 6000 platform (Supplemental Methods).

Genome assembly and assessment

Before assembly, low-quality sequences were filtered out of the PacBio raw data, and the resulting clean reads were error-corrected and assembled into contigs using Canu (Koren et al., 2017). The contigs were polished for three rounds with Illumina paired-end data using Pilon (Walker et al., 2014). High-quality Hi-C data were then used for chromosome-level scaffolding. A total of 426.46 Gb of clean Hi-C reads were mapped to the contig assembly using Burrows-Wheeler Aligner (v.0.7.10-r789) (Li and Durbin, 2009), and library quality was assessed by counting unique valid paired-end reads with HiC-Pro (v.2.10.0) (Servant et al., 2015). LACHESIS (Burton et al., 2013) was used to cluster, order, and orient the contigs, and the Hi-C heatmaps were manually checked for misorientation, resulting in a chromosome-level genome. The quality and completeness of the assembled A. argyi genome were evaluated using BUSCO (v.4.0) (Simão et al., 2015), CEGMA (Parra et al., 2007), and the LTR Assembly Index (LAI) computed with LTR_retriever (v.2.9.0) (Ou et al., 2018; Supplemental Methods).
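As an illustration of what the LTR_retriever-based assessment measures, the LAI can be sketched from the formula of Ou et al. (2018) as we understand it: raw LAI is the percentage of LTR retrotransposon (LTR-RT) sequence that lies in intact elements, and the final LAI adds 2.8138 × (94 − genome-wide LTR identity) to correct for LTR-RT age. This is not the LTR_retriever implementation, and the input numbers are hypothetical rather than values from the A. argyi assembly.

```python
"""Sketch of the LTR Assembly Index (LAI) formula described by Ou et al. (2018).

Assumption: this re-implements the published formula as we understand it; it
is not the LTR_retriever code, and the example inputs are hypothetical.
"""


def raw_lai(intact_ltr_bp: float, total_ltr_bp: float) -> float:
    """Percentage of all LTR-RT sequence that lies in intact LTR-RTs."""
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai(intact_ltr_bp: float, total_ltr_bp: float, ltr_identity_pct: float) -> float:
    """Raw LAI adjusted for LTR-RT age using the genome-wide LTR identity (%)."""
    return raw_lai(intact_ltr_bp, total_ltr_bp) + 2.8138 * (94.0 - ltr_identity_pct)


if __name__ == "__main__":
    # Hypothetical numbers, not values from the A. argyi assembly.
    print(round(lai(intact_ltr_bp=1.5e8, total_ltr_bp=1.0e9, ltr_identity_pct=94.0), 2))  # 15.0
```

Ou et al. (2018) proposed interpreting LAI values below 10 as draft quality, 10 to 20 as reference quality, and 20 or above as gold quality; the age correction exists so that genomes with old, decayed LTR-RTs are not penalized for having few intact elements.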

Gene prediction and genome annotation

We integrated de novo prediction, homology searches, and transcript-based assembly to predict protein-coding genes in the genome; the gene prediction method is described in detail in the Supplemental Methods. Genome annotation included repetitive sequence annotation, protein-coding gene and functional annotation, and non-coding RNA annotation (Supplemental Methods). We first constructed a custom de novo repeat library for the genome using RepeatModeler2 (v.2.0.1) (Flynn et al., 2020), which combines two programs, RECON (v.1.0.8) (Bao and Eddy, 2002) and RepeatScout (v.1.0.6) (Price et al., 2005). This library was used together with the Repbase (v.19.06) (Jurka et al., 2005), REXdb (v.3.0) (Neumann et al., 2019), and Dfam (v.3.2) (Wheeler et al., 2013) databases to detect repetitive sequences with RepeatMasker (v.4.1.0) (Tarailo-Graovac and Chen, 2009). High-quality intact LTR retrotransposons (LTR-RTs) were identified and their insertion ages calculated using LTR_retriever (v.2.9.0) (Ou and Jiang, 2018) with default parameters. Tandem repeats were annotated with Tandem Repeats Finder (TRF, v.4.09) (Benson, 1999) and the MIcroSAtellite identification tool (MISA, v.2.1) (Beier et al., 2017).
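The insertion age of an intact LTR-RT is conventionally estimated from the divergence K between its two LTRs, which were identical at insertion, as T = K/(2μ). The sketch below assumes a Jukes-Cantor correction and a neutral rate of μ = 1.3 × 10⁻⁸ substitutions per site per year (a rice-derived rate often used as a default); the rate actually applied to A. argyi is not stated in this section, and the identities are hypothetical.

```python
"""Sketch: LTR-RT insertion age from the identity of its two LTRs.

Assumptions: Jukes-Cantor correction of the LTR divergence and a neutral rate
of 1.3e-8 substitutions per site per year (a rice-derived rate that is often
used as a default); the rate used for A. argyi is not stated in this section.
"""
import math


def jukes_cantor_distance(identity: float) -> float:
    """Convert the 5'/3' LTR identity (0-1) into substitutions per site."""
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must be in (0.25, 1]")
    if p == 0.0:
        return 0.0  # identical LTRs: no divergence
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age_years(identity: float, mu: float = 1.3e-8) -> float:
    """T = K / (2 * mu): both LTRs accumulate mutations after insertion."""
    return jukes_cantor_distance(identity) / (2.0 * mu)


if __name__ == "__main__":
    for ident in (1.0, 0.995, 0.99):  # hypothetical LTR identities
        print(ident, round(insertion_age_years(ident) / 1e6, 3), "Myr")
```

The factor of 2 reflects that mutations accumulate independently in both LTR copies after insertion; ages scale inversely with μ, so a different rate shifts all estimates proportionally.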

Gene family and phylogenetic analyses

OrthoFinder (v.2.4) (Emms and Kelly, 2019) was used to identify gene family clusters (orthogroups) in 12 plant species, including A. argyi (Supplemental Table 15). Sequences from each orthogroup were aligned with MAFFT (v.7.490) (Katoh and Standley, 2013), and gap regions were removed using Gblocks (v.0.91b) (Talavera and Castresana, 2007) (parameter: -b5=h). IQ-TREE (v.1.6.12) (Nguyen et al., 2015), together with ModelFinder (Kalyaanamoorthy et al., 2017), was used to construct a maximum likelihood phylogenetic tree for each single-copy orthogroup. MCMCTree (v.4.91), implemented in the PAML package (Yang, 1997), was used to estimate the divergence times between A. argyi and the 11 other species, with multiple divergence times from the TimeTree database (http://www.timetree.org/) used as calibrations. Gene families that had undergone expansion or contraction in the 12 species were identified using CAFE (v.4.2) (Han et al., 2013).
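The single-copy orthogroups used for tree building can be selected from OrthoFinder's per-species gene counts. The sketch below assumes a tab-separated table with an "Orthogroup" column, one count column per species, and a "Total" column (the layout of Orthogroups.GeneCount.tsv); the species columns and counts are hypothetical, and the exact filter used in this study is not stated here.

```python
"""Sketch: pick single-copy orthogroups from an OrthoFinder gene-count table.

Assumption: a tab-separated table with an "Orthogroup" column, one count column
per species and a "Total" column (the layout of Orthogroups.GeneCount.tsv).
The species names and counts below are hypothetical.
"""
import csv
import io
from typing import List, TextIO


def single_copy_orthogroups(handle: TextIO) -> List[str]:
    """Return orthogroups with exactly one gene in every species column."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None:  # empty input
        return []
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[sp]) == 1 for sp in species)
    ]


EXAMPLE = (
    "Orthogroup\tA_argyi\tA_annua\tH_annuus\tTotal\n"
    "OG0000000\t4\t2\t2\t8\n"  # duplicated (e.g. by WGD) in A. argyi
    "OG0000001\t1\t1\t1\t3\n"  # single copy in all species
    "OG0000002\t1\t0\t1\t2\n"  # absent from A. annua
)

if __name__ == "__main__":
    print(single_copy_orthogroups(io.StringIO(EXAMPLE)))  # ['OG0000001']
```

For a genome with a recent WGD such as A. argyi, a strict one-copy-everywhere filter can be restrictive, because many ancestral single-copy genes are still retained in two copies; the sketch is only meant to show the selection logic.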

Gene collinearity and K_s analysis

Genome-wide interspecies collinearity analysis among A. argyi, A. annua, and H. annuus was performed using the JCVI package (https://github.com/tanghaibao/jcvi) (parameter: --minspan=30). In brief, the A. argyi genome was compared with the other plant genomes by pairwise alignment using LAST with default parameters (https://gitlab.com/mcfrith/last). The LAST hits were filtered by c-score (c-score = 0.99), and JCVI subroutines were used to generate collinearity plots. The collinearity results were used to identify the WGD event in A. argyi. To estimate the timing of the A. argyi WGD event relative to speciation, the K_s values of orthologous gene pairs (between A. argyi and A. annua, H. annuus, or C. canephora) and paralogous gene pairs (within each of these genomes) were calculated using KaKs_Calculator (v.2.0) (Wang et al., 2010). The divergence time of a gene pair was estimated as T = K_s/(2r), where r is the number of substitutions per synonymous site per unit time. From the mean K_s of A. argyi–A. annua orthologs (0.0802) and the previously estimated divergence time of the two species (5.4 mya), r for Artemisia was calculated as 7.44 × 10⁻³ substitutions per synonymous site per million years (equivalent to 7.44 × 10⁻⁹ per year). Applying the same rate to the K_s value of the A. argyi WGD (0.0486) dates the WGD to approximately 3.3 mya.
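The dating arithmetic above can be reproduced directly. The sketch below uses only the values reported in this section; the function names are ours, and rates are expressed per million years so that ages come out in mya.

```python
"""Worked example of the Ks-based dating used above: T = Ks / (2r).

Inputs are the values reported in this section; the helper names are ours.
"""


def synonymous_rate(ks: float, divergence_myr: float) -> float:
    """Calibrate r (substitutions per synonymous site per Myr) from a known split."""
    return ks / (2.0 * divergence_myr)


def age_myr(ks: float, rate_per_myr: float) -> float:
    """Convert a Ks value into an age in millions of years."""
    return ks / (2.0 * rate_per_myr)


if __name__ == "__main__":
    r = synonymous_rate(0.0802, 5.4)  # A. argyi-A. annua orthologs, 5.4 mya split
    print(f"r = {r:.3e} per synonymous site per Myr")  # r = 7.426e-03 per synonymous site per Myr
    print(f"WGD age = {age_myr(0.0486, 7.44e-3):.1f} mya")  # WGD age = 3.3 mya
```

Recomputing r from the rounded mean K_s gives about 7.43 × 10⁻³ rather than the reported 7.44 × 10⁻³, a difference that presumably reflects rounding of the reported mean; either value places the WGD at about 3.3 mya.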

Reconstruction of the ancestral diploid genome and the subgenome chromosomes of A. argyi

To reconstruct the ancestral diploid genome of A. argyi, an Artemisia-specific GB system was defined following a previous study (Xu et al., 2020). A. annua is a stable diploid species that diverged recently from A. argyi, and its genome showed clear synteny with the A. argyi genome in analyses with the JCVI package. We therefore selected A. annua for detailed comparisons of genomic structure with A. argyi. We scanned the nine chromosomes of A. annua for breakpoints shared by the paralogous fragments of both A. argyi subgenomes. This yielded a framework of 25 GBs (A–Y) delimited by 16 breakpoints (Supplemental Table 16). We then searched for GB associations that would have been present on the diploid ancestral chromosomes (Supplemental Table 17), that is, associations absent in A. annua but present in two copies in the A. argyi genome. A total of 16 such associations fused the 25 GBs into nine groups. Therefore, the diploid ancestor of A. argyi before polyploidization was inferred to have nine chromosomes, and two sets of ancestral genomes were present in the A. argyi genome. We then reconstructed 18 ancestral chromosomes representing the two subgenomes of A. argyi on the basis of two principles: (1) a block within a chromosome should not contain overlapping or redundant fragments, and (2) each block should be rearranged as few times as possible relative to its chromosomal position. Subgenome evolution was then analyzed using ACAs as units, in terms of gene retention, K_s, gene expression, and subgenome-specific TE accumulation.
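The fusion of GBs into ancestral chromosomes is essentially a connected-components problem: starting from 25 blocks, each accepted association that does not close a cycle merges two groups, so 16 associations leave 25 − 16 = 9 groups. The sketch below uses a union-find structure under the stated acceptance rule (present twice in A. argyi, absent in A. annua); all block pairs are hypothetical placeholders, not the associations listed in Supplemental Table 17.

```python
"""Sketch: fusing genomic blocks (GBs) into ancestral chromosomes.

Assumptions: GB adjacencies are given as pairs with their copy number in
A. argyi; an adjacency is treated as ancestral if it occurs twice in A. argyi
(once per subgenome) and is absent from A. annua. All block pairs below are
hypothetical placeholders, not the associations in Supplemental Table 17.
"""
from string import ascii_uppercase
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def norm(pair: Iterable[str]) -> Pair:
    """Order-independent key for a block adjacency."""
    a, b = sorted(pair)
    return (a, b)


def ancestral_associations(argyi_counts: Dict[Pair, int], annua_adjacent: Set[Pair]) -> List[Pair]:
    """Keep adjacencies seen twice in A. argyi and not adjacent in A. annua."""
    annua = {norm(p) for p in annua_adjacent}
    return [norm(p) for p, n in argyi_counts.items() if n == 2 and norm(p) not in annua]


def fuse_blocks(blocks: List[str], associations: List[Pair]) -> List[List[str]]:
    """Union-find: each accepted association merges the groups of two blocks."""
    parent = {blk: blk for blk in blocks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in associations:
        parent[find(b)] = find(a)
    groups: Dict[str, List[str]] = {}
    for blk in blocks:
        groups.setdefault(find(blk), []).append(blk)
    return list(groups.values())


BLOCKS = list(ascii_uppercase[:25])  # GBs A-Y
HYPOTHETICAL = ["AB", "BC", "DE", "EF", "GH", "HI", "JK", "LM",
                "NO", "PQ", "QR", "ST", "UV", "VW", "WX", "XY"]
ARGYI_COUNTS = {norm(p): 2 for p in HYPOTHETICAL}
ARGYI_COUNTS[("A", "D")] = 1  # seen in one subgenome only -> rejected
ARGYI_COUNTS[("C", "D")] = 2  # also adjacent in A. annua -> rejected
ANNUA_ADJACENT = {("C", "D")}

if __name__ == "__main__":
    assoc = ancestral_associations(ARGYI_COUNTS, ANNUA_ADJACENT)
    groups = fuse_blocks(BLOCKS, assoc)
    print(len(assoc), "associations ->", len(groups), "ancestral chromosomes")
```

Union-find keeps the grouping independent of the order in which associations are found, which is convenient when candidate associations are collected chromosome by chromosome.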

TPS family identification and phylogenetic and evolutionary analyses

To identify putative TPS genes in A. argyi, A. annua, and H. annuus, we searched their proteomes with two Pfam domains, PF03936 (terpene synthase C-terminal metal-binding domain) and PF01397 (terpene synthase N-terminal domain), using HMMER (v.3.0; E-value cutoff < 1e−5) (Wheeler and Eddy, 2013). Pseudogenes and sequences with incomplete domains were excluded from further analyses.
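A minimal way to collect candidates from such a search is to parse HMMER3's per-sequence table (`hmmsearch --tblout`), in which column 1 is the target name, column 3 the profile name, and column 5 the full-sequence E-value. The sketch assumes that layout; the gene IDs and example lines are hypothetical and truncated after the columns it reads. Whether the study required hits to one or both domains is not stated, so both the union and the intersection are shown, and the domain-completeness filter is not implemented.

```python
"""Sketch: collect TPS candidates from `hmmsearch --tblout` output.

Assumptions: standard HMMER3 per-sequence table (whitespace-delimited; column 1
target name, column 3 query/profile name, column 5 full-sequence E-value).
The gene IDs below are hypothetical, and the example lines are truncated after
the columns this sketch reads.
"""
from collections import defaultdict
from typing import Dict, Iterable, Set


def hits_by_profile(lines: Iterable[str], evalue_cutoff: float = 1e-5) -> Dict[str, Set[str]]:
    """Map each profile name to the targets passing the E-value cutoff."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split()
        target, profile, evalue = fields[0], fields[2], float(fields[4])
        if evalue < evalue_cutoff:
            hits[profile].add(target)
    return dict(hits)


EXAMPLE = """\
# target  acc  query  acc  E-value
AarTPS_a  -  Terpene_synth    PF01397  3.1e-52
AarTPS_a  -  Terpene_synth_C  PF03936  2.4e-88
AarTPS_b  -  Terpene_synth_C  PF03936  5.0e-30
AarGene_c -  Terpene_synth    PF01397  0.02
"""

if __name__ == "__main__":
    hits = hits_by_profile(EXAMPLE.splitlines())
    union = set().union(*hits.values())
    both = hits.get("Terpene_synth", set()) & hits.get("Terpene_synth_C", set())
    print(sorted(union), sorted(both))  # ['AarTPS_a', 'AarTPS_b'] ['AarTPS_a']
```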

TPS protein sequences from the three species were aligned using MUSCLE (v.5.1) (Edgar, 2004), and the protein alignments were used to guide codon alignments of the corresponding coding sequences with PAL2NAL (v.14) (Suyama et al., 2006); the DNA alignments were then trimmed using trimAl (v.1.4) (Capella-Gutiérrez et al., 2009). A maximum likelihood phylogenetic tree was reconstructed using RAxML (v.8.2.12) (Stamatakis, 2014) with the GTRGAMMA model and 1000 bootstrap replicates.
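The idea behind protein-guided codon alignment is simple: each aligned residue is replaced by its codon and each gap by "---", so the DNA alignment inherits the protein alignment's columns. The sketch below is a simplification that assumes each coding sequence encodes exactly its ungapped protein (optionally followed by a stop codon); PAL2NAL itself also checks translations and handles mismatches and frameshifts, which this sketch does not. The sequences are hypothetical.

```python
"""Sketch of protein-guided codon alignment (the idea behind PAL2NAL).

Simplified assumption: each CDS encodes exactly its ungapped protein (with an
optional trailing stop codon); PAL2NAL additionally checks translations and
handles mismatches and frameshifts, which this sketch does not.
"""
from typing import Dict


def codon_align(aligned_protein: str, cds: str) -> str:
    """Replace each residue with its codon and each gap with '---'."""
    cds = cds.upper()
    n_res = len(aligned_protein.replace("-", ""))
    if len(cds) == 3 * (n_res + 1):
        cds = cds[:-3]  # drop trailing stop codon
    if len(cds) != 3 * n_res:
        raise ValueError("CDS length does not match the protein")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


def codon_align_all(protein_aln: Dict[str, str], cds: Dict[str, str]) -> Dict[str, str]:
    """Apply codon_align to every sequence of a protein alignment."""
    return {name: codon_align(seq, cds[name]) for name, seq in protein_aln.items()}


if __name__ == "__main__":
    # Hypothetical two-sequence protein alignment and matching CDSs.
    prot = {"tps1": "MK-L", "tps2": "M-RL"}
    nuc = {"tps1": "ATGAAACTGTAA", "tps2": "ATGCGTCTT"}
    for name, seq in codon_align_all(prot, nuc).items():
        print(name, seq)  # tps1 ATGAAA---CTG, then tps2 ATG---CGTCTT
```

Aligning at the protein level first and then back-translating avoids gaps that split codons, which matters when the trimmed DNA alignment is analyzed under a nucleotide model such as GTRGAMMA.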

The evolutionary history of lineage-specific expansion and contraction of TPSs in the three species was then investigated following the approach of Kim et al. (2006). Nodes representing divergence points among the three species were identified on the basis of two criteria: (1) a bootstrap value higher than 50%; and (2) relationships among the three species-specific clades consistent with the species tree. Clades defined by nodes meeting both criteria were designated as orthologous groups derived from an ancestral TPS gene of the three species. Clades containing sequences from only one or two of the three species were taken to indicate TPS gene loss during evolution; additional orthologous groups were identified from such clades on the basis of their sister-group relationships within the TPS tree.
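The two node criteria can be expressed as a small tree test. The sketch below is our simplification: a node qualifies if its bootstrap exceeds 50 and its two child clades split as ((A. argyi, A. annua), H. annuus), matching the species tree, and a clade lacking one or more species is flagged as a possible gene loss. The tree structure and gene IDs are hypothetical, and the sister-group step for recovering additional orthologous groups is not implemented.

```python
"""Sketch of the node criteria described above (after Kim et al., 2006).

Simplifications (ours): a node qualifies if its bootstrap exceeds 50 and its two
child clades are split as ((A. argyi, A. annua), H. annuus), matching the
species tree. Gene IDs below are hypothetical.
"""
from dataclasses import dataclass, field
from typing import List, Optional, Set

SPECIES = {"A. argyi", "A. annua", "H. annuus"}
ARTEMISIA = {"A. argyi", "A. annua"}


@dataclass
class Node:
    name: str = ""
    species: Optional[str] = None  # set for leaves (genes) only
    bootstrap: float = 0.0  # used for internal nodes only
    children: List["Node"] = field(default_factory=list)

    def species_set(self) -> Set[str]:
        if self.species is not None:
            return {self.species}
        return set().union(*(c.species_set() for c in self.children))


def is_ancestral_group(node: Node, min_bootstrap: float = 50.0) -> bool:
    """Criterion 1: bootstrap > 50; criterion 2: topology matches species tree."""
    if node.species is not None or len(node.children) != 2:
        return False
    if node.bootstrap <= min_bootstrap:
        return False
    left, right = (c.species_set() for c in node.children)
    expected = {frozenset(ARTEMISIA), frozenset({"H. annuus"})}
    return {frozenset(left), frozenset(right)} == expected


def missing_species(node: Node) -> Set[str]:
    """Species absent from a clade; a non-empty result suggests gene loss."""
    return SPECIES - node.species_set()


def leaf(name: str, sp: str) -> Node:
    return Node(name=name, species=sp)


ARTEMISIA_CLADE = Node(bootstrap=85, children=[
    Node(bootstrap=70, children=[leaf("AarTPS_x1", "A. argyi"), leaf("AarTPS_x2", "A. argyi")]),
    leaf("AanTPS_x", "A. annua"),
])
CLADE = Node(bootstrap=98, children=[ARTEMISIA_CLADE, leaf("HanTPS_x", "H. annuus")])
LOST = Node(bootstrap=90, children=[leaf("AarTPS_y", "A. argyi"), leaf("HanTPS_y", "H. annuus")])

if __name__ == "__main__":
    print(is_ancestral_group(CLADE))  # True
    print(sorted(missing_species(LOST)))  # ['A. annua']
```

Note that duplicated A. argyi genes (for example, WGD pairs) nested inside the Artemisia clade do not affect the test, because only the species composition of the two child clades is compared.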

Enzyme activities of germacrene synthases

For in vitro enzyme activity characterization, the full-length coding sequences of AarTPS34 and AarTPS114 were cloned into the pET28a vector, which carries a His tag, and the plasmids were transformed into E. coli BL21 (DE3) cells. The recombinant His-TPS proteins were induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at 16°C overnight and purified using the M5 His-Tagged Protein Purification Kit (Mei5 Biotech, Beijing, China) with an elution buffer identical to the binding buffer except for an imidazole concentration of 500 mM. Protein concentrations were determined using the BCA Assay Kit (Mei5 Biotech, Beijing, China). TPS enzyme activity assays were performed in 1 ml assay buffer (30 mM HEPES, 5 mM DTT, 25 mM MgCl₂) containing 10 μg purified protein and 10 μg FPP, GPP, or geranylgeranyl diphosphate (Sigma-Aldrich, St. Louis, MO, USA) (Shang et al., 2020). The mixture was incubated at 30°C for 1 h and then at 45°C for 15 min, after which the volatile products were analyzed by gas chromatography-mass spectrometry (GC-MS) (Supplemental Methods). A β-elemene standard (Sigma-Aldrich, St. Louis, MO, USA) was used as a positive control, and the negative control protein was prepared from E. coli harboring the empty pET28a vector.

Enzyme activities of BPPS and BDH

Gene cloning, protein expression and purification, and GC-MS product detection for BPPS and BDH were performed as outlined in the Supplemental Methods. The BPPS enzyme assay was performed in 300 μl assay buffer containing 30 mM HEPES, 25 mM MgCl₂, 5 mM DTT, 10 μg enzyme, and 10 μg GPP and incubated for 1 h at 30°C. Then 1.5 μl calf intestinal alkaline phosphatase was added, and the mixture was incubated for 2 h at 37°C to dephosphorylate the products enzymatically (Wang et al., 2018). The BDH enzyme assay was performed in 500 μl buffer containing 10 mM NAD⁺, 10 μg enzyme, and 5 μg (+)-borneol as the substrate and incubated for 1 h at 30°C. The (+)-borneol and (+)-camphor produced were detected by GC-MS (Tian et al., 2015). (+)-Borneol (Sigma-Aldrich, St. Louis, MO, USA) and (+)-camphor (Shanghai Yuanye Bio-Technology, Shanghai, China) standards were used as positive controls. The negative control protein was prepared from E. coli harboring the empty pET28a vector. All primers used in this study are listed in Supplemental Table 19.
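The assay recipes above give substrate amounts by mass; converting them to molar concentrations makes the FPP, GPP, and (+)-borneol assays easier to compare. The sketch assumes the masses refer to the neutral free compounds, using average molecular weights computed from their formulas. Commercial prenyl diphosphates are usually supplied as salts in solution, so if "10 μg" refers to salt mass, the true molar amounts are somewhat lower.

```python
"""Back-of-envelope substrate concentrations for the assays described above.

Assumption: masses refer to the free acids of FPP and GPP and to neat
(+)-borneol; commercial prenyl diphosphates are usually sold as salts in
solution, so if "10 ug" refers to salt mass the molar amounts are lower.
"""

# Average molecular weights (g/mol) of the neutral free compounds, from their formulas.
MW = {
    "FPP (C15H28O7P2)": 382.33,
    "GPP (C10H20O7P2)": 314.21,
    "(+)-borneol (C10H18O)": 154.25,
}


def micromolar(mass_ug: float, mw_g_per_mol: float, volume_ul: float) -> float:
    """ug / (g/mol) gives umol; dividing by litres gives umol/L (uM)."""
    return (mass_ug / mw_g_per_mol) / (volume_ul * 1e-6)


ASSAYS = [  # (label, substrate, mass in ug, reaction volume in ul), from the Methods
    ("TPS assay, FPP", "FPP (C15H28O7P2)", 10.0, 1000.0),
    ("BPPS assay, GPP", "GPP (C10H20O7P2)", 10.0, 300.0),
    ("BDH assay, (+)-borneol", "(+)-borneol (C10H18O)", 5.0, 500.0),
]

if __name__ == "__main__":
    for label, substrate, mass, vol in ASSAYS:
        print(f"{label}: ~{micromolar(mass, MW[substrate], vol):.0f} uM")
```

Under these assumptions the substrates are present at roughly 26 μM FPP, 106 μM GPP, and 65 μM (+)-borneol, so the BPPS assay, with its smaller volume, runs at about four times the prenyl diphosphate concentration of the TPS assay.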

Funding

This work was supported by the National Natural Science Foundation of China (81973422 and 31570302) and the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (2021-I2M-1-071).

Author contributions

Conceptualization, H.L.; methodology, H.L., H.C., M.G., and S.D.; investigation, H.C., M.G., and S.D.; resources, X.W. and L.H.; formal analysis, H.C., M.G., S.D., G.Z., and Y.J.; writing – original draft, H.C., M.G., and S.D.; writing – review & editing, H.L. and L.L.; supervision, H.L., L.L., and S.C.

Acknowledgments

We thank Dr. Baoshen Liao (Guangzhou University of Chinese Medicine), Dr. Jun Qian (Shanghai Biozeron Biotechnology Co., Ltd.), Prof. Zhichao Xu (Northeast Forestry University), and M.S. Sijie Sun (IMPLAD, Chinese Academy of Medical Sciences and Peking Union Medical College) for advice regarding bioinformatics analysis. No conflict of interest is declared.

Published: January 2, 2023

Footnotes

Published by the Plant Communications Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and CEMPS, CAS.

Supplemental information is available at Plant Communications Online.

Contributor Information

Shilin Chen, Email: slchen@icmm.ac.cn.

Li Li, Email: ll37@cornell.edu.

Hongmei Luo, Email: hmluo@implad.ac.cn.

Supplemental information

Document S1. Supplemental Figures S1–S24, Supplemental Tables S1–S19 and Supplemental Methods

mmc1.pdf (3.4 MB, PDF)

Document S2. Supplemental Data 1 and 2

mmc2.xlsx (18.8 KB, XLSX)

Document S2. Article plus supplemental information

mmc3.pdf (10.6 MB, PDF)

Data availability

All raw sequence and genome assembly data in this study have been deposited in the National Genomics Data Center (https://ngdc.cncb.ac.cn) with BioProject accession number PRJCA010808.

References

Allen G.C., Flores-Vergara M.A., Krasynanski S., Kumar S., Thompson W.F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 2006;1:2320–2325. doi: 10.1038/nprot.2006.384.
Aubourg S., Lecharny A., Bohlmann J. Genomic analysis of the terpenoid synthase (AtTPS) gene family of Arabidopsis thaliana. Mol. Genet. Genomics. 2002;267:730–745. doi: 10.1007/s00438-002-0709-y.
Badouin H., Gouzy J., Grassa C.J., Murat F., Staton S.E., Cottret L., Lelandais-Brière C., Owens G.L., Carrère S., Mayjonade B., et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature. 2017;546:148–152. doi: 10.1038/nature22380.
Bai Z., Yao C., Zhu J., Xie Y., Ye X.-Y., Bai R., Xie T. Anti-tumor drug discovery based on natural product β-elemene: anti-tumor mechanisms and structural modification. Molecules. 2021;26:1499. doi: 10.3390/molecules26061499.
Bao Z., Eddy S.R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12:1269–1276. doi: 10.1101/gr.88502.
Beier S., Thiel T., Münch T., Scholz U., Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–2585. doi: 10.1093/bioinformatics/btx198.
Belton J.-M., McCord R.P., Gibcus J.H., Naumova N., Zhan Y., Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012;58:268–276. doi: 10.1016/j.ymeth.2012.05.001.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
Bora K.S., Sharma A. The genus Artemisia: a comprehensive review. Pharm. Biol. 2011;49:101–109. doi: 10.3109/13880209.2010.497815.
Burton J.N., Adey A., Patwardhan R.P., Qiu R., Kitzman J.O., Shendure J. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 2013;31:1119–1125. doi: 10.1038/nbt.2727.
Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
Chinese Pharmacopoeia Commission. Chinese Pharmacopoeia. China Medical Science Press; 2020.
De Smet R., Adams K.L., Vandepoele K., Van Montagu M.C.E., Maere S., Van de Peer Y. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc. Natl. Acad. Sci. USA. 2013;110:2898–2903. doi: 10.1073/pnas.1300127110. [DOI] [PMC free article] [PubMed] [Google Scholar]
Edgar R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinf. 2004;5:113. doi: 10.1186/1471-2105-5-113. [DOI] [PMC free article] [PubMed] [Google Scholar]
Emms D.M., Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. doi: 10.1186/s13059-019-1832-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schranz M.E., Mohammadin S., Edger P.P. Ancient whole genome duplications, novelty and diversification: the WGD Radiation Lag-Time Model. Curr. Opin. Plant Biol. 2012;15:147–153. doi: 10.1016/j.pbi.2012.03.011. [DOI] [PubMed] [Google Scholar]
Flagel L.E., Wendel J.F. Gene duplication and evolutionary novelty in plants. New Phytol. 2009;183:557–564. doi: 10.1111/j.1469-8137.2009.02923.x. [DOI] [PubMed] [Google Scholar]
Flynn J.M., Hubley R., Goubert C., Rosen J., Clark A.G., Feschotte C., Smit A.F. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117. [DOI] [PMC free article] [PubMed] [Google Scholar]
Garsmeur O., Schnable J.C., Almeida A., Jourda C., D’Hont A., Freeling M. Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 2014;31:448–454. doi: 10.1093/molbev/mst230. [DOI] [PubMed] [Google Scholar]
Guan X., Ge D., Li S., Huang K., Liu J., Li F. Chemical composition and antimicrobial activities of Artemisia argyi Lévl. et vant essential oils extracted by simultaneous distillation-extraction, subcritical extraction and hydrodistillation. Molecules. 2019;24:483. doi: 10.3390/molecules24030483. [DOI] [PMC free article] [PubMed] [Google Scholar]
Han M.V., Thomas G.W.C., Lugo-Martinez J., Hahn M.W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 2013;30:1987–1997. doi: 10.1093/molbev/mst100. [DOI] [PubMed] [Google Scholar]
Inceer H., Hayirlioglu-Ayaz S. Chromosome numbers in the tribe anthemideae (Asteraceae) from north-east anatolia. Bot. J. Linn. Soc. 2007;153:203–211. [Google Scholar]
Ivănescu B., Burlec A.F., Crivoi F., Roșu C., Corciovă A. Secondary metabolites from Artemisia genus as biopesticides and innovative nano-based application strategies. Molecules. 2021;26:3061. doi: 10.3390/molecules26103061. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang W.k., Liu Y.l., Xia E.h., Gao L.z. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants. Plant Physiol. 2013;161:1844–1861. doi: 10.1104/pp.112.200147. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang Z., Guo X., Zhang K., Sekaran G., Cao B., Zhao Q., Zhang S., Kirby G.M., Zhang X. The essential oils and eucalyptol from Artemisia vulgaris L. prevent acetaminophen-Induced liver injury by activating Nrf2–Keap1 and enhancing APAP clearance through non-toxic metabolic pathway. Front. Pharmacol. 2019;10:782. doi: 10.3389/fphar.2019.00782. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang S.-Y., Jin J., Sarojam R., Ramachandran S. A comprehensive survey on the terpene synthase gene family provides new insight into its evolutionary patterns. Genome Biol. Evol. 2019;11:2078–2098. doi: 10.1093/gbe/evz142. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jurka J., Kapitonov V.V., Pavlicek A., Klonowski P., Kohany O., Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 2005;110:462–467. doi: 10.1159/000084979. [DOI] [PubMed] [Google Scholar]
Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A., Jermiin L.S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods. 2017;14:587–589. doi: 10.1038/nmeth.4285. [DOI] [PMC free article] [PubMed] [Google Scholar]
Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim J., Shiu S.-H., Thoma S., Li W.-H., Patterson S.E. Patterns of expansion and expression divergence in the plant polygalacturonase gene family. Genome Biol. 2006;7:R87. doi: 10.1186/gb-2006-7-9-r87. [DOI] [PMC free article] [PubMed] [Google Scholar]
Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–736. doi: 10.1101/gr.215087.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kshirsagar S.G., Rao R.V. Antiviral and immunomodulation effects of Artemisia. Medicina. 2021;57:217. doi: 10.3390/medicina57030217. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li W.-H., Yang J., Gu X. Expression divergence between duplicate genes. Trends Genet. 2005;21:602–607. doi: 10.1016/j.tig.2005.08.006. [DOI] [PubMed] [Google Scholar]
Li Q., Qiao X., Yin H., Zhou Y., Dong H., Qi K., Li L., Zhang S. Unbiased subgenome evolution following a recent whole-genome duplication in pear (Pyrus bretschneideri Rehd.) Hortic. Res. 2019;6:12–34. doi: 10.1038/s41438-018-0110-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li J., Hu H., Chen Y., Xie J., Li J., Zeng T., Wang M., Luo J., Zheng R., Jongsma M.A., Wang C. Tissue specificity of (E)-β-farnesene and germacrene D accumulation in pyrethrum flowers. Phytochemistry. 2021;187:112768. doi: 10.1016/j.phytochem.2021.112768. [DOI] [PubMed] [Google Scholar]
Liang Z., Schnable J.C. Functional divergence between subgenomes and gene pairs after whole genome duplications. Mol. Plant. 2018;11:388–397. doi: 10.1016/j.molp.2017.12.010. [DOI] [PubMed] [Google Scholar]
Liao B., Shen X., Xiang L., Guo S., Chen S., Meng Y., Liang Y., Ding D., Bai J., Zhang D., et al. Allele-aware chromosome-level genome assembly of Artemisia annua reveals the correlation between ADS expansion and artemisinin yield. Mol. Plant. 2022;15:1310–1328. doi: 10.1016/j.molp.2022.05.013. [DOI] [PubMed] [Google Scholar]
Liu Y., Wang J., Ge W., Wang Z., Li Y., Yang N., Sun S., Zhang L., Wang X. Two highly similar poplar paleo-subgenomes suggest an autotetraploid ancestor of Salicaceae plants. Front. Plant Sci. 2017;8:571. doi: 10.3389/fpls.2017.00571.
Liu Y., He Y., Wang F., Xu R., Yang M., Ci Z., Wu Z., Zhang D., Lin J. From longevity grass to contemporary soft gold: explore the chemical constituents, pharmacology, and toxicology of Artemisia argyi H.Lév. & Vaniot essential oil. J. Ethnopharmacol. 2021;279:114404. doi: 10.1016/j.jep.2021.114404.
Lynch M., Conery J.S. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. doi: 10.1126/science.290.5494.1151.
Miao Y., Luo D., Zhao T., Du H., Liu Z., Xu Z., Guo L., Chen C., Peng S., Li J.X., et al. Genome sequencing reveals chromosome fusion and extensive expansion of genes related to secondary metabolism in Artemisia argyi. Plant Biotechnol. J. 2022;20:1902–1915. doi: 10.1111/pbi.13870.
Moore R.C., Purugganan M.D. The evolutionary dynamics of plant duplicate genes. Curr. Opin. Plant Biol. 2005;8:122–128. doi: 10.1016/j.pbi.2004.12.001.
Muangphrom P., Seki H., Suzuki M., Komori A., Nishiwaki M., Mikawa R., Fukushima E.O., Muranaka T. Functional analysis of amorpha-4,11-diene synthase (ADS) homologs from non-artemisinin-producing Artemisia species: the discovery of novel koidzumiol and (+)-α-bisabolol synthases. Plant Cell Physiol. 2016;57:1678–1688. doi: 10.1093/pcp/pcw094.
Neumann P., Novák P., Hoštáková N., Macas J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob. DNA. 2019;10:1. doi: 10.1186/s13100-018-0144-1.
Nguyen L.-T., Schmidt H.A., von Haeseler A., Minh B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015;32:268–274. doi: 10.1093/molbev/msu300.
Ober D. Seeing double: gene duplication and diversification in plant secondary metabolism. Trends Plant Sci. 2005;10:444–449. doi: 10.1016/j.tplants.2005.07.007.
Ou S., Jiang N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 2018;176:1410–1422. doi: 10.1104/pp.17.01310.
Ou S., Chen J., Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.
Panchy N., Lehti-Shiu M., Shiu S.-H. Evolution of gene duplication in plants. Plant Physiol. 2016;171:2294–2316. doi: 10.1104/pp.16.00523.
Parra G., Bradnam K., Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071.
Pellicer J., Garcia S., Garnatje T., Hidalgo O., Korobkov A.A., Dariimaa S., Vallès J. Chromosome counts in Asian Artemisia L. (Asteraceae) species: from diploids to the first report of the highest polyploid in the genus. Bot. J. Linn. Soc. 2007;153:301–310.
Pellicer J., Garcia S., Canela M.Á., Garnatje T., Korobkov A.A., Twibell J.D., Vallès J. Genome size dynamics in Artemisia L. (Asteraceae): following the track of polyploidy. Plant Biol. 2010;12:820–830. doi: 10.1111/j.1438-8677.2009.00268.x.
Price A.L., Jones N.C., Pevzner P.A. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.
Qiao X., Li Q., Yin H., Qi K., Li L., Wang R., Zhang S., Paterson A.H. Gene duplication and evolution in recurring polyploidization–diploidization cycles in plants. Genome Biol. 2019;20:38. doi: 10.1186/s13059-019-1650-2.
Renny-Byfield S., Gong L., Gallagher J.P., Wendel J.F. Persistence of subgenomes in paleopolyploid cotton after 60 My of evolution. Mol. Biol. Evol. 2015;32:1063–1071. doi: 10.1093/molbev/msv001.
Reyes-Chin-Wo S., Wang Z., Yang X., Kozik A., Arikit S., Song C., Xia L., Froenicke L., Lavelle D.O., Truco M.-J., et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 2017;8:14953–15011. doi: 10.1038/ncomms14953.
Rinkel J., Dickschat J.S. Addressing the chemistry of germacrene A by isotope labeling experiments. Org. Lett. 2019;21:2426–2429. doi: 10.1021/acs.orglett.9b00725.
Schnable J.C., Springer N.M., Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA. 2011;108:4069–4074. doi: 10.1073/pnas.1101368108.
Servant N., Varoquaux N., Lajoie B.R., Viara E., Chen C.-J., Vert J.-P., Heard E., Dekker J., Barillot E. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.
Shang J., Tian J., Cheng H., Yan Q., Li L., Jamal A., Xu Z., Xiang L., Saski C.A., Jin S., et al. The chromosome-level wintersweet (Chimonanthus praecox) genome provides insights into floral scent biosynthesis and flowering in winter. Genome Biol. 2020;21:200. doi: 10.1186/s13059-020-02088-y.
Shin N.-R., Ryu H.-W., Ko J.-W., Park S.-H., Yuk H.-J., Kim H.-J., Kim J.-C., Jeong S.-H., Shin I.-S. Artemisia argyi attenuates airway inflammation in ovalbumin-induced asthmatic animals. J. Ethnopharmacol. 2017;209:108–115. doi: 10.1016/j.jep.2017.07.033.
Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
Sokolova A.S., Kovaleva K.S., Yarovaya O.I., Bormotov N.I., Shishkina L.N., Serova O.A., Sergeev A.A., Agafonov A.P., Maksuytov R.A., Salakhutdinov N.F. (+)-Camphor and (−)-borneol derivatives as potential anti-orthopoxvirus agents. Arch. Pharm. 2021;354:2100038. doi: 10.1002/ardp.202100038.
Soltis P.S., Soltis D.E. Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 2016;30:159–165. doi: 10.1016/j.pbi.2016.03.015.
Song X., Wen X., He J., Zhao H., Li S., Wang M. Phytochemical components and biological activities of Artemisia argyi. J. Funct. Foods. 2019;52:648–662.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
Suyama M., Torrents D., Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–W612. doi: 10.1093/nar/gkl315.
Talavera G., Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007;56:564–577. doi: 10.1080/10635150701472164.
Tank D.C., Eastman J.M., Pennell M.W., et al. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytol. 2015;207:454–467. doi: 10.1111/nph.13491.
Tarailo-Graovac M., Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 2009;25:4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s25.
Tian N., Tang Y., Xiong S., Tian D., Chen Y., Wu D., Liu Z., Liu S. Molecular cloning and functional identification of a novel borneol dehydrogenase from Artemisia annua L. Ind. Crops Prod. 2015;77:190–195.
Van de Peer Y., Maere S., Meyer A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 2009;10:725–732. doi: 10.1038/nrg2600.
Walker B.J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., Cuomo C.A., Zeng Q., Wortman J., Young S.K., Earl A.M. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
Wang W.M. On the origin and development of Artemisia (Asteraceae) in the geological past. Bot. J. Linn. Soc. 2004;145:331–336.
Wang W., Zhang X.-K., Wu N., Fu Y.-J., Zu Y.-G. Antimicrobial activities of essential oil from Artemisiae argyi leaves. J. For. Res. (Harbin). 2006;17:332–334.
Wang D., Zhang Y., Zhang Z., Zhu J., Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics. 2010;8:77–80. doi: 10.1016/S1672-0229(10)60008-3.
Wang H., Ma D., Yang J., Deng K., Li M., Ji X., Zhong L., Zhao H. An integrative volatile terpenoid profiling and transcriptomics analysis for gene mining and functional characterization of AvBPPS and AvPS involved in the monoterpenoid biosynthesis in Amomum villosum. Front. Plant Sci. 2018;9:846. doi: 10.3389/fpls.2018.00846.
Wang X., Gao Y., Wu X., Wen X., Li D., Zhou H., Li Z., Liu B., Wei J., Chen F., et al. High-quality evergreen azalea genome reveals tandem duplication-facilitated low-altitude adaptability and floral scent evolution. Plant Biotechnol. J. 2021;19:2544–2560. doi: 10.1111/pbi.13680.
Wheeler T.J., Eddy S.R. nhmmer: DNA homology search with profile HMMs. Bioinformatics. 2013;29:2487–2489. doi: 10.1093/bioinformatics/btt403.
Wheeler T.J., Clements J., Eddy S.R., Hubley R., Jones T.A., Jurka J., Smit A.F.A., Finn R.D. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 2013;41:D70–D82. doi: 10.1093/nar/gks1265.
Wu P., Zhang L., Zhang K., Yin Y., Liu A., Zhu Y., Fu Y., Sun F., Zhao S., Feng K., et al. The adaptive evolution of Euryale ferox to the aquatic environment through paleo-hexaploidization. Plant J. 2022;110:627–645. doi: 10.1111/tpj.15717.
Xu W., Zhang Q., Yuan W., Xu F., Muhammad Aslam M., Miao R., Li Y., Wang Q., Li X., Zhang X., et al. The genome evolution and low-phosphorus adaptation in white lupin. Nat. Commun. 2020;11:1069. doi: 10.1038/s41467-020-14891-z.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 1997;13:555–556. doi: 10.1093/bioinformatics/13.5.555.
Zhang W.-J., You C.-X., Yang K., Chen R., Wang Y., Wu Y., Geng Z.-F., Chen H.-P., Jiang H.-Y., Su Y., et al. Bioactivity of essential oil of Artemisia argyi Lévl. et Van. and its main compounds against Lasioderma serricorne. J. Oleo Sci. 2014;63:829–837. doi: 10.5650/jos.ess14057.
Zhang G., Ge C., Xu P., Wang S., Cheng S., Han Y., Wang Y., Zhuang Y., Hou X., Yu T., et al. The reference genome of Miscanthus floridulus illuminates the evolution of Saccharinae. Nat. Plants. 2021;7:608–618. doi: 10.1038/s41477-021-00908-y.
Zhao M., Zhang B., Lisch D., Ma J. Patterns and consequences of subgenome differentiation provide insights into the nature of paleopolyploidy in plants. Plant Cell. 2017;29:2974–2994. doi: 10.1105/tpc.17.00595.

Associated Data


Supplementary Materials

Document S1. Supplemental Figures S1–S24, Supplemental Tables S1–S19 and Supplemental Methods

mmc1.pdf (3.4 MB, PDF)

Document S2. Supplemental Data 1 and 2

mmc2.xlsx (18.8 KB, XLSX)

Document S2. Article plus supplemental information

mmc3.pdf (10.6 MB, PDF)

Data Availability Statement

All raw sequence and genome assembly data in this study have been deposited in the National Genomics Data Center (https://ngdc.cncb.ac.cn) with BioProject accession number PRJCA010808.


[bib82] Wu P., Zhang L., Zhang K., Yin Y., Liu A., Zhu Y., Fu Y., Sun F., Zhao S., Feng K., et al. The adaptive evolution of Euryale ferox to the aquatic environment through paleo-hexaploidization. Plant J. 2022;110:627–645. doi: 10.1111/tpj.15717. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib83] Xu W., Zhang Q., Yuan W., Xu F., Muhammad Aslam M., Miao R., Li Y., Wang Q., Li X., Zhang X., et al. The genome evolution and low-phosphorus adaptation in white lupin. Nat. Commun. 2020;11:1069. doi: 10.1038/s41467-020-14891-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib84] Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 1997;13:555–556. doi: 10.1093/bioinformatics/13.5.555. [DOI] [PubMed] [Google Scholar]

[bib85] Zhang W.-J., You C.-X., Yang K., Chen R., Wang Y., Wu Y., Geng Z.-F., Chen H.-P., Jiang H.-Y., Su Y., et al. Bioactivity of essential oil of Artemisia argyi Lévl. et Van. and its main compounds against Lasioderma serricorne. J. Oleo Sci. 2014;63:829–837. doi: 10.5650/jos.ess14057. [DOI] [PubMed] [Google Scholar]

[bib86] Zhang G., Ge C., Xu P., Wang S., Cheng S., Han Y., Wang Y., Zhuang Y., Hou X., Yu T., et al. The reference genome of Miscanthus floridulus illuminates the evolution of Saccharinae. Nat. Plants. 2021;7:608–618. doi: 10.1038/s41477-021-00908-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib87] Zhao M., Zhang B., Lisch D., Ma J. Patterns and consequences of subgenome differentiation provide insights into the nature of paleopolyploidy in plants. Plant Cell. 2017;29:2974–2994. doi: 10.1105/tpc.17.00595. [DOI] [PMC free article] [PubMed] [Google Scholar]

Gene collinearity and Ks analysis