Abstract
Trees in the genus Aquilaria (Thymelaeaceae) are known as lign aloes, and are native to the forests of southeast Asia. Lign aloes produce agarwood as an antimicrobial defence. Agarwood has a long history of cultural and medicinal use, and is of considerable commercial value. However, due to habitat destruction and over collection, lign aloes are threatened in the wild. We present a chromosomal‐level assembly for Aquilaria sinensis, a lign aloe endemic to China known as the incense tree, based on Illumina short‐read, 10X Genomics linked‐read, and Hi‐C sequencing data. Our 783.8 Mbp A. sinensis genome assembly is of high physical contiguity, with a scaffold N50 of 87.6 Mbp, and high completeness, with a 95.8% BUSCO score for eudicotyledon genes. We include 17 transcriptomes from various plant tissues, providing a total of 35,965 gene models. We reveal the first complete set of genes involved in sesquiterpenoid production, plant defence, and agarwood production for the genus Aquilaria, including genes involved in the biosynthesis of sesquiterpenoids via the mevalonic acid (MVA), 1‐deoxy‐D‐xylulose‐5‐phosphate (DXP), and methylerythritol phosphate (MEP) pathways. We perform a detailed repeat content analysis, revealing that transposable elements account for ~61% of the genome, with major contributions from gypsy‐like and copia‐like LTR retroelements. We also provide a comparative analysis of repeat content across sequenced species in the order Malvales. Our study reveals the first chromosomal‐level genome assembly for a tree in the genus Aquilaria and provides an unprecedented opportunity to address a variety of applied, genomic and evolutionary questions in the Thymelaeaceae more widely.
Keywords: agarwood, aquilaria, CITES, genome, thymelaeaceae, transcriptome
1. INTRODUCTION
The genus Aquilaria (family Thymelaeaceae) contains fifteen tree species, commonly known as “lign aloes”, native to the forests of southeast Asia. A special feature of lign aloes is their production of agarwood, which is also known as aloeswood or gharuwood. Agarwood is produced as an antimicrobial defence mechanism, after infection of the tree with a fungal pathogen, and involves the saturation of infected heartwood with a dark aromatic resin.
The main active compounds present in agarwood are terpenoids, specifically sesquiterpenes and derivatives of flindersiachromone (Chen et al., 2012), and the composition of oil extracted from agarwood is exceedingly complex, including over 150 chemical compounds (Naef, 2011). As a consequence of the unique fragrant properties of agarwood, it has long been traded as a highly prized cultural, religious, and medical commodity. For example, the use of agarwood as a fragrant product is recorded in Sanskrit Vedas dating to 1,400 B.C., while the Greek physician Pedanius Dioscorides recorded medical uses for agarwood in his De Materia Medica from 65 A.D., and agarwood is also highly revered as an incense in Islamic, Buddhist and Hindu ceremonies (Lopez‐Sampson & Page, 2018).
Today, the demand for agarwood remains great, not least in the form of oud oil which is distilled from agarwood for perfumery, and high grade agarwood products can reach prices as high as US$100,000/kg, with global trade estimated at US$6–US$8 billion (Akter, Islam, Zulkefeli, & Khan, 2013). A driver of the expense of agarwood products is the depletion of wild lign aloes, which has led to their inclusion on Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), in an attempt to control and monitor international trade and help limit impact, while species in the genus have been categorized as “vulnerable” and “critically endangered” by the IUCN Red List of Threatened Species (Harvey‐Brown, 2018; Newton & Soehartono, 2001).
The incense tree, Aquilaria sinensis (Lour.) Spreng, is a lign aloe endemic to the south eastern part of China (530,906 km², Harvey‐Brown, 2018), including Hainan, Fujian, Guangxi and Guangdong Provinces, and Hong Kong. The tree is relatively slow growing, taking 50–100 years to reach its maximum height of ~15 m (CITES, 2015). Owing to the overexploitation of natural populations, A. sinensis is currently included on the “List of Wild Plant Under State Protection (Category II)” in mainland China (The State Council, People's Republic of China 1999), while it is protected under different ordinances in Hong Kong (Agriculture, Fisheries, & Conservation Department, 2018). In response to the conservation challenges facing A. sinensis, plantations have been initiated, including >10,000 hectares of land in south eastern China from 2006 to 2013 (Yin, Jiao, Dong, Jiang, & Zhang, 2016), and annual planting of 10,000 seedlings in the country parks and other suitable habitats in Hong Kong (Agriculture, Fisheries, & Conservation Department, 2018).
In addition to its ecological and scientific importance, A. sinensis holds particular cultural significance in Hong Kong, as it is commonly believed that the species provided the region's name, which translates from Chinese as “Incense Harbour” or “Fragrant Harbour”. Meanwhile, the trading of agarwood or Chen Xiang/Cham Heong (translated from Mandarin/Cantonese) has been an important industry since the Sung Dynasty (610–970) (Agriculture, Fisheries, & Conservation Department, 2013; Iu, 1983). Although the cultivation of A. sinensis for the agarwood industry in Hong Kong ceased during the last century, remaining populations continue to persist in the countryside of Hong Kong, including lowland and broad‐leaved forests (Yip & Lai, 2004).
Despite the great scientific and cultural importance of A. sinensis, a high‐quality genome is lacking, hindering further understanding of the species and scientifically driven conservation measures. More widely, genomic resources for the genus Aquilaria are poor, with only a chloroplast genome of A. sinensis and a draft genome of A. agallocha currently available (Chen et al., 2014; Wang et al., 2016). To address this issue, here we provide a high‐quality chromosomal‐level genome assembly for A. sinensis together with a large number of accompanying transcriptomes from diverse plant tissues.
2. MATERIALS AND METHODS
2.1. Sample collection and genome sequencing
Tissue samples used for genome sequencing (mature leaf) and transcriptome sequencing (young and mature leaves, young shoot, flower buds, flower, fruit, seed, aril, intact and wounded stem) were collected from incense tree individuals on the campus of The Chinese University of Hong Kong during the flowering and fruiting period in June 2019. During sample collection, healthy leaves without visible symptoms of fungal infection (e.g., leaf spot, rust or wilt) were collected for DNA extraction. Both surfaces of the leaves were cleaned and rinsed with double‐distilled water prior to DNA extraction to minimise potential contamination. Genomic DNA (gDNA) was extracted from A. sinensis using a DNeasy Plant Mini Kit (Qiagen) following the manufacturer's protocol. Extracted gDNA was subjected to quality control using a Nanodrop spectrophotometer (Thermo Scientific) and gel electrophoresis. High molecular weight (HMW) DNA was extracted using the CTAB method and had a mean fragment length of 85 kbp. Qualifying samples were sent to Novogene and Dovetail Genomics for library preparation and sequencing. The resulting library was sequenced on the Illumina HiSeq X platform to produce 2 × 150 bp paired‐end sequences. The length‐weighted mean molecule length was 23,590.15 bases, and the raw data can be found at NCBI's Sequence Read Archive (SRR10737433). Details of the sequencing data can be found in Table S1.
2.2. Chicago library preparation and sequencing
A Chicago library was prepared as described previously (Putnam et al., 2016). Briefly, ~500 ng of HMW gDNA (mean fragment length = 85 kbp) was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5′ overhangs filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina‐compatible adapters. Biotin‐containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 211 million 2 × 150 bp paired end reads, which provided 146.61 × physical coverage of the genome (1–100 kb pairs).
2.3. Dovetail HiC library preparation and sequencing
A Dovetail HiC library was prepared in a similar manner to that described previously (Lieberman‐Aiden et al., 2009). Briefly, for each library, chromatin was fixed in place with formaldehyde in the nucleus and then extracted. Fixed chromatin was digested with the restriction enzyme DpnII, the 5′ overhangs were filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was sheared to ~350 bp mean fragment size and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina‐compatible adapters. Biotin‐containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X sequencer to produce 193 million 2 × 150 bp paired‐end reads, which provided 66,743.81 × physical coverage of the genome (10–10,000 kb pairs).
2.4. Transcriptome sequencing
Transcriptomes of multiple developmental stages were sequenced at Novogene (Figure 2, Table S2). Total RNA from different tissues was isolated using a combination of cetyltrimethylammonium bromide (CTAB) pretreatment (Jordon‐Thaden, Chanderbali, Gitzendanner, & Soltis, 2015) and the mirVana miRNA Isolation Kit (Ambion) following the manufacturer's instructions. The extracted total RNA was subjected to quality control using a Nanodrop spectrophotometer (Thermo Scientific), gel electrophoresis, and an Agilent 2100 Bioanalyser (Agilent RNA 6000 Nano Kit). Qualifying samples underwent library construction and sequencing at Novogene; polyA‐selected RNA‐Sequencing libraries were prepared using the TruSeq RNA Sample Prep Kit v2. Insert sizes and library concentrations of final libraries were determined using an Agilent 2100 Bioanalyser instrument (Agilent DNA 1000 Reagents) and real‐time quantitative PCR (TaqMan Probe), respectively. Details of the sequencing data can be found in Table S1.
FIGURE 2.

Photographs of tissues from different developmental stages used for transcriptome sequencing. (a) Young and mature leaves and a young shoot used for transcriptome sequencing. Tissues used for RNA extraction are indicated in red. (b) Floral tissues used for transcriptome sequencing. Flower, flower bud and fertilized flower are indicated in yellow, green and brown respectively. (c) Fruit tissues used for transcriptome sequencing. Fruit arils, seeds and flesh (pericarp and mesocarp) are indicated in blue, pink and purple. (d) Intact and wounded stem tissues used for transcriptome sequencing. Photographs showing before and after the extraction of stem tissues (approximately 5 cm × 1 cm, indicated in red) from intact (day 0) to wounded (day 5 and 6) conditions [Colour figure can be viewed at wileyonlinelibrary.com]
2.5. Genome assembly
Linked‐Read data were assembled using the supernova assembler (v2.1.1, Marks et al., 2019; Zheng et al., 2016) with the default recommended settings (https://support.10xgenomics.com/de-novo-assembly/software/overview/latest/performance) to produce a pseudohaplotype assembly output (--style=pseudohap). The Supernova pseudohaplotype assembly, shotgun reads, Chicago library reads, and Dovetail HiC library reads were used as input data for HiRise, a software pipeline designed specifically for scaffolding genome assemblies using proximity ligation data (Putnam et al., 2016). An iterative analysis was conducted as follows. First, shotgun and Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analysed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a threshold. After aligning and scaffolding with Chicago data, Dovetail HiC library sequences were aligned and scaffolded following the same method. After scaffolding, shotgun sequences were used to close gaps between contigs.
2.6. Gene model prediction
Raw sequencing reads from 17 transcriptomes were quality trimmed using trimmomatic (v0.33 with parameters "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25", Bolger, Lohse, & Usadel, 2014). The nuclear genome sequences were cleaned and masked using funannotate (v1.6.0, https://github.com/nextgenusfs/funannotate) (Palmer & Stajich, 2017). The softmasked assembly was then used to run “funannotate train” with parameters “--stranded RF --max_intronlen 350000”, which aligned the RNA‐seq data, ran Trinity, and then ran PASA (Haas et al., 2008). The PASA gene models were used to train Augustus in the “funannotate predict” step following recommended options for eukaryotic genomes (https://funannotate.readthedocs.io/en/latest/tutorials.html#non-fungal-genomes-higher-eukaryotes). Briefly, the gene models were predicted by funannotate predict using the following parameters: “--repeats2evm --protein_evidence uniprot_sprot.fasta --genemark_mode ET --busco_seed_species embryophyta --optimize_augustus --busco_db embryophyta --organism other --max_intronlen 350000”. The gene models originated from several prediction sources: GeneMark (Lomsadze, Ter‐Hovhannisyan, Chernoff, & Borodovsky, 2005): 76,877; HiQ: 12,640; pasa: 26,175; Augustus (Stanke et al., 2006): 18,455; GlimmerHMM (Majoros, Pertea, Antonescu, & Salzberg, 2003): 189,664; snap (Korf, 2004): 120,842; total: 444,653. Gene models were passed to Evidence Modeler (Haas et al., 2008) (EVM Weights: [GeneMark: 1, HiQ: 2, pasa: 6, proteins: 1, Augustus: 1, GlimmerHMM: 1, snap: 1, transcripts: 1]) to generate the final annotation files, and PASA (Haas et al., 2008) was used to update the EVM consensus predictions and to add UTR annotations and models for alternatively spliced isoforms.
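As a quick consistency check on the counts reported above, the per‐source gene model numbers can be summed and compared with the stated total. The following is a minimal Python sketch that uses only the figures quoted in the text; the variable names are illustrative and are not part of funannotate or Evidence Modeler.

```python
# Per-source candidate gene model counts, copied from the text above.
source_counts = {
    "GeneMark": 76_877,
    "HiQ": 12_640,
    "pasa": 26_175,
    "Augustus": 18_455,
    "GlimmerHMM": 189_664,
    "snap": 120_842,
}
reported_total = 444_653

computed_total = sum(source_counts.values())

# Percentage of candidate models contributed by each source, largest first.
shares = {
    name: 100.0 * count / computed_total
    for name, count in sorted(source_counts.items(), key=lambda kv: -kv[1])
}

if __name__ == "__main__":
    print(f"computed total: {computed_total:,} (reported: {reported_total:,})")
    for name, pct in shares.items():
        print(f"{name:<11s}{pct:6.2f}%")
```

The ab initio predictors contribute most of the candidate models by count, whereas the EVM weights listed above (pasa: 6, HiQ: 2) give more influence to the transcript‐supported sources.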
2.7. Repetitive elements annotation
Repetitive elements were identified using an in‐house pipeline. Firstly, elements were identified using repeatmasker version 4.0.8 (Smit, Hubley, & Green, 2013) with the eukarya RepBase repeat library (Jurka et al., 2005). Low‐complexity repeats were not masked and the sensitive search parameter was specified. Following this, a de novo repeat library was constructed using repeatmodeler version 1.0.11 (Smit & Hubley, 2015), including recon version 1.08 (Bao & Eddy, 2002) and repeatscout version 1.0.5 (Price, Jones, & Pevzner, 2005). Novel repeats identified by repeatmodeler were analysed using a blast, extract, extend process to improve the characterization of elements along their entire length (Platt, Blanco‐Berdugo, & Ray, 2016). Consensus sequences were generated for each family, along with classification information. The resulting de novo repeat library was utilised to identify repetitive elements using repeatmasker. Longer repeats were constructed using repeatcraft (Wong & Simakov, 2018) using ltr_finder version 1.0.5 (Xu & Wang, 2007) to defragment repeat segments. At loci where RepeatMasker annotations overlapped (i.e., where the same sequence was annotated as different repeat families), only the longest repeat was kept. This conservative approach helps avoid TE content estimates being inflated by counting the same bases multiple times and ensures a one‐to‐one matching of sequence with repeat family identity. A revised summary table was constructed with the final repeat counts. All plots were generated using rstudio version 1.2.1335 (Racine, 2013) with r version 3.5.1 (R Core Team, 2013) and ggplot2 version 3.2.1 (Wickham, 2016).
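The overlap‐resolution rule described above (keep only the longest annotation where annotations overlap) can be sketched in Python as follows. This is an illustration under stated assumptions, not the in‐house pipeline itself: the `Repeat` record, the function names, and the greedy longest‐first strategy are hypothetical, and shorter overlapping annotations are assumed to be discarded whole rather than trimmed.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    """One repeat annotation (hypothetical record, 1-based inclusive coordinates)."""
    scaffold: str
    start: int
    end: int
    family: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Repeat, b: Repeat) -> bool:
    """True if two annotations share at least one base on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def keep_longest(repeats: List[Repeat]) -> List[Repeat]:
    """Greedy filter: visit annotations longest first and keep one only if it
    does not overlap an annotation that has already been kept."""
    kept: List[Repeat] = []
    for candidate in sorted(repeats, key=lambda r: r.length, reverse=True):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda r: (r.scaffold, r.start))


def masked_bases(repeats: List[Repeat]) -> int:
    """Total annotated bases; valid only for non-overlapping annotations."""
    return sum(r.length for r in repeats)
```

Because the kept annotations never overlap, summing their lengths counts each base at most once, which is the property the text relies on to avoid inflated TE content estimates. The pairwise comparison is quadratic in the number of annotations; a production pipeline would more likely use a sorted sweep or an interval tree.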
2.8. Gene family annotation and gene tree building
Potential gene family sequences involved in the mevalonic acid pathway, methylerythritol phosphate pathway, and jasmonic acid biosynthesis pathway were first identified by homology searching in the KEGG database, and were retrieved from the genome using tblastn. The identity of each putatively identified gene was then tested by comparison to sequences in the NCBI nr database using blastx. For phylogenetic analyses of gene families, DNA sequences were translated into amino acid sequences and aligned to other members of the gene family; gapped sites were removed from alignments using MEGA and phylogenetic trees were constructed using MEGA (Kumar, Stecher, Li, Knyaz, & Tamura, 2018).
3. RESULTS
3.1. High quality Aquilaria genome
Genomic DNA was extracted from single individuals of Aquilaria sinensis (Figure 1a), and sequenced using the Illumina short‐read and 10X Genomics linked‐read sequencing platforms (Table S1). Hi‐C libraries were also constructed and sequenced on the Illumina platform. The genome sequences were first assembled with short reads, followed by scaffolding with Hi‐C data. The genome assembly is 783.8 Mbp with a scaffold N50 of 87.6 Mbp (Figure 1b). This high physical contiguity is matched by high completeness, with a 95.8% complete BUSCO score for eudicotyledon genes (version odb10) (Figure 1b). A total of 35,965 gene models were predicted, incorporating 17 transcriptomes generated in this study from tissues collected at different developmental stages (Figure 2, Table S1), with a mean exon length of 304 bp, a mean intron length of 518 bp, and a mean deduced protein length of 338 aa. The majority of the assembled sequence (~91%) was contained on eight pseudomolecules (Figure 1b, Table S2), representing the first near chromosomal‐level genome generated for a species in the genus Aquilaria. This high‐quality Aquilaria genome provides an unprecedented opportunity to address a variety of applied, genomic and evolutionary questions in the Thymelaeaceae more widely.
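For readers unfamiliar with the contiguity statistics quoted above, the scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The Python sketch below illustrates this standard definition, together with the fraction of sequence held in the k largest scaffolds. The function names are illustrative and the toy lengths in the usage note are hypothetical, not the actual A. sinensis scaffold lengths.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the full set always covers the total")


def fraction_in_largest(lengths: Sequence[int], k: int) -> float:
    """Fraction of the total assembly length held in the k longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)
```

With hypothetical scaffold lengths of 80, 70, 50, 40, 30 and 20 (arbitrary units), the total is 290, the two longest scaffolds already cover 150, and so `n50` returns 70. Applied to real scaffold lengths, `fraction_in_largest(lengths, 8)` would give the proportion of the assembly on the eight largest scaffolds.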
FIGURE 1.

(a) Photograph of an incense tree Aquilaria sinensis (Lour.) Spreng taken in Hong Kong; (b) Aquilaria sinensis Hi‐C information (summary of genome assembly statistics). The x‐ and y‐axes provide the mapping positions for the first and second reads in each read pair respectively, grouped into bins. The colour of each square indicates the number of read pairs within that bin. White vertical and black horizontal lines have been added to indicate the borders between scaffolds. Scaffolds less than 1 Mb are excluded [Colour figure can be viewed at wileyonlinelibrary.com]
3.2. Transposable elements
Transposable elements (TEs) make up a proportion of most eukaryote genomes, but the contribution of TEs to genome size varies greatly across lineages. In plants, TEs may dominate the host genome, for example comprising 85% of the maize genome (Schnable et al., 2009) and potentially as much as 90% of the crown imperial (Fritillaria imperialis) genome (Ambrožová et al., 2010). At the other end of the spectrum, TEs may account for a very small proportion of the genome, such as in the carnivorous bladderwort, where TEs represent just 3% of total genomic DNA (Ibarra‐Laclette et al., 2013). However, in most plant genomes TE content lies somewhere between these extremes.
We estimated that the repeat content of the incense tree accounts for more than half of its total genome size (61.22%) (Figure 3a, Table S3), demonstrating that TE dynamics have played an important role in shaping the genome evolution of this species. Of the repeat types present in the incense tree genome, very few were annotated as small RNA, satellite, simple or low complexity repeats (~0.9% of the total genome), with the majority of the genome (~61%) consisting of TEs (i.e., SINEs, LINEs, LTR retrotransposons, DNA transposons) (Figure 3a, Table S3). By far the main contribution of TEs to the incense tree genome is from LTR elements, which account for over one third of the total assembly (36.47%, Figure 3a, Table S3). DNA transposons represent 6.4% of the incense tree genome, and there is just a small contribution from LINEs (2.1% of the genome), and very few SINEs present in the genome (0.01%) (Figure 3a, Table S3).
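To put these percentages on an absolute scale, they can be converted to approximate sequence lengths using the 783.8 Mbp assembly size. The short Python sketch below uses only the figures quoted in the text; the names are illustrative. The remainder it computes is simply the part of the 61.22% total that is not itemised in the paragraph above (see Table S3 for the full breakdown).

```python
# Figures quoted in the text: assembly size (Mbp), total repeat content (%),
# and the percentage of the assembly annotated as each itemised TE class.
ASSEMBLY_MBP = 783.8
TOTAL_REPEAT_PCT = 61.22
class_pct = {"LTR": 36.47, "DNA": 6.4, "LINE": 2.1, "SINE": 0.01}


def pct_to_mbp(pct: float, assembly_mbp: float = ASSEMBLY_MBP) -> float:
    """Convert a percentage of the assembly into megabase pairs."""
    return assembly_mbp * pct / 100.0


class_mbp = {name: pct_to_mbp(pct) for name, pct in class_pct.items()}

# Share of the total repeat content not itemised in the text.
remaining_pct = round(TOTAL_REPEAT_PCT - sum(class_pct.values()), 2)

if __name__ == "__main__":
    for name, mbp in class_mbp.items():
        print(f"{name:<5s}{class_pct[name]:6.2f}%  ~{mbp:7.2f} Mbp")
    print(f"not itemised here: {remaining_pct:.2f}%")
```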
FIGURE 3.

(a) Comparison of repeat content among Malvales genomes. The phylogeny indicates evolutionary relationships among species in the Malvales with available genome assemblies. Pie charts illustrate the proportions of different repeat types in each genome, as indicated by the colours presented in the key. The incense tree photograph is provided under a GNU Free Documentation License to Wikipedia user Chong Fat (https://en.wikipedia.org/wiki/File:HK_Aquilaria_sinensis_Fruits.JPG), the durian photograph is provided under a GNU Free Documentation License to Wikipedia user مانفی (https://en.wikipedia.org/wiki/File:Durian_in_black.jpg), the cacao photograph is provided under a GNU Free Documentation License to Wikipedia user Adolfoovalles (https://en.wikipedia.org/wiki/File:Matadecacao.jpg), the red silk cotton tree photograph is provided under a GNU Free Documentation License to Wikipedia user Lonely Explorer (https://en.wikipedia.org/wiki/File:Bombax_ceiba_‐_Cotton_tree_‐_Shimul_Flower_(2).jpg), and the diploid cotton photograph was taken from the USDA National Cotton Germplasm Collection (NCGC) (https://www.cottongen.org/organism/Gossypium/raimondii). For the cacao genome, “other”, “simple” and “unclassified” repeat types were not reported, reflecting the differences in relation to the other genomes. (b) Transposable element (TE) accumulation history in the incense tree genome, based on a Kimura distance‐based copy divergence analysis of TEs, with Kimura substitution level (CpG adjusted) illustrated on the x‐axis, and the percentage of the genome represented by each repeat type on the y‐axis. Repeat type is indicated by the colour chart below the x‐axis [Colour figure can be viewed at wileyonlinelibrary.com]
In Figure 3a, we summarise recently reported TE analyses for sequenced members of the order, specifically: diploid cotton (Wang et al., 2012), red silk cotton tree (Gao et al., 2018), durian (Teh et al., 2017), and cacao (Matina 1–6) (Motamayor et al., 2013). This provides a comparative overview of TE content in species closely related to the incense tree. Differences in the sequencing strategies employed to generate available Malvales genomes, accompanying variation in assembly quality, and differences in the approaches applied to generate TE annotations all complicate comparative analysis of TEs among members of the Malvales. However, it is clear that the incense tree shows the same general trends as other species in the order (Figure 3a). In summary, TEs make up a large proportion of total genomic content across the Malvales, with LTR retrotransposons representing by far the greatest contribution to repeat content (32%–48%, Table S4). As with other members of the Malvales, gypsy‐like and copia‐like LTR elements were the most abundant superfamilies among the LTR elements identified in the incense tree genome (Motamayor et al., 2013; Teh et al., 2017) (Figure 3a, Table S4). DNA transposons make up the next largest contribution to Malvales genomes, but they represent a considerably lower proportion of the genome (1%–9%) (Figure 3a, Table S4). Among DNA transposons, MULE elements were the most abundant superfamily, followed by En/Spm elements from the CMC superfamily (Table S4). LINEs make up a very small proportion of total TE content (0.17%–2.7%), with SINEs almost completely absent from Malvales genomes (0.09% in G. raimondii, 0.01% in A. sinensis) (Figure 3a). Overall, the observed patterns demonstrate that TEs represent a major force for shaping genome size among members of the Malvales, with LTR elements being by far the most important repeat class.
A Kimura distance‐based copy divergence analysis for the incense tree revealed that the most frequent TE sequence divergence relative to TE consensus sequence was 1%–10%, mainly due to increased distances among LTR elements (and unclassified elements), suggesting a relatively recent burst of sustained activity, followed by a gradual decline (Figure 3b).
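As background to the divergence analysis above, the Kimura two‐parameter distance between a TE copy and its consensus is K = −½ ln(1 − 2p − q) − ¼ ln(1 − 2q), where p and q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below implements this plain formula for a pair of pre‐aligned sequences. It is a simplified illustration: it does not apply the CpG adjustment used for Figure 3b, and the function name and input format are assumptions rather than part of the repeatmasker toolchain.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(copy: str, consensus: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    term_a = 1.0 - 2.0 * p - q
    term_b = 1.0 - 2.0 * q
    if term_a <= 0.0 or term_b <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)
```

Copies with low distances to their consensus are interpreted as relatively recent insertions, so a histogram of these distances weighted by copy length gives a "repeat landscape" of the kind summarised in the text.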
3.3. Genes involved in plant defence and agarwood production
Formation of agarwood is usually associated with fungal infection or physical wounding, where a resin composed of a mixture of sesquiterpenes and 2‐(2‐phenylethyl)chromones (PECs) is secreted by the tree as a defence mechanism (Chen et al., 2012; Naef, 2011). Over time, the accumulation of volatile compounds and sesquiterpenoids leads to the formation of fragrant agarwood (Fazila & Halim, 2012; Hashim, Ismail, & Abbas, 2014).
In the A. sinensis genome, genes involved in the biosynthesis of sesquiterpenoids from isoprenoid precursors using the mevalonic acid (MVA) pathway in the cytosol and the 1‐deoxy‐D‐xylulose‐5‐phosphate (DXP) or methylerythritol phosphate (MEP) pathway in the plastid were all found to be present (Figure 4, Table S5). Further, these two pathways biosynthesise C5 homoallylic isoprenoid precursors including isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), and the genes encoding enzymes for production of IPP and DMAPP could all be identified (Figure 4, Table S5). These C5 isoprene units are later converted to C10 geranyl pyrophosphate (GPP) and C15 farnesyl pyrophosphate (FPP) in the presence of the key‐limiting enzymes GPP synthase and FPP synthase (Figure 4, Table S5). For the final stage of sesquiterpene production, the necessary sesquiterpene synthase enzymes (SesTPS) can also be identified at 74 loci (Figure 4, Tables S5 and S6). Further, the jasmonic acid (JA) signalling pathway has been reported to be involved in plant defence and plays a significant role in the wound‐induced signalling mechanism regulating SesTPS expression (Tan, Isa, Ismail, & Zainal, 2019; Xu et al., 2017), and genes involved in its biosynthesis pathway could all be identified (Figure 4, Table S5).
FIGURE 4.

Genes involved in plant defence and agarwood production in Aquilaria sinensis, including the mevalonic acid (MVA) pathway, 1‐deoxy‐D‐xylulose‐5‐phosphate (DXP) or methylerythritol phosphate (MEP) pathway and jasmonic acid (JA) biosynthesis pathway. Green ticks represent genes that can be identified in the genome, and brown ticks depict genes that can also be identified in the wounded stem transcriptome data. Abbreviations: (MVA pathway) ACC, Acetyl‐CoA; HMG‐CoA, hydroxymethylglutaryl‐CoA; MVAP, phosphomevalonate; MVAPP, diphosphomevalonate; (DXP/MEP pathway) CDP‐ME, 4‐diphosphocytidyl‐2‐C‐methyl‐D‐erythritol; MEcPP, 2‐C‐Methyl‐D‐erythritol‐2,4‐cyclopyrophosphate; HMBPP, (E)‐4‐Hydroxy‐3‐methyl‐but‐2‐enyl pyrophosphate; (JA biosynthesis pathway) PLA2G, secretory phospholipase A2; LOX2S, lipoxygenase; 13‐HOPT, 13(S)‐hydroperoxy‐octadecatrienoic acid; AOS, allene oxide synthase; 12,13‐EOT, 12,13(S)‐epoxy‐octadecatrienoic acid; AOC, allene oxide cyclase; cis‐(+)‐OPDA, cis‐(+)‐12‐oxo‐phytodienoic acid; OPR, 12‐oxophytodienoic acid reductase; OPC‐8:0, 3‐oxo‐2‐(2′(Z)‐pentenyl)‐cyclopentane‐1‐octanoic acid; OPCL1, OPC‐8:0 CoA ligase 1; ACX, acyl‐CoA oxidase; MFP, multifunctional protein; KAT, 3‐ketoacyl‐CoA thiolase [Colour figure can be viewed at wileyonlinelibrary.com]
Interestingly, we have also found that genes from all of the aforementioned pathways, including the MVA pathway, DXP or MEP pathway, and JA biosynthesis pathway, are expressed in at least one of the wounded stem transcriptomic data sets (stem 02 and 03) (Figure 4). These data represent the first complete set of genes documented to be involved in sesquiterpenoid production, plant defence, and agarwood production in a single species of Aquilaria.
In conclusion, this study presents the first high‐quality genome assembled for a plant in the genus Aquilaria. The A. sinensis genome holds important scientific, commercial and conservation relevance, and our work provides details of the first complete set of genes involved in sesquiterpenoid and agarwood production in Aquilaria. More broadly, this high‐quality A. sinensis genome provides a useful reference point for further understanding of Thymelaeaceae biology and evolution.
AUTHOR CONTRIBUTIONS
J.H.L.H. conceived and supervised the study. W.N., S.T.S.L., A.Y.P.W., T.S., T.B., A.H., and J.H.L.H. performed the genome assembly, gene model prediction, gene annotation, and analyses. W.N., S.T.S.L., A.Y.P.W., T.S., T.B., A.H., L.M.C., D.T.W.L., and J.H.L.H. wrote the manuscript.
Supporting information
Tables S1‐S6
ACKNOWLEDGEMENTS
This study was supported by The Chinese University of Hong Kong Direct Grant (4053248), and the Agriculture, Fisheries and Conservation Department, The Government of the Hong Kong Special Administrative Region (AFCD/SQ/60/18) (JHLH). A.H. is supported by a Biotechnology and Biological Sciences Research Council (BBSRC) David Phillips Fellowship (BB/N020146/1). T.B. is supported by a studentship from the Biotechnology and Biological Sciences Research Council‐funded South West Biosciences Doctoral Training Partnership (BB/M009122/1). S.T.S.L. is supported by a postgraduate studentship from The Chinese University of Hong Kong.
Nong W, Law STS, Wong AYP, et al. Chromosomal‐level reference genome of the incense tree Aquilaria sinensis. Mol Ecol Resour. 2020;20:971–979. 10.1111/1755-0998.13154
Nong and Law contributed equally to this work.
Contributor Information
Wenyan Nong, Email: jeromehui@cuhk.edu.hk.
Sean T. S. Law, Email: jeromehui@cuhk.edu.hk.
DATA AVAILABILITY STATEMENT
The raw genome and RNA sequencing data have been deposited in the SRA under accession numbers SRR10737433 (sequencing run) and PRJNA534170 (BioProject). The final chromosome assembly was submitted to NCBI Assembly under accession number VZPZ00000000.
REFERENCES
- Agriculture, Fisheries and Conservation Department (2013). Status of Aquilaria sinensis (Incense Tree) in Hong Kong. Retrieved from http://www.epd.gov.hk/epd/sites/default/files/epd/english/boards/advisory_council/files/ncsc_paper03_2013.pdf [Google Scholar]
- Agriculture, Fisheries and Conservation Department (2018). Incense tree (Aquilaria sinensis) species action plan 2018–2022. Agriculture, Fisheries, and Conservation Department Press Releases. Retrieved from https://www.afcd.gov.hk/english/conservation/con_flo/con_flo_con/files/Incense_Tree_SAP_final..pdf [Google Scholar]
- Akter, S. , Islam, M. T. , Zulkefeli, M. , & Khan, S. I. (2013). Agarwood production – A multidisciplinary field to be explored in Bangladesh. International Journal of Pharmacy and Life Sciences, 2, 22–32. [Google Scholar]
- Ambrožová, K. , Mandáková, T. , Bureš, P. , Neumann, P. , Leitch, I. J. , Koblížková, A. , … Lysak, M. A. (2010). Diverse retrotransposon families and an AT‐rich satellite DNA revealed in giant genomes of Fritillaria lilies. Annals of Botany, 107, 255–268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bao, Z. , & Eddy, S. R. (2002). Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research, 12, 1269–1276. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bolger, A. M. , Lohse, M. , & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen, C.‐H. , Kuo, T.‐Y. , Yang, M.‐H. , Chien, T.‐Y. , Chu, M.‐J. , Huang, L.‐C. , … Chen, L.‐F. (2014). Identification of cucurbitacins and assembly of a draft genome for Aquilaria agallocha. BMC Genomics, 15, 578. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen, H.‐Q. , Wei, J.‐H. , Yang, J.‐S. , Zhang, Z. , Yang, Y. , Gao, Z.‐H. , … Gong, B. (2012). Chemical constituents of agarwood originating from the endemic genus Aquilaria plants. Chemistry and Biodiversity, 9, 236–250. [DOI] [PubMed] [Google Scholar]
- CITES (2015). Report on NDF of agarwood for sustainability harvest in Indonesia. Retrieved from https://www.cites.org/sites/default/files/ndf_material/AGARWOOD_IN_INDONESIA_NDF%5B1%5D.pdf [Google Scholar]
- Fazila, K. N. , & Halim, K. H. K. (2012). Effects of soaking on yield and quality of agarwood oil. Journal of Tropical Forest Science, 24, 557–564. [Google Scholar]
- Gao, Y. , Wang, H. , Liu, C. , Chu, H. , Dai, D. , Song, S. , & Tang, L. (2018). De novo genome assembly of the red silk cotton tree (Bombax ceiba). GigaScience, 7(5), 1‐7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haas, B. J. , Salzberg, S. L. , Zhu, W. , Pertea, M. , Allen, J. E. , Orvis, J. , … Wortman, J. R. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biology, 9(1), R7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harvey‐Brown, Y. (2018). Aquilaria sinensis. The IUCN red list of threatened. Species, 2018, e.T32382A2817115. [Google Scholar]
- Hashim, Y. , Ismail, N. , & Abbas, P. (2014). Analysis of chemical compounds of agarwood oil from different species by gas chromatography mass spectrometry (GCMS). IIUM Engineering Journal, 15, 55–60. [Google Scholar]
- Ibarra‐Laclette, E. , Lyons, E. , Hernández‐Guzmán, G. , Pérez‐Torres, C. A. , Carretero‐Paulet, L. , Chang, T.‐H. , … Herrera‐Estrella, L. (2013). Architecture and evolution of a minute plant genome. Nature, 498, 94–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Iu, K. C. (1983). The cultivation of the “Incense tree” (Aquilaria sinensis). Royal Asiatic Society Hong Kong Branch, 23, 247–249. [Google Scholar]
- Jordon‐Thaden, I. E. , Chanderbali, A. S. , Gitzendanner, M. A. , & Soltis, D. E. (2015). Modified CTAB and TRIzol protocols improve RNA extraction from chemically complex Embryophyta. Applications in Plant Sciences, 3(5), 1400105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jurka, J. , Kapitonov, V. V. , Pavlicek, A. , Klonowski, P. , Kohany, O. , & Walichiewicz, J. (2005). Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research, 110, 462–467. [DOI] [PubMed] [Google Scholar]
- Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics, 5(1), 59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kumar, S. , Stecher, G. , Li, M. , Knyaz, C. , & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35(6), 1547–1549. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lieberman‐Aiden, E. , van Berkum, N. L. , Williams, L. , Imakaev, M. , Ragoczy, T. , Telling, A. , … Dekker, J. (2009). Comprehensive mapping of long‐range interactions reveals folding principles of the human genome. Science, 326, 289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lomsadze, A. , Ter‐Hovhannisyan, V. , Chernoff, Y. O. , & Borodovsky, M. (2005). Gene identification in novel eukaryotic genomes by self‐training algorithm. Nucleic Acids Research, 33(20), 6494–6506. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lopez‐Sampson, A. , & Page, T. (2018). History of use and trade of agarwood. Economic Botany, 72, 107–129. [Google Scholar]
- Majoros, W. H. , Pertea, M. , Antonescu, C. , & Salzberg, S. L. (2003). GlimmerM, exonomy and unveil: Three ab initio eukaryotic genefinders. Nucleic Acids Research, 31(13), 3601–3604. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marks, P. , Garcia, S. , Barrio, A. M. , Belhocine, K. , Bernate, J. , Bharadwaj, R. , … Church, D. M. (2019). Resolving the full spectrum of human genome variation using Linked‐Reads. Genome Research, 29(4), 635–645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Motamayor, J. C. , Mockaitis, K. , Schmutz, J. , Haiminen, N. , Iii, D. L. , Cornejo, O. , … Kuhn, D. N. (2013). The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biology, 14(6), r53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Naef, R. (2011). The volatile and semi‐volatile constituents of agarwood, the infected heartwood of Aquilaria species: A review. Flavour and Fragrance Journal, 26(2), 73–87. [Google Scholar]
- Newton, A. C. , & Soehartono, T. (2001). CITES and the conservation of tree species: The case of Aquilaria in Indonesia. The International Forestry Review, 3, 27–33. [Google Scholar]
- Palmer, J. , & Stajich, J. E. (2017). Funannotate: Eukaryotic genome annotation pipeline. Retrieved from https://funannotate.readthedocs.io/en/latest/ [Google Scholar]
- Platt, R. N. , Blanco‐Berdugo, L. , & Ray, D. A. (2016). Accurate transposable element annotation is vital when analyzing new genome assemblies. Genome Biology and Evolution, 8, 403–410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price, A. L. , Jones, N. C. , & Pevzner, P. A. (2005). De novo identification of repeat families in large genomes. Bioinformatics, 21, 351–358. [DOI] [PubMed] [Google Scholar]
- Putnam, N. H. , O'Connell, B. L. , Stites, J. C. , Rice, B. J. , Blanchette, M. , Calef, R. , … Green, R. E. (2016). Chromosome‐scale shotgun assembly using an in vitro method for long‐range linkage. Genome Research, 26(3), 342–350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- R Core Team (2013). R: A language and environment for statistical computing. Retrieved from http://www.r‐project.org [Google Scholar]
- Racine, J. S. (2013). RStudio: A platform‐independent IDE for R and Sweave. Journal of Applied Econometrics, 27, 167–172. [Google Scholar]
- Schnable, P. S. , Ware, D. , Fulton, R. S. , Stein, J. C. , Wei, F. , Pasternak, S. , … Wilson, R. K. (2009). The B73 maize genome: Complexity, diversity, and dynamics. Science, 326, 1112–1115. [DOI] [PubMed] [Google Scholar]
- Smit, A. , & Hubley, R. (2015). RepeatModeler Open‐1.0., http://repeatmasker.org. [Google Scholar]
- Smit, A. F. A. , Hubley, R. R. , & Green, P. R. (2013). RepeatMasker Open‐4.0. Retrieved from http://repeatmasker.org
- Stanke, M. , Keller, O. , Gunduz, I. , Hayes, A. , Waack, S. , & Morgenstern, B. (2006). AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Research, 34(Suppl_2), W435–W439. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan, C. S. , Isa, N. M. , Ismail, I. , & Zainal, Z. (2019). Agarwood induction: Current developments and future perspectives. Frontiers in Plant Science, 10, 122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Teh, B. T. , Lim, K. , Yong, C. H. , Ng, C. C. Y. , Rao, S. R. , Rajasegaran, V. , … Tan, P. (2017). The draft genome of tropical fruit durian (Durio zibethinus). Nature Genetics, 49, 1633–1641. [DOI] [PubMed] [Google Scholar]
- Wang, K. , Wang, Z. , Li, F. , Ye, W. , Wang, J. , Song, G. , … Yu, S. (2012). The draft genome of a diploid cotton Gossypium raimondii . Nature Genetics, 44, 1098–1103. [DOI] [PubMed] [Google Scholar]
- Wang, Y. , Zhan, D. F. , Jia, X. , Mei, W. L. , Dai, H. F. , Chen, X. T. , & Peng, S. Q. (2016). Complete chloroplast genome sequence of Aquilaria sinensis (Lour.) Gilg and evolution analysis within the Malvales Order. Frontiers Plant Science, 7, 280. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. New York, NY: Springer. [Google Scholar]
- Wong, W. Y. , & Simakov, O. (2018). RepeatCraft: A meta‐pipeline for repetitive element de‐fragmentation and annotation. Bioinformatics, 35, 1051–1052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu, Y.‐H. , Liao, Y.‐C. , Lv, F.‐F. , Zhang, Z. , Sun, P.‐W. , Gao, Z.‐H. , … Wei, J.‐H. (2017). Transcription factor AsMYC2 controls the jasmonate‐responsive expression of ASS1 regulating sesquiterpene biosynthesis in Aquilaria sinensis (Lour.) Gilg. Plant and Cell Physiology, 58, 1924–1933. [DOI] [PubMed] [Google Scholar]
- Xu, Z. , & Wang, H. (2007). LTR_FINDER: An efficient tool for the prediction of full‐length LTR retrotransposons. Nucleic Acids Research, 35, W265–W268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yin, Y. , Jiao, L. , Dong, N. , Jiang, X. , & Zhang, S. (2016). Wood resources, identification, and utilization of agarwood in China In Mohamed R. (Ed.), Agarwood science behind the fragrance (pp. 21–38). Singapore: Springer. [Google Scholar]
- Yip, K. L. , & Lai, C. C. (2004). The nationally rare and endangered plant, Aquilaria sinensis: Its status in Hong Kong. Hong Kong Biodiversity, 7, 14–16. [Google Scholar]
- Zheng, G. X. Y. , Lau, B. T. , Schnall‐Levin, M. , Jarosz, M. , Bell, J. M. , Hindson, C. M. , … Ji, H. P. (2016). Haplotyping germline and cancer genomes with high‐throughput linked‐read sequencing. Nature Biotechnology, 34(3), 303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Supplementary Materials
Tables S1‐S6
