Abstract
The complete chloroplast genome of Artemisia annua (Asteraceae), the primary source of artemisinin, was sequenced and analyzed. The A. annua cp genome is 150,955 bp and harbors a pair of inverted repeat regions (IRa and IRb) of 24,850 bp each, which separate the large (LSC, 82,988 bp) and small (SSC, 18,267 bp) single-copy regions. Our annotation revealed that the A. annua cp genome contains 113 genes, 18 of which are duplicated in the IRs. The gene order in the SSC region of A. annua is inverted, consistent with the chloroplast genomes of three other Artemisia species. Fifteen forward and seventeen inverted repeats were detected in the genome. The abundance of SSR loci in the genome offers opportunities for future population genetics work on this anti-malarial medicinal plant. In A. annua cpDNA, the rps19 gene was found in the LSC region rather than the IR region, and the rps19 pseudogene was absent from the IR region. Sequence divergence analysis of five Asteraceae species indicated that the most highly divergent regions were in the intergenic spacers and that the differences between A. annua and A. fukudo were very slight. A phylogenetic analysis revealed a sister relationship between A. annua and A. fukudo. This study identifies distinctive characteristics of the A. annua cp genome. These results offer valuable information for future research on Artemisia species identification and for the selective breeding of A. annua with high pharmaceutical efficacy.
Keywords: Artemisia annua, chloroplast genome, phylogeny
1. Introduction
Artemisia annua, a herbaceous annual with a strong volatile aroma, belongs to the genus Artemisia (Asteraceae). It is the sole natural source of the antimalarial drug artemisinin [1] and is cultivated as a high-value medicinal plant (Qing hao). Artemisinin-based combination therapy (ACT) has received strong interest from the global health community because of the efficacy of artemisinin and its derivatives [2]. Furthermore, the 2015 Nobel Prize in Physiology or Medicine was awarded to Professor Youyou Tu for the discovery of artemisinin [3]. However, there are concerns that the production of high-quality artemisinin may not be sufficient to meet future demand [2].
A. annua has a broad, global distribution and many distinct, locally adapted ecotypes [4]. Beyond China, A. annua is also present in Eastern Europe, North America, and elsewhere in Asia [5]. However, the artemisinin content of A. annua ecotypes varies widely from region to region [5]. With the exception of a few rare high-artemisinin ecotypes found in China, the artemisinin content of A. annua ecotypes is generally too low (i.e., <1%) for commercial extraction [6], and no other species has been found to be suitable for mass production of artemisinin [1,7]. Singlet oxygen released from chloroplasts in A. annua can upregulate the expression of genes involved in artemisinin biosynthesis and can also drive the conversion of dihydroartemisinic acid to artemisinin [8,9].
In addition to their role in photosynthesis, chloroplasts are also involved in cytoplasmic male sterility (CMS) [10] and secondary metabolism [11]. The chloroplast (cp) genome has a conserved quadripartite structure: a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions. Most angiosperm cp genomes are highly conserved in gene order and content [12]. However, large-scale genome rearrangements and intron gains and losses have been identified in several angiosperm lineages [13,14,15]. A complete cp genome assembly for A. annua is therefore important for exploring putative links between A. annua’s chloroplast function and its adaptability and phytochemical characteristics.
The transcriptome sequences and genetic map of A. annua have been reported previously [16,17,18], but little is known about its cp genome structure. Here we report the complete chloroplast genome sequence of A. annua, together with a characterization of its long repeats and SSRs and a comparative analysis of the whole cp genome. Comparisons with the cp genomes of other Asteraceae species revealed significant variation in genome size, highly divergent intergenic spacers, and gene loss. Comprehensive cp genomic analyses can help to identify Artemisia species, provide insight into their evolutionary history, and support the development of A. annua as a pharmacological resource [19,20].
2. Results and Discussion
2.1. Characteristics of A. annua cpDNA
The complete cp genome of A. annua is 150,955 bp in size, with a pair of IR regions of 24,850 bp that separate an LSC region of 82,988 bp from an SSC region of 18,267 bp (Table 1 and Figure 1). The overall GC and AT contents of the A. annua cp genome are 37.5% and 62.5%, respectively, similar to those of other Asteraceae cp genomes [21,22,23]. The IR regions have a higher GC content (43%) than the LSC (35.5%) or SSC (30.8%) regions (Table 1). Within the protein-coding regions (CDS), the AT content of the first, second, and third codon positions is 54.6%, 62.4%, and 70.0%, respectively (Table 1). A higher AT content at the third codon position is common in other plant cp genomes [15,24], and this bias has been used to discriminate cpDNA from nuclear and mitochondrial DNA [25]. The coding regions constitute 52.6% of the genome; the non-coding regions (introns, pseudogenes, and intergenic spacers) therefore account for 47.4%.
Table 1. Base composition of the A. annua chloroplast genome.
Region | T (U) (%) | C (%) | A (%) | G (%) | Length (bp) |
---|---|---|---|---|---|
LSC | 32.4 | 17.5 | 32.1 | 18.0 | 82,988 |
SSC | 34.2 | 16.1 | 35.0 | 14.7 | 18,267 |
IRA | 28.5 | 20.8 | 28.3 | 22.3 | 24,850 |
IRB | 28.3 | 22.3 | 28.5 | 20.8 | 24,850 |
Total | 31.3 | 18.7 | 31.2 | 18.8 | 150,955 |
CDS | 31.6 | 17.6 | 30.7 | 20.1 | 79,335 |
1st position | 24.0 | 18.9 | 30.6 | 26.7 | 26,445 |
2nd position | 33.0 | 20.2 | 29.4 | 17.7 | 26,445 |
3rd position | 38.0 | 13.8 | 32.0 | 16.0 | 26,445 |
CDS: protein-coding regions.
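As a consistency check on Table 1, the short Python sketch below recomputes the genome length as LSC + IRA + SSC + IRB, the GC content of each region (G + C), and the share of the genome occupied by protein-coding sequence. The values are transcribed from Table 1; the dictionary layout and function names are illustrative and are not part of the authors' pipeline.

```python
# Region lengths (bp) and base composition (%) transcribed from Table 1.
REGIONS = {
    "LSC": {"length": 82_988, "T": 32.4, "C": 17.5, "A": 32.1, "G": 18.0},
    "SSC": {"length": 18_267, "T": 34.2, "C": 16.1, "A": 35.0, "G": 14.7},
    "IRA": {"length": 24_850, "T": 28.5, "C": 20.8, "A": 28.3, "G": 22.3},
    "IRB": {"length": 24_850, "T": 28.3, "C": 22.3, "A": 28.5, "G": 20.8},
}
CDS_LENGTH = 79_335  # total length of protein-coding sequence (Table 1)


def genome_length(regions: dict) -> int:
    """Quadripartite genome length: LSC + IRA + SSC + IRB."""
    return sum(region["length"] for region in regions.values())


def gc_percent(region: dict) -> float:
    """GC content (%) of one region, rounded to one decimal place."""
    return round(region["G"] + region["C"], 1)


total = genome_length(REGIONS)
coding_fraction = round(100 * CDS_LENGTH / total, 1)

for name, region in REGIONS.items():
    print(f"{name}: GC = {gc_percent(region)}%")
print(f"Genome length: {total:,} bp; protein-coding share: {coding_fraction}%")
```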
The A. annua cp genome encodes 113 predicted functional genes, including 80 protein-coding genes, 29 tRNA genes, and four rRNA genes (Table S1). In addition, there are 18 genes duplicated in the IR, making a total of 131 genes present in the A. annua cp genome (Figure 1). These genes have also been observed in Artemisia frigida [26]. Among these genes, seven protein-coding, seven tRNA, and all four rRNA genes are duplicated in the IR regions. The LSC region contains 62 protein-coding and 22 tRNA genes, whereas the SSC region contains one tRNA gene and 12 protein-coding genes.
Codon usage in the A. annua cp genome was estimated from the protein-coding gene sequences and is summarized in Table 2, together with the corresponding tRNA genes. Together, the protein-coding genes comprise 26,445 codons. Among these, leucine (2853 codons, 10.8%) is the most frequent amino acid and cysteine (293 codons, 1.1%) the least frequent (Table 2). A- and U-ending codons are common: except for the leucine codon UUG (decoded by trnL-CAA), all preferred synonymous codons (RSCU > 1) end with A or U.
Table 2. Codon usage in the A. annua chloroplast genome.
Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA |
---|---|---|---|---|---|---|---|---|---|
Phe | UUU | 993 | 1.32 | | Tyr | UAU | 811 | 1.64 | |
Phe | UUC | 510 | 0.68 | trnF-GAA | Tyr | UAC | 178 | 0.36 | trnY-GUA |
Leu | UUA | 890 | 1.87 | | Stop | UAA | 52 | 1.77 | |
Leu | UUG | 579 | 1.22 | trnL-CAA | Stop | UAG | 21 | 0.72 | |
Leu | CUU | 622 | 1.31 | | His | CAU | 471 | 1.51 | |
Leu | CUC | 198 | 0.42 | | His | CAC | 151 | 0.49 | trnH-GUG |
Leu | CUA | 368 | 0.77 | | Gln | CAA | 732 | 1.52 | trnQ-UUG |
Leu | CUG | 196 | 0.41 | | Gln | CAG | 230 | 0.48 | |
Ile | AUU | 1092 | 1.47 | | Asn | AAU | 1017 | 1.56 | |
Ile | AUC | 433 | 0.58 | trnI-CAU | Asn | AAC | 287 | 0.44 | |
Ile | AUA | 706 | 0.95 | | Lys | AAA | 1042 | 1.47 | |
Met | AUG | 633 | 1.00 | trnM-CAU | Lys | AAG | 371 | 0.53 | |
Val | GUU | 512 | 1.44 | | Asp | GAU | 868 | 1.61 | |
Val | GUC | 174 | 0.49 | trnV-GAC | Asp | GAC | 213 | 0.39 | trnD-GUC |
Val | GUA | 546 | 1.54 | | Glu | GAA | 1001 | 1.50 | trnE-UUC |
Val | GUG | 188 | 0.53 | | Glu | GAG | 337 | 0.50 | |
Ser | UCU | 588 | 1.74 | | Cys | UGU | 202 | 1.38 | |
Ser | UCC | 324 | 0.96 | trnS-GGA | Cys | UGC | 91 | 0.62 | trnC-GCA |
Ser | UCA | 417 | 1.23 | trnS-UGA | Stop | UGA | 15 | 0.51 | |
Ser | UCG | 167 | 0.49 | | Trp | UGG | 462 | 1.00 | trnW-CCA |
Pro | CCU | 441 | 1.58 | | Arg | CGU | 350 | 1.33 | trnR-ACG |
Pro | CCC | 188 | 0.67 | | Arg | CGC | 107 | 0.41 | |
Pro | CCA | 329 | 1.18 | trnP-UGG | Arg | CGA | 343 | 1.30 | |
Pro | CCG | 159 | 0.57 | | Arg | CGG | 124 | 0.47 | |
Thr | ACU | 535 | 1.63 | | Arg | AGA | 485 | 1.84 | trnR-UCU |
Thr | ACC | 246 | 0.75 | trnT-GGU | Arg | AGG | 174 | 0.66 | |
Thr | ACA | 411 | 1.25 | trnT-UGU | Ser | AGU | 410 | 1.21 | |
Thr | ACG | 124 | 0.38 | | Ser | AGC | 122 | 0.36 | trnS-GCU |
Ala | GCU | 617 | 1.74 | | Gly | GGU | 589 | 1.32 | |
Ala | GCC | 228 | 0.64 | | Gly | GGC | 189 | 0.42 | trnG-GCC |
Ala | GCA | 415 | 1.17 | | Gly | GGA | 707 | 1.58 | |
Ala | GCG | 158 | 0.45 | | Gly | GGG | 306 | 0.68 | |
RSCU: Relative Synonymous Codon Usage.
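RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally: RSCU = x_ij / (Σ_k x_ik / n_i), where x_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. The Python sketch below applies this definition to three codon families from Table 2 and reproduces the tabulated values; it illustrates the formula and is not the MEGA 5.2 implementation used in the study.

```python
from typing import Dict


def rscu(counts: Dict[str, int]) -> Dict[str, float]:
    """Relative synonymous codon usage for one synonymous codon family.

    RSCU = observed count / mean count across the family, so 1.0 means the
    codon is used exactly as often as expected under uniform usage.
    """
    expected = sum(counts.values()) / len(counts)
    return {codon: round(n / expected, 2) for codon, n in counts.items()}


# Codon counts transcribed from Table 2.
PHE = {"UUU": 993, "UUC": 510}
LEU = {"UUA": 890, "UUG": 579, "CUU": 622, "CUC": 198, "CUA": 368, "CUG": 196}
STOP = {"UAA": 52, "UAG": 21, "UGA": 15}
TOTAL_CODONS = 26_445  # all codons in the protein-coding genes (Table 2)

for name, family in (("Phe", PHE), ("Leu", LEU), ("Stop", STOP)):
    values = rscu(family)
    preferred = [codon for codon, v in values.items() if v > 1]  # RSCU > 1
    print(name, values, "preferred:", preferred)

leu_share = round(100 * sum(LEU.values()) / TOTAL_CODONS, 1)
print(f"Leu share of all codons: {leu_share}%")
```

The Leu family shows the one G-ending preferred codon (UUG) noted in the text.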
In total, there are 17 intron-containing genes: 15 (nine protein-coding and six tRNA genes) contain one intron, and two (ycf3 and clpP) contain two introns (Table 3). The trnK-UUU gene has the largest intron (1860 bp), which contains the matK gene. The rps12 gene is trans-spliced, with its 5′ end located in the LSC region and its duplicated 3′ ends in the IR regions. Ycf3 is required for the stable accumulation of the photosystem I complex [27,28]. The intron gain in ycf3 of A. annua may be useful for further studies of the mechanism of photosynthesis evolution and of variation in singlet oxygen released by chloroplasts in Artemisia.
Table 3. Intron-containing genes in the A. annua chloroplast genome, with exon and intron lengths.
Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
---|---|---|---|---|---|---|
trnK-UUU | LSC | 37 | 1860 | 35 | ||
trnG-UCC | LSC | 23 | 729 | 47 | ||
trnL-UAA | LSC | 37 | 424 | 50 | ||
trnV-UAC | LSC | 38 | 572 | 37 | ||
trnI-GAU | IR | 42 | 777 | 35 | ||
trnA-UGC | IR | 38 | 812 | 35 | ||
rps12 * | LSC | 232 | 535 | 26 | 114 | |
rps16 | LSC | 40 | 876 | 185 | ||
rpl16 | LSC | 9 | 1015 | 399 | ||
rpl2 | IR | 394 | 626 | 470 | ||
rpoC1 | LSC | 430 | 734 | 1640 | ||
ndhA | SSC | 556 | 1064 | 539 | ||
ndhB | IR | 777 | 670 | 756 | ||
ycf3 | LSC | 127 | 700 | 230 | 735 | 153 |
petB | LSC | 6 | 747 | 642 | ||
atpF | LSC | 145 | 699 | 410 | ||
clpP | LSC | 71 | 796 | 292 | 606 | 228 |
* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ ends in the IR regions.
Introns may contain “old code”, i.e., parts of a gene that lost their function during evolution. Several unicellular eukaryotes appear to experience selective pressure to lose introns, so intron gains and losses require an evolutionary explanation. A common partial explanation for the range of intron densities is the random accumulation of introns in nuclear genomes over time after inheritance from an intron-poor ancestor. More experimental evidence is required to determine whether intron variation in the A. annua cp genome is related to adaptation to environmental stresses or facilitates artemisinin biosynthesis.
2.2. Long Repeat and SSR Analysis
For repeat structure analysis, 15 forward and 17 inverted repeats were detected in the A. annua cp genome (Table 4). Most of these repeats are 30–39 bp long, and the two longest repeats, both 60-bp inverted repeats, are located in the ycf2 gene. In the LSC region, there are two repeats associated with psa genes (No. 4 and 5) and three forward and three inverted repeats in intergenic spacers (No. 1–3 and No. 16–18). In the IR regions, there are two forward and eight inverted repeats associated with ycf2 (No. 11 and 12, No. 22–29) and two forward and two inverted repeats in intergenic spacers (No. 14 and 15, No. 30 and 31).
Table 4. Long repeat sequences (forward and inverted) in the A. annua chloroplast genome.
ID | Repeat Start 1 | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region |
---|---|---|---|---|---|---|---|---|
1 | 8544 | F | 32 | 34,909 | −3 | 4.65E-05 | IGS | LSC |
2 | 28,063 | F | 31 | 29,661 | −3 | 1.69E-04 | IGS | LSC |
3 | 28,070 | F | 30 | 29,666 | −2 | 2.18E-05 | IGS | LSC |
4 | 38,054 | F | 32 | 40,278 | −2 | 1.55E-06 | psaB; psaA | LSC |
5 | 38,065 | F | 30 | 40,289 | −3 | 6.09E-04 | psaB; psaA | LSC |
6 | 43,070 | F | 41 | 96,883 | −1 | 1.63E-13 | ycf3 (intron); IGS | LSC; IRA |
7 | 43,072 | F | 39 | 118,107 | −1 | 2.48E-12 | ycf3 (intron); ndhA (intron) | LSC; SSC |
8 | 43,075 | F | 35 | 93,834 | −3 | 9.59E-07 | ycf3 (intron); ndhB (intron) | LSC; IRA |
9 | 66,346 | F | 30 | 98,046 | −2 | 2.18E-05 | IGS | LSC; IRA |
11 | 86,539 | F | 30 | 147,378 | −3 | 6.09E-04 | ycf2 | IRA; IRB |
12 | 90,121 | F | 30 | 90,157 | −1 | 5.00E-07 | ycf2 | IRA |
13 | 96,885 | F | 39 | 118,107 | 0 | 2.12E-14 | IGS; ndhA (intron) | IRA; SSC |
14 | 105,777 | F | 30 | 105,809 | −2 | 2.18E-05 | IGS | IRA |
15 | 128,104 | F | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRB |
16 | 8548 | I | 30 | 44,753 | −2 | 2.18E-05 | IGS | LSC |
17 | 29,662 | I | 30 | 29,881 | −2 | 2.18E-05 | IGS | LSC |
18 | 34,911 | I | 30 | 44,755 | −1 | 5.00E-07 | IGS | LSC |
19 | 43,070 | I | 41 | 137,019 | −1 | 1.63E-13 | ycf3 (intron); IGS | LSC; IRB |
20 | 43,075 | I | 35 | 140,074 | −3 | 9.59E-07 | ycf3 (intron); ndhB (intron) | LSC; IRB |
21 | 66,346 | I | 30 | 135,867 | −2 | 2.18E-05 | IGS | LSC; IRB |
22 | 90,109 | I | 60 | 143,756 | −2 | 7.68E-23 | ycf2 | IRA; IRB |
23 | 90,109 | I | 42 | 143,756 | −2 | 2.57E-12 | ycf2 | IRA; IRB |
24 | 90,121 | I | 30 | 143,756 | −1 | 5.00E-07 | ycf2 | IRA; IRB |
25 | 90,124 | I | 45 | 143,756 | 0 | 5.18E-18 | ycf2 | IRA; IRB |
26 | 90,127 | I | 60 | 143,774 | −2 | 7.68E-23 | ycf2 | IRA; IRB |
27 | 90,142 | I | 45 | 143,774 | 0 | 5.18E-18 | ycf2 | IRA; IRB |
28 | 90,145 | I | 42 | 143,792 | −2 | 2.57E-12 | ycf2 | IRA; IRB |
29 | 90,157 | I | 30 | 143,792 | −1 | 5.00E-07 | ycf2 | IRA; IRB |
30 | 105,777 | I | 30 | 128,104 | −2 | 2.18E-05 | IGS | IRA; IRB |
31 | 105,809 | I | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRA; IRB |
32 | 118,107 | I | 39 | 137,019 | 0 | 2.12E-14 | ndhA (intron); rps12 (CDS) | SSC; IRB |
F: forward; I: inverted; IGS: intergenic spacer; CDS: protein-coding region.
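The repeats in Table 4 were identified with REPuter, which detects maximal repeats and tolerates mismatches. As a much simpler illustration of the two repeat types, the Python sketch below finds exact k-mer seed pairs: a forward pair is the same k-mer at two positions, and an inverted pair is a k-mer whose reverse complement occurs elsewhere. It does not extend seeds into maximal repeats, allow mismatches, merge overlapping seeds, or compute E-values, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeats(seq: str, k: int) -> Dict[str, List[Tuple[int, int]]]:
    """Exact forward and inverted k-mer repeat pairs (1-based start positions).

    Forward: the same k-mer occurs at two positions.
    Inverted: the k-mer at one position is the reverse complement of the
    k-mer at another position. Each unordered pair is reported once.
    """
    seq = seq.upper()
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    forward: List[Tuple[int, int]] = []
    inverted: List[Tuple[int, int]] = []
    for kmer, starts in index.items():
        # Every pair of occurrences of the same k-mer is a forward repeat.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                forward.append((starts[a] + 1, starts[b] + 1))
        # Inverted repeats: keep only i < j so each pair is counted once.
        for j in index.get(reverse_complement(kmer), []):
            for i in starts:
                if i < j:
                    inverted.append((i + 1, j + 1))
    return {"forward": sorted(forward), "inverted": sorted(inverted)}


print(exact_repeats("GATTACAGGGGTGTAATC", k=7))
```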
SSRs, also known as microsatellites, are short (1–6 bp), tandemly repeated DNA sequences that are widely distributed throughout the genome. Because they are uniparentally inherited, cpSSRs have been widely employed in analyses of plant population structure, diversity, differentiation, and maternity [29,30,31]. Here, we analyzed the distribution of SSRs in the A. annua cp genome and identified 35 SSRs, most of them (29) in the LSC region. These included 31 mononucleotide SSRs (88.57%), two dinucleotide SSRs (5.71%), and two trinucleotide SSRs (5.71%) (Table 5). Sixteen of the 35 SSR loci were found in intergenic regions, and the other 19 were located in genes. All 31 mononucleotide SSRs belonged to the A/T type. These results are consistent with the view that cpSSRs are generally composed of short polyadenine (polyA) or polythymine (polyT) repeats and rarely contain tandem guanine (G) or cytosine (C) repeats, so these SSRs contribute to the AT richness of cp genomes. cpSSRs have been important resources for the study of economically important plants and their relatives, and they have considerable potential to offer insights into species identification, genetic diversity, and evolutionary processes in wild plant species [32]. The cpSSRs identified here provide candidate markers for examining genetic diversity in A. annua and related species, and for selecting germplasm with anti-malarial pharmaceutical efficacy.
Table 5. Simple sequence repeats (SSRs) in the A. annua chloroplast genome.
cpSSR ID | Repeat Motif | Length (bp) | Start | End | Region | Annotation |
---|---|---|---|---|---|---|
1 | (A)15 | 15 | 3204 | 3218 | LSC | matK |
2 | (A)14 | 14 | 3708 | 3721 | LSC | |
3 | (A)10 | 10 | 6121 | 6130 | LSC | |
4 | (T)10 | 10 | 9944 | 9953 | LSC | |
5 | (A)10 | 10 | 13,630 | 13,639 | LSC | rpoB |
6 | (A)12 | 12 | 20,826 | 20,837 | LSC | rpoC2 |
7 | (T)10 | 10 | 23,027 | 23,036 | LSC | rpoC2 |
8 | (A)11 | 11 | 26,289 | 26,299 | LSC | atpH |
9 | (A)14 | 14 | 28,513 | 28,526 | LSC | atpA |
10 | (A)11 | 11 | 39,312 | 39,322 | LSC | psaA |
11 | (A)10 | 10 | 48,206 | 48,215 | LSC | |
12 | (AT)6 | 12 | 52,028 | 52,039 | LSC | |
13 | (T)14 | 14 | 53,085 | 53,098 | LSC | atpB |
14 | (A)17 | 17 | 53,306 | 53,322 | LSC | atpB |
15 | (A)19 | 19 | 54,902 | 54,920 | LSC | rbcL |
16 | (A)10 | 10 | 56,832 | 56,841 | LSC | |
17 | (A)14 | 14 | 57,920 | 57,933 | LSC | accD |
18 | (A)11 | 11 | 59,654 | 59,664 | LSC | ycf4 |
19 | (T)10 | 10 | 59,775 | 59,784 | LSC | ycf4 |
20 | (T)10 | 10 | 64,476 | 64,485 | LSC | |
21 | (T)10 | 10 | 64,902 | 64,911 | LSC | |
22 | (A)11 | 11 | 66,255 | 66,265 | LSC | |
23 | (T)10 | 10 | 69,525 | 69,534 | LSC | |
24 | (A)14 | 14 | 70,210 | 70,223 | LSC | |
25 | (T)10 | 10 | 71,655 | 71,664 | LSC | psbB |
26 | (TA)6 | 12 | 72,640 | 72,651 | LSC | psbB |
27 | (T)14 | 14 | 73,210 | 73,223 | LSC | psbN |
28 | (A)15 | 15 | 80,929 | 80,943 | LSC | |
29 | (T)10 | 10 | 81,209 | 81,218 | LSC | |
30 | (T)11 | 11 | 101,234 | 101,244 | IRA | |
31 | (GAA)5 | 15 | 108,039 | 108,053 | SSC | ndhF |
32 | (TAA)5 | 15 | 117,240 | 117,254 | SSC | ndhI |
33 | (T)10 | 10 | 118,903 | 118,912 | SSC | |
34 | (A)14 | 14 | 121,936 | 121,949 | SSC | ycf1 |
35 | (A)11 | 11 | 132,700 | 132,710 | IRB |
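
MISA reports an SSR when a motif of 1–6 bp is repeated at least a minimum number of times. The thresholds used for Table 5 are not stated in the text; the Python sketch below assumes minima of 10, 6, and 5 copies for mono-, di-, and trinucleotide motifs (and 5 for longer motifs), which is consistent with the shortest loci in Table 5 but remains an assumption. It is a simplified re-implementation for illustration (for example, it does not merge compound SSRs), not MISA itself.

```python
from typing import List, NamedTuple

# Assumed minimum copy numbers per motif length (MISA-style thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    motif: str
    copies: int
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> List[SSR]:
    """Scan left to right, trying motif lengths 1..6 at each position."""
    seq = seq.upper()
    ssrs: List[SSR] = []
    i = 0
    while i < len(seq):
        for n in sorted(MIN_REPEATS):
            unit = seq[i:i + n]
            if len(unit) < n or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * n:i + (copies + 1) * n] == unit:
                copies += 1
            if copies >= MIN_REPEATS[n]:
                end = i + copies * n  # 0-based exclusive == 1-based inclusive
                ssrs.append(SSR(unit, copies, i + 1, end))
                i = end
                break
        else:
            i += 1  # no SSR starts here; move on
    return ssrs


toy = "GC" + "A" * 15 + "GC" + "AT" * 6 + "GC" + "GAA" * 5 + "GC"
for ssr in find_ssrs(toy):
    print(f"({ssr.motif}){ssr.copies} at {ssr.start}-{ssr.end}")
```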
2.3. Comparative Chloroplast Genomic Analysis
The whole cp genome sequence of A. annua was compared with those of Artemisia fukudo, Lactuca sativa, Jacobaea vulgaris, and Cynara cornigera. The A. annua cp genome is the second smallest of these five Asteraceae cp genomes: it is larger than that of J. vulgaris (150,689 bp) (Table S2) but smaller than those of A. fukudo, C. cornigera, and L. sativa by 56 bp, 1595 bp, and 1817 bp, respectively. A. annua has the smallest SSC region (18,267 bp) among these Asteraceae cp genomes; the next smallest is that of J. vulgaris (18,276 bp). The SSC and IR regions show no significant differences in length, so the differences in genome size are attributable mainly to variation in the length of the LSC region.
Comparative genome analysis [33] allows examination of how DNA sequences diverge among related species. The sequence identity of the five Asteraceae cp genomes was plotted using mVISTA, with the annotated A. annua cp genome as the reference (Figure 2). The comparison shows that the two IR regions are less divergent than the LSC and SSC regions. In addition, coding regions are more conserved than non-coding regions, and the highly divergent regions among the five cp genomes occur in intergenic spacers, including trnH-psbA, psbM-petN, trnC-GCA-petN, trnE-UUC-rpoB, trnY-GUA-trnE-UUC, trnV-UAC-ndhC, rbcL-accD, accD-psaI, and rpl32-trnL-UAG in the LSC, and ndhI-ndhG and ycf1-rps15 in the SSC. Similar results have been observed in other plant cp genomes [21,34]. The most divergent coding regions among the five Asteraceae cp genomes are ndhF, ycf1, and ycf2. However, A. annua and A. fukudo differ only very slightly. All eight rRNA genes (four unique genes, each duplicated in the IRs) are highly conserved.
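mVISTA plots conservation along an alignment to the reference genome. The Python sketch below shows the underlying idea in simplified form: given two already-aligned sequences, it computes percent identity in sliding windows reported in reference coordinates. The window and step sizes are illustrative values chosen here, not mVISTA's parameters, and the alignment itself (Shuffle-LAGAN in this study) is assumed to have been computed beforehand.

```python
from typing import List, Tuple


def window_identity(ref_aln: str, qry_aln: str, window: int = 100,
                    step: int = 25) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows along a pairwise alignment.

    Only alignment columns where the reference has a base are kept, so window
    starts are 1-based reference coordinates. Query gaps count as mismatches.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = [r.upper() == q.upper()
               for r, q in zip(ref_aln, qry_aln) if r != "-"]
    if not matches:
        return []
    results = []
    for start in range(0, max(len(matches) - window, 0) + 1, step):
        chunk = matches[start:start + window]
        results.append((start + 1, 100.0 * sum(chunk) / len(chunk)))
    return results


print(window_identity("ACGTACGTAC", "ACGTACGT-C", window=5, step=5))
```

Reporting positions on the reference mirrors the way the A. annua annotation serves as the coordinate system in the comparison.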
2.4. IR Contraction and Expansion in the A. annua cp Genome
Although IRs are the most conserved regions of cp genomes, contraction and expansion at the IR borders are common evolutionary events and have been proposed to explain size differences between cp genomes [35,36]. Detailed comparisons of the IR-SSC and IR-LSC boundaries among four Artemisia cp genomes (Artemisia annua, Artemisia fukudo, Artemisia frigida, and Artemisia montana) are presented in Figure 3. The IRb/SSC border is generally positioned between the ycf1 pseudogene and the ndhF gene. The ycf1 pseudogene has proven useful for analyzing cp genome variation in higher plants and algae [37]. The photosynthesis-related ndhF gene is located 56 bp, 58 bp, 60 bp, and 75 bp from the IRb/SSC border in A. montana, A. annua, A. fukudo, and A. frigida, respectively. However, the A. annua cp genome has some unique structural features: the trnH gene lies at the greatest distance (114 bp) from the LSC edge; the rps19 pseudogene is absent owing to contraction of the IR borders; and the rps19 gene lies within the LSC region owing to expansion of the LSC. The rps19 gene has been reported to be one of the most abundant transcripts in the chloroplast genome [38]. The IR/LSC boundaries are not static among Artemisia cp genomes; instead, they shift through limited expansions and contractions, as found in other plants [39].
A comparison of cp genome sizes among the examined Artemisia species is given in Table S3. The IR of A. annua (24,850 bp) is 106 bp shorter than that of A. fukudo, 122 bp shorter than that of A. frigida, and 109 bp shorter than that of A. montana. These differences may be related to the loss of rps19 and the rps19 pseudogene from the A. annua IR regions. However, the whole cp genomes of the four Artemisia species do not differ significantly in length: the A. annua cp genome (150,955 bp) is 56 bp smaller than that of A. fukudo, 121 bp smaller than that of A. frigida, and 175 bp smaller than that of A. montana. One likely explanation for this variation is that non-functional DNA is rapidly deleted, so pseudogenes fail to accumulate.
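Because IR boundaries shift between species, positions must be assigned to regions using each genome's own boundary coordinates. For A. annua, the coordinates in Tables 4 and 5 are consistent with the region order LSC, IRA, SSC, IRB and the lengths in Table 1. The Python sketch below derives the boundaries and classifies a 1-based position; the names are hypothetical, and the region order is inferred from the tables rather than stated in the text.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple

# Region order (inferred from Tables 4 and 5) and lengths in bp (Table 1).
LAYOUT: List[Tuple[str, int]] = [
    ("LSC", 82_988), ("IRA", 24_850), ("SSC", 18_267), ("IRB", 24_850),
]
# Cumulative 1-based end coordinate of each region.
ENDS = list(accumulate(length for _, length in LAYOUT))


def region_of(position: int) -> str:
    """Return the region containing a 1-based genome coordinate."""
    if not 1 <= position <= ENDS[-1]:
        raise ValueError(f"position {position} outside 1..{ENDS[-1]}")
    # bisect_left finds the first region whose end is >= position.
    return LAYOUT[bisect_left(ENDS, position)][0]


for pos in (43_070, 96_883, 118_107, 137_019):
    print(pos, region_of(pos))
```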
Pairwise cp genome alignments between A. annua and three other Artemisia cp genomes (A. frigida, A. fukudo, and A. montana) revealed a high degree of synteny (Figures S1–S3). Previous work reported that the A. frigida cp genome has undergone two inversion events in the LSC region and at least one re-inversion event in the SSC [26]. Our results suggest that A. annua has similar sequence rearrangements. To confirm the accuracy of the assembly and the gene order of the SSC in A. annua, primers were designed to amplify the four junctions between the IRs and the LSC/SSC regions, and the PCR amplicons were verified by Sanger sequencing (primers listed in Table S4). The inversion and re-inversion events in A. annua suggest that the SSC may be an active region for sequence rearrangements in plant cp genomes. Besides members of the Asteraceae [40,41], other angiosperms with an inverted SSC region include Piper cenocladum [42], Dioscorea elephantipes, and Chloranthus spicatus [43]. Although chloroplast gene order is generally conserved in land plant genomes [44], many sequence rearrangements have been reported in cp genomes from a wide variety of plant species, including inversions in the LSC region [45,46,47], IR contractions or expansions accompanied by inversions [48], and re-inversion in the SSC region. Sequence rearrangements in cp genomes have been proposed to result from intramolecular recombination events [49]. Rearrangements that alter cp genome structure in related species may also provide information on genetic diversity that can be used in molecular classification and evolutionary studies.
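One simple way to check whether the SSC of a new assembly is in the same or the opposite orientation relative to a reference is to count shared k-mers for the sequence and for its reverse complement. The Python sketch below does this; it is a hypothetical helper for illustration, not the procedure used in the study, which relied on whole-genome alignment and PCR/Sanger validation of the junctions.

```python
from typing import Set

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orientation(query: str, reference: str, k: int = 15) -> str:
    """'same' if the query as given shares at least as many k-mers with the
    reference as its reverse complement does, otherwise 'inverted'."""
    ref_kmers = kmer_set(reference, k)
    fwd = len(kmer_set(query, k) & ref_kmers)
    rev = len(kmer_set(reverse_complement(query), k) & ref_kmers)
    return "same" if fwd >= rev else "inverted"


# Hypothetical toy SSC fragment, for illustration only.
ref_ssc = "ATGGCGTACGTTAGCCGATAGGCTTAACG"
print(orientation(ref_ssc, ref_ssc, k=5))
print(orientation(reverse_complement(ref_ssc), ref_ssc, k=5))
```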
2.5. Phylogenetic Analysis
A. annua belongs to the tribe Anthemideae in the Asteraceae. Several studies have analyzed phylogenetic relationships within the Asteraceae based on chloroplast coding or non-coding sequences [50,51]. The complete A. annua cp genome provides sequence information that can be used to study the molecular evolution and phylogeny of A. annua. We performed multiple sequence alignments of 50 protein-coding genes shared by the cp genomes of 20 Asteraceae species, with Berberis bealei (Berberidaceae) included as an outgroup (Figure 4). Using the GTR + G + I nucleotide substitution model recommended by jModelTest, the maximum likelihood (ML) analysis strongly supported (100% bootstrap support) a sister relationship between A. annua and the closely related species Artemisia fukudo. Given this relationship, we hypothesize that Artemisia fukudo may have similar phytochemical properties [52].
3. Materials and Methods
3.1. DNA Sequencing, cp Genome Assembly, and Validation
Fresh A. annua leaves were collected from tissue-cultured seedlings. Total DNA was extracted from approximately 10 g of fresh leaf tissue using a modified CTAB method [53]. DNA concentration was estimated by measuring A260 with an ND-2000 spectrophotometer [54] (NanoDrop Technologies, Wilmington, DE, USA), and DNA quality was assessed visually by agarose gel electrophoresis. Purified DNA was used to construct shotgun libraries (250 bp) according to the manufacturer’s instructions, and sequencing was performed on an Illumina HiSeq 1500 platform (San Diego, CA, USA), yielding approximately 100 Gb of data. Raw reads were first quality-checked with FastQC and trimmed. Trimmed reads were then searched with BLAST against the reference cp genome of Artemisia frigida to extract cp-like reads [55]. Finally, the cp-like reads were assembled with SOAPdenovo [56], contigs were extended with SSPACE [57], and gaps were filled with GapCloser [58]. To verify the assembly, the four junction regions between the IR regions and the LSC/SSC were confirmed by PCR amplification and Sanger sequencing, using the primers listed in Table S4. The final A. annua cp genome was submitted to GenBank (Accession Number: MF623173).
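The study extracted cp-like reads by BLAST searches against the A. frigida cp genome. The Python sketch below illustrates the same reference-guided filtering idea with a swapped-in, simpler technique: exact k-mer matching on both strands of a circular reference. The k-mer size and hit threshold are arbitrary illustrative values, and the sketch does not reproduce BLAST's sensitivity to divergent reads.

```python
from typing import Iterable, Iterator, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_index(reference: str, k: int) -> Set[str]:
    """All k-mers of a circular reference genome, on both strands."""
    ref = reference.upper()
    ref = ref + ref[:k - 1]  # include k-mers that span the circular origin
    kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
    return kmers | {reverse_complement(x) for x in kmers}


def cp_like_reads(reads: Iterable[str], index: Set[str], k: int,
                  min_hits: int = 3) -> Iterator[str]:
    """Yield reads that share at least `min_hits` k-mers with the reference."""
    for read in reads:
        r = read.upper()
        hits = sum(1 for i in range(len(r) - k + 1) if r[i:i + k] in index)
        if hits >= min_hits:
            yield read


reference = "ATGGCGTACGTTAGCCGATAGGCTTAACG"  # hypothetical toy reference
index = build_index(reference, k=8)
reads = [reference[3:20], reverse_complement(reference[5:25]), "C" * 20]
print(list(cp_like_reads(reads, index, k=8)))
```

In a real run, the retained reads would then go to the assembler.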
3.2. Gene Annotation and Sequence Analyses
Initial gene annotation was performed with CPGAVAS [59] (http://www.herbalgenomics.org/cpgavas) and confirmed using BLAST and DOGMA [60]. tRNA genes were identified with tRNAscan-SE [61]. The circular cp genome map was drawn with OGDRAW v1.2 [62] (http://ogdraw.mpimp-golm.mpg.de/). To characterize variation in synonymous codon usage, relative synonymous codon usage (RSCU) values, codon usage, and AT content were determined using MEGA 5.2 [63].
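For readers who want to reproduce the quantities in Tables 1 and 2 outside MEGA, the Python sketch below counts codons (in the RNA alphabet, as in Table 2) and computes the A+T percentage at each codon position from in-frame coding sequences. It assumes each sequence starts at the first codon position and ignores any trailing partial codon; it is an illustration, not MEGA's implementation.

```python
from collections import Counter
from typing import Iterable, List


def codon_counts(cds_seqs: Iterable[str]) -> Counter:
    """Count codons (RNA alphabet, as in Table 2) across in-frame CDSs."""
    counts: Counter = Counter()
    for seq in cds_seqs:
        seq = seq.upper().replace("T", "U")
        usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def at_by_codon_position(cds_seqs: Iterable[str]) -> List[float]:
    """Percent A+T (A+U) at codon positions 1, 2 and 3."""
    at = [0, 0, 0]
    total = [0, 0, 0]
    for seq in cds_seqs:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3
        for i in range(usable):
            total[i % 3] += 1
            if seq[i] in "ATU":
                at[i % 3] += 1
    return [100 * a / t if t else 0.0 for a, t in zip(at, total)]


# Hypothetical in-frame CDS fragments, for illustration only.
toy_cds = ["ATGTTATAA", "ATGAAATGA"]
print(codon_counts(toy_cds))
print([round(x, 1) for x in at_by_codon_position(toy_cds)])
```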
3.3. Genome Comparison
MUMmer [64] was used to perform pairwise cp genome alignments. The mVISTA [65] program in Shuffle-LAGAN mode [66] was used to compare the cp genome of A. annua with those of Artemisia fukudo, Lactuca sativa, Jacobaea vulgaris, and Cynara cornigera (KU360270, AP007232, HQ234669, and KP842707), using the annotation of A. annua as the reference. MISA [67] was used to identify SSRs, and REPuter [68] was used to identify forward and inverted repeats.
3.4. Phylogenetic Analysis
A total of 19 complete cp genome sequences were downloaded from the NCBI Organelle Genome and Nucleotide Resources database. For the phylogenetic analysis, a set of 50 protein-coding genes shared by all 20 analyzed genomes was used. Genes were aligned with ClustalW2 [69]. jModelTest 3.7 [70] was used to select the best-fit model for maximum likelihood (ML) analysis, and the phylogenetic tree was inferred using RAxML-HPC 2.7.6.3 on XSEDE at the CIPRES Science Gateway (http://www.phylo.org/). Bootstrap analysis was performed with 1000 replicates and TBR branch swapping. Berberis bealei was set as the outgroup.
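The text does not state how the 50 gene alignments were combined for RAxML. One common approach, assumed here, is to concatenate them into a supermatrix, padding taxa missing from a gene with gaps, and to record each gene's column range as a partition. The Python sketch below writes the result in relaxed PHYLIP format with RAxML-style partition lines; the taxon labels in the usage example are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene
Partition = Tuple[str, int, int]  # gene name, 1-based start, 1-based end


def concatenate(gene_alns: Dict[str, Alignment]) -> Tuple[Dict[str, str], List[Partition]]:
    """Concatenate per-gene alignments into a supermatrix.

    Taxa missing from a gene are padded with gaps. Genes are concatenated in
    alphabetical order, and each gene's column range is returned as a partition.
    """
    taxa = sorted({taxon for aln in gene_alns.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    pos = 1
    for gene in sorted(gene_alns):
        aln = gene_alns[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        width = widths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, pos, pos + width - 1))
        pos += width
    return {taxon: "".join(p) for taxon, p in pieces.items()}, partitions


def to_phylip(matrix: Dict[str, str]) -> str:
    """Relaxed PHYLIP: 'n_taxa n_sites' header, then 'name sequence' lines."""
    n_sites = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_sites}"]
    lines += [f"{name} {seq}" for name, seq in matrix.items()]
    return "\n".join(lines) + "\n"


def partition_lines(partitions: List[Partition]) -> List[str]:
    """RAxML-style partition definitions, e.g. 'DNA, matK = 1-4'."""
    return [f"DNA, {gene} = {start}-{end}" for gene, start, end in partitions]


# Hypothetical toy alignments (taxon labels are placeholders).
genes = {
    "rbcL": {"A_annua": "ATGT", "A_fukudo": "ATGC", "B_bealei": "ATGA"},
    "matK": {"A_annua": "GG-C", "A_fukudo": "GGAC"},
}
matrix, parts = concatenate(genes)
print(to_phylip(matrix), end="")
print("\n".join(partition_lines(parts)))
```

Taxon names must not contain spaces in this format, which is why the placeholders use underscores.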
4. Conclusions
Here we report the first complete cpDNA sequence of A. annua, an important medicinal plant. Compared with the cp genomes of three related Artemisia species, the A. annua cp genome is the smallest, while its structure and composition are similar. Like most Asteraceae, A. annua has an inverted SSC region. However, a re-inversion event in the SSC region of the A. annua lineage suggests that the SSC might be an active region for inversion events in Asteraceae species. Repeated sequences, together with the SSRs described above, are informative sources for the development of new molecular markers. Phylogenetic analysis of 20 Asteraceae species strongly supported the established taxonomic position of A. annua within the Asteraceae and its sister relationship with the closely related species A. fukudo. The data presented here provide insight into evolutionary relationships within the genus Artemisia, and the assembled A. annua cp genome may be useful for future breeding and further biological discoveries.
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (81403053 and 81503469) and from the China Academy of Chinese Medical Sciences Special Fund for Health Service Development of Chinese Medicine (ZZ0908067).
Supplementary Materials
Table S1. Gene content of the Artemisia annua chloroplast genome (113 genes). Table S2. Size comparison of Artemisia annua chloroplast genomic regions with those of four other Asteraceae chloroplast genomes. Table S3. Size comparison of Artemisia annua chloroplast genomic regions with those of three other Artemisia chloroplast genomes. Table S4. Primers used for assembly validation. Figure S1. Chloroplast genomic alignment between Artemisia annua and Artemisia frigida. Figure S2. Chloroplast genomic alignment between Artemisia annua and Artemisia fukudo. Figure S3. Chloroplast genomic alignment between Artemisia annua and Artemisia montana.
Author Contributions
S.C. and J.X. conceived and designed the research framework; X.S., Z.L., S.X., and R.B. prepared the samples and performed the experiments; B.L. and M.W. analyzed the data; X.S. wrote the paper. X.L. and B.Z. revised the final manuscript. All authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Sample Availability: Sequence data of Artemisia annua are available from the authors.
References
- 1.Klayman D.L. Qinghaosu (artemisinin): An antimalarial drug from China. Science. 1985;228:1049–1055. doi: 10.1126/science.3887571. [DOI] [PubMed] [Google Scholar]
- 2.Arrow K.J., Panosian C.B., Gelband H. Saving Lives, Buying Time: Economics of Malaria Drugs in an Age of Resistance. National Academies Press; Washington, DC, USA: 2004. [PubMed] [Google Scholar]
- 3.Tu Y.Y. Artemisinin—A fift from traditional chinese medicine to the world (nobel lecture) Angew. Chem. Int. Ed. Engl. 2016;55:10210–10226. doi: 10.1002/anie.201601967. [DOI] [PubMed] [Google Scholar]
- 4.Mert A., Krc S., Ayanoğlu F. The effects of different plant densities on yield, yield components and quality of Artemisia annua L. Ecotypes. J. Herbs Spices Med. Plants. 2002;9:413–418. doi: 10.1300/J044v09n04_20. [DOI] [Google Scholar]
- 5.Delabays N., Simonnet X., Gaudin M. The genetics of artemisinin content in Artemisia annua L. and the breeding of high yielding cultivars. Curr. Med. Chem. 2001;8:1795–1801. doi: 10.2174/0929867013371635. [DOI] [PubMed] [Google Scholar]
- 6.Zhong G.Y., Zhou H.R., Lun Y., Hu M., Zhao P.P. Studies on quality germplasm resources of Artemisia annua. Chin. Herbal Med. 1998;29:264–267. [Google Scholar]
- 7.Hu S.L., Xu Q.C., Liu J.F., Gu Y.X. Studies on plant resources of artemisinin. China J. Chin. Mater. Med. 1981;2:13–16. [Google Scholar]
- 8.Guo X.X., Yang X.Q., Yang R.Y. Salicylic acid and methyl jasmonate but not Rose Bengal enhance artemisinin production through invoking burst of endogenous singlet oxygen. Plant Sci. 2010;178:390–397. doi: 10.1016/j.plantsci.2010.01.014. [DOI] [Google Scholar]
- 9.Zeng Q.P., Zeng X.M., Yang R.Y. Singlet oxygen as a signaling transducer for modulating artemisinin biosynthetic genes in Artemisia annua. Biol. Plantarum. 2011;55:69–674. doi: 10.1007/s10535-011-0166-8. [DOI] [Google Scholar]
10. Sun C., Fan C., Zhang F., Niu T., Sun Y., Guo X. Cloning and sequence analysis of ps1A1 and ps1A2 genes amplified specifically from the chloroplast and of maintainer of CMS Sorghum. Chin. J. Appl. Environ. Biol. 2003;9:501–505.
11. Nielsen A.Z., Ziersen B., Jensen K., Lassen L.M., Olsen C.E., Moller B.L., Jensen P.E. Redirecting photosynthetic reducing power toward bioactive natural product synthesis. ACS Synth. Biol. 2013;2:308–315. doi: 10.1021/sb300128r.
12. Wicke S., Schneeweiss G.M., dePamphilis C.W., Müller K.F., Quandt D. The evolution of the plastid chromosome in land plants: Gene content, gene order, gene function. Plant Mol. Biol. 2011;76:273–297. doi: 10.1007/s11103-011-9762-4.
13. Wolfe K.H., Morden C.W., Ems S.C., Palmer J.D. Rapid evolution of the plastid translational apparatus in a nonphotosynthetic plant: Loss or accelerated sequence evolution of tRNA and ribosomal protein genes. J. Mol. Evol. 1992;35:304–317. doi: 10.1007/BF00161168.
14. Jansen R.K., Cai Z.Q., Raubeson L.A., Daniell H., dePamphilis C.W., Leebens-Mack J., Müller K.F., Guisinger-Bellian M., Haberle R.C., Hansen A.K., et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc. Natl. Acad. Sci. USA. 2007;104:19369–19374. doi: 10.1073/pnas.0709121104.
15. He L., Qian J., Sun Z.Y., Xu X.L., Chen S.L. Complete chloroplast genome of medicinal plant Lonicera japonica: Genome rearrangement, intron gain and loss, and implications for phylogenetic studies. Molecules. 2017;22:249. doi: 10.3390/molecules22020249.
16. Soetaert S., Van Nieuwerburgh F., Brodelius P., Goossens A., Deforce D. Transcriptome analysis of apical and sub-apical cells of Artemisia annua trichomes with next-generation-sequencing; Proceedings of the 10th International Meeting on All Aspects of the Chemistry and Biology of Terpenes and Isoprenoids (Terpnet 2011): Biosynthesis and Function of Isoprenoids in Plants, Microorganisms and Parasites; Kalmar, Sweden. 22–26 May 2011; p. 170.
17. Soetaert S.S., Neste C.M.V., Vandewoestyne M.L., Head S.R., Goossens A., Van Nieuwerburgh F.C., Deforce D.L. Differential transcriptome analysis of glandular and filamentous trichomes in Artemisia annua. BMC Plant Biol. 2013;13:220. doi: 10.1186/1471-2229-13-220.
18. Graham I.A., Besser K., Blumer S., Branigan C.A., Czechowski T., Elias L., Guterman I., Harvey D., Isaac P.G., Khan A.M., et al. The genetic map of Artemisia annua L. identifies loci affecting yield of the antimalarial drug artemisinin. Science. 2010;327:327–331. doi: 10.1126/science.1182612.
19. Chen S.L., Song J.Y. Herbgenomics. China J. Chin. Mater. Med. 2016;41:3881–3889. doi: 10.4268/cjcmm20162101.
20. Chen S.L., Song J.Y., Sun C., Xu J., Zhu Y.J., Verpoorte R., Fan T.P. Herbal genomics: Examining the biology of traditional medicines. Science. 2015;347:27–29.
21. Nie X., Lv S., Zhang Y., Du X., Wang L., Biradar S.S., Tan X., Wan F., Weining S. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS ONE. 2012;7:e36869. doi: 10.1371/journal.pone.0036869.
22. Ding P., Shao Y., Li Q., Gao J., Zhang R., Lai X., Wang D., Zhang H. The complete chloroplast genome sequence of the medicinal plant Andrographis paniculata. Mitochondr. DNA. 2016;27:2347–2348. doi: 10.3109/19401736.2015.1025258.
23. Jia Y., Yang J., He Y.L., He Y., Niu C., Gong L.-L., Li Z.-H. Characterization of the whole chloroplast genome sequence of Acer davidii Franch (Aceraceae). Conserv. Genet. Resour. 2016;8:141–143. doi: 10.1007/s12686-016-0530-2.
24. Xiang B., Li X., Qian J., Wang L., Ma L., Tian X., Wang Y. The complete chloroplast genome sequence of the medicinal plant Swertia mussotii using the PacBio RS II platform. Molecules. 2016;21:1029. doi: 10.3390/molecules21081029.
25. Clegg M.T., Gaut B.S., Learn G.H., Morton B.R. Rates and patterns of chloroplast DNA evolution. Proc. Natl. Acad. Sci. USA. 1994;91:6795–6801. doi: 10.1073/pnas.91.15.6795.
26. Liu Y., Huo N., Dong L., Wang Y., Zhang S., Young H.A., Feng X., Gu Y.Q. Complete chloroplast genome sequences of Mongolia medicine Artemisia frigida and phylogenetic relationships with other plants. PLoS ONE. 2013;8:e57533. doi: 10.1371/journal.pone.0057533.
27. Boudreau E., Takahashi Y., Lemieux C., Turmel M., Rochaix J.D. The chloroplast ycf3 and ycf4 open reading frames of Chlamydomonas reinhardtii are required for the accumulation of the photosystem I complex. EMBO J. 1997;16:6095–6104. doi: 10.1093/emboj/16.20.6095.
28. Naver H., Boudreau E., Rochaix J.D. Functional studies of Ycf3: Its role in assembly of photosystem I and interactions with some of its subunits. Plant Cell. 2001;13:2731–2745. doi: 10.1105/tpc.13.12.2731.
29. Bryan G.J., McNicol J.W., Meyer R.C., Ramsay G., De Jong W.S. Polymorphic simple sequence repeat markers in chloroplast genomes of Solanaceous plants. Theor. Appl. Genet. 1999;99:859–867. doi: 10.1007/s001220051306.
30. Provan J. Novel chloroplast microsatellites reveal cytoplasmic variation in Arabidopsis thaliana. Mol. Ecol. 2000;9:2183–2185. doi: 10.1046/j.1365-294X.2000.105316.x.
31. Flannery M.L., Mitchell F.J., Coyne S., Kavanagh T.A., Burke J.I., Salamin N., Dowding P., Hodkinson T.R. Plastid genome characterisation in Brassica and Brassicaceae using a new set of nine SSRs. Theor. Appl. Genet. 2006;113:1221–1231. doi: 10.1007/s00122-006-0377-0.
32. Ebert D., Peakall R. Chloroplast simple sequence repeats (cpSSRs): Technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species. Mol. Ecol. Resour. 2009;9:673–690. doi: 10.1111/j.1755-0998.2008.02319.x.
33. Zhihai H., Jiang X., Shuiming X., Baosheng L., Yuan G., Chaochao Z., Xiaohui Q., Wen X., Shilin C. Comparative optical genome analysis of two pangolin species: Manis pentadactyla and Manis javanica. Gigascience. 2016;5:1–5. doi: 10.1093/gigascience/giw001.
34. Ni L.H., Zhao Z.L., Xu H.X., Chen S.L., Dorje G. The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion. Gene. 2016;577:281–288. doi: 10.1016/j.gene.2015.12.005.
35. Raubeson L.A., Peery R., Chumley T.W., Dziubek C., Fourcade H.M., Boore J.L., Jansen R.K. Comparative chloroplast genomics: Analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genom. 2007;8:174–201. doi: 10.1186/1471-2164-8-174.
36. Wang R.J., Cheng C.L., Chang C.C., Wu C.L., Su T.M., Chaw S.M. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol. Biol. 2008;8:36–50. doi: 10.1186/1471-2148-8-36.
37. De Cambiaire J.C., Otis C., Lemieux C., Turmel M. The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands. BMC Evol. Biol. 2006;6:37–52. doi: 10.1186/1471-2148-6-37.
38. Lee J., Kang Y., Shin S.C., Park H., Lee H. Combined analysis of the chloroplast genome and transcriptome of the Antarctic vascular plant Deschampsia antarctica Desv. PLoS ONE. 2014;9:e92501. doi: 10.1371/journal.pone.0092501.
39. Ma J., Yang B., Zhu W., Sun L., Tian J., Wang X. The complete chloroplast genome sequence of Mahonia bealei (Berberidaceae) reveals a significant expansion of the inverted repeat and phylogenetic relationship with other angiosperms. Gene. 2013;528:120–131. doi: 10.1016/j.gene.2013.07.037.
40. Kim K.J., Choi K.S., Jansen R.K. Two chloroplast DNA inversions originated simultaneously during the early evolution of the sunflower family (Asteraceae). Mol. Biol. Evol. 2005;22:1783–1792. doi: 10.1093/molbev/msi174.
41. Timme R.E., Kuehl J.V., Boore J.L., Jansen R.K. A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: Identification of divergent regions and categorization of shared repeats. Am. J. Bot. 2007;94:302–312. doi: 10.3732/ajb.94.3.302.
42. Cai Z., Penaflor C., Kuehl J.V., Leebens-Mack J., Carlson J.E., dePamphilis C.W., Boore J.L., Jansen R.K. Complete plastid genome sequences of Drimys, Liriodendron, and Piper: Implications for the phylogenetic relationships of magnoliids. BMC Evol. Biol. 2006;6:77–97. doi: 10.1186/1471-2148-6-77.
43. Hansen D.R., Dastidar S.G., Cai Z., Penaflor C., Kuehl J.V., Boore J.L., Jansen R.K. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol. Phylogenet. Evol. 2007;45:547–563. doi: 10.1016/j.ympev.2007.06.004.
44. Raubeson L.A., Jansen R.K. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science. 1992;255:1697–1699. doi: 10.1126/science.255.5052.1697.
45. Kumar S., Hahn F.M., Mcmahan C.M., Cornish K., Whalen M.C. Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines. BMC Plant Biol. 2009;9:131–143. doi: 10.1186/1471-2229-9-131.
46. Jansen R.K., Palmer J.D. A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proc. Natl. Acad. Sci. USA. 1987;84:5818–5822. doi: 10.1073/pnas.84.16.5818.
47. Doyle J.J., Davis J.I., Soreng R.J., Garvin D., Anderson M.J. Chloroplast DNA inversions and the origin of the grass family (Poaceae). Proc. Natl. Acad. Sci. USA. 1992;89:7722–7726. doi: 10.1073/pnas.89.16.7722.
48. Palmer J.D., Nugent J.M., Herbon L.A. Unusual structure of geranium chloroplast DNA: A triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families. Proc. Natl. Acad. Sci. USA. 1987;84:769–773. doi: 10.1073/pnas.84.3.769.
49. Ogihara Y., Terachi T., Sasakuma T. Intramolecular recombination of chloroplast genome mediated by short direct-repeat sequences in wheat species. Proc. Natl. Acad. Sci. USA. 1988;85:8573–8577. doi: 10.1073/pnas.85.22.8573.
50. Panero J.L., Funk V.A. The value of sampling anomalous taxa in phylogenetic studies: Major clades of the Asteraceae revealed. Mol. Phylogenet. Evol. 2008;47:757–782. doi: 10.1016/j.ympev.2008.02.011.
51. Fernandez I.A., Aguilar J.F., Panero J.L., Feliner G.N. A phylogenetic analysis of Doronicum (Asteraceae, Senecioneae) based on morphological, nuclear ribosomal (ITS), and chloroplast (trnL-F) evidence. Mol. Phylogenet. Evol. 2001;20:41–64. doi: 10.1006/mpev.2001.0954.
52. Chen S.B., Peng Y., Chen S.L., Xiao P.G. Introduction of Pharmaphylogeny. Mod. Tradit. Chin. Med. Mater. Med. World Sci. Technol. 2005;7:97–103.
53. Shi Q.H., Yao Z.P., Zhang H., Xu L., Dai P.H. Comparison of four methods of DNA extraction from Chickpea. J. Xinjiang Agric. Univ. 2009;1:64–67.
54. Urreizti R., Garcia-Giralt N., Riancho J.A., González-Macías J., Civit S., Güerri R., Yoskovitz G., Sarrion P., Mellibovsky L., Diez-Perez A., et al. COL1A1 haplotypes and hip fracture. J. Bone Miner. Res. 2012;27:950–953. doi: 10.1002/jbmr.1536.
55. Deng P., Wang L., Cui L., Feng K., Liu F., Du X., Tong W., Niu X., Ji W., Weining S. Global identification of MicroRNAs and their targets in barley under salinity stress. PLoS ONE. 2015;10:e0137990. doi: 10.1371/journal.pone.0137990.
56. Gogniashvili M., Naskidashvili P., Bedoshvili D., Kotorashvili N., Kotaria N., Beridze T. Complete chloroplast DNA sequences of Zanduri wheat (Triticum spp.). Genet. Resour. Crop Evol. 2015;62:1269–1277. doi: 10.1007/s10722-015-0230-x.
57. Boetzer M., Henkel C.V., Jansen H.J., Butler D., Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–579. doi: 10.1093/bioinformatics/btq683.
58. Acemel R.D., Tena J.J., Irastorza-Azcarate I., Marletaz F., Gómez-Marín C., de la Calle-Mustienes E., Bertrand S., Diaz S.G., Aldea D., Aury J.M., et al. A single three-dimensional chromatin compartment in amphioxus indicates a stepwise evolution of vertebrate Hox bimodal regulation. Nat. Genet. 2016;48:336–341. doi: 10.1038/ng.3497.
59. Liu C., Shi L., Zhu Y., Chen H., Zhang J., Lin X., Guan X. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genom. 2012;13:715. doi: 10.1186/1471-2164-13-715.
60. Wyman S.K., Jansen R.K., Boore J.L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20:3252–3255. doi: 10.1093/bioinformatics/bth352.
61. Schattner P., Brooks A.N., Lowe T.M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33:686–689. doi: 10.1093/nar/gki366.
62. Lohse M., Drechsel O., Bock R. Organellar Genome DRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 2007;52:267–274. doi: 10.1007/s00294-007-0161-y.
63. Tamura K., Peterson D., Peterson N., Stecher G., Nei M., Kumar S. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 2011;28:2731–2739. doi: 10.1093/molbev/msr121.
64. Kurtz S., Phillippy A., Delcher A.L., Smoot M., Shumway M., Antonescu C., Salzberg S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.
65. Mayor C., Brudno M., Schwartz J.R., Poliakov A., Rubin E.M., Frazer K.A., Pachter L.S., Dubchak I. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000;16:1046–1047. doi: 10.1093/bioinformatics/16.11.1046.
66. Frazer K.A., Pachter L., Poliakov A., Rubin E.M., Dubchak I. VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 2004;32:273–279. doi: 10.1093/nar/gkh458.
67. Yang X.M., Sun J.T., Xue X.F., Zhu W.C., Hong X.Y. Development and characterization of 18 novel EST-SSRs from the Western Flower Thrips, Frankliniella occidentalis (Pergande). Int. J. Mol. Sci. 2012;13:2863–2876. doi: 10.3390/ijms13032863.
68. Kurtz S., Choudhuri J.V., Ohlebusch E., Schleiermacher C., Stoye J., Giegerich R. REPuter: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29:4633–4642. doi: 10.1093/nar/29.22.4633.
69. Larkin M.A., Blackshields G., Brown N.P., Chenna R., McGettigan P.A., McWilliam H., Valentin F., Wallace I.M., Wilm A., Lopez R., et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–2948. doi: 10.1093/bioinformatics/btm404.
70. Posada D. jModelTest: Phylogenetic model averaging. Mol. Biol. Evol. 2008;25:1253–1259. doi: 10.1093/molbev/msn083.