Abstract
This study reports the whole chloroplast genome of Fagus crenata (subgenus Fagus), a foundation tree species of Japanese temperate forests. The genome is 158,227 bp long and contains 111 genes, including 76 protein-coding genes, 31 tRNA genes and 4 ribosomal RNA genes. Comparison with the only other published Fagus chloroplast genome, that of F. engleriana (subgenus Engleriana), shows that the two genomes are relatively conserved, with no inversions or rearrangements observed; the proportion of nucleotide sites differing between the two species was 0.0018. The six most variable regions were, in increasing order of variability, psbK-psbI, trnG-psbfM, rpl32, trnV, ndhI-ndh and ndhD-psaC. These highly variable chloroplast regions, together with the 160 chloroplast microsatellites identified (of which 46 were variable between the two species), will provide useful genetic resources for studies of the inter- and intra-specific genetic structure and diversity of this important northern hemisphere tree genus.
Keywords: Beech, Chloroplast SSRs, Fagaceae phylogeny, Fagus crenata, Whole chloroplast genome, Chloroplast microsatellites
Introduction
The genus Fagus is a major component of temperate forests of the northern hemisphere, with two informal subgenera recognized (Shen, 1992): Engleriana with three species and Fagus with seven species (Oh, 2015; Renner et al., 2016). The genus has been the focus of intensive genetic studies over the last 30 years, providing insights into the relationships of the extant species (Denk, Grimm & Hemleben, 2005), the impact of glacial-interglacial cycles on extant genetic diversity (Fujii et al., 2002; Magri et al., 2006) and predictions of the impacts of ongoing climate change (Csilléry et al., 2014). However, despite the significance of the genus, there remains a dearth of next-generation sequencing-based genetic resources for Fagus, including for the chloroplast genome: the whole chloroplast genome of only a single species, the Chinese endemic F. engleriana of subgenus Engleriana (Yang et al., 2018), has been published so far.
This study reports the whole chloroplast genome of the Japanese endemic Fagus crenata, the first reported for subgenus Fagus. This species is a foundation tree of Japan’s cool temperate forest ecosystem and is distributed widely from the mountains of southern Kyushu (31.4° N 130.8° E) to southern Hokkaido (42.8° N 140.2° E). Phylogeographic studies based on Sanger sequencing of small portions of the chloroplast genome have revealed strong geographic structuring of chloroplast haplotypes (Fujii et al., 2002; Okaura & Harada, 2002) that, combined with fossil pollen data (Tsukada, 1982), suggests that the species persisted in multiple coastal refugia and came to occupy most of its current wide geographic range during the postglacial period. Here we present the whole chloroplast genome sequence of F. crenata and compare it with the genome of F. engleriana (subgenus Engleriana). These data will be a useful genetic resource for investigating phylogenetic relationships within Fagus and for developing chloroplast-based genetic markers, including both single nucleotide polymorphism- and microsatellite-based markers.
Materials and Methods
Next Generation Sequencing and chloroplast genome assembly
Whole genomic DNA was extracted using a modified CTAB protocol (Doyle, 1990) from a single sample of F. crenata collected from Daisengen Peak, Hokkaido, Japan (41.616° N, 140.1333° E), representing F. crenata chloroplast haplotype A (following Fujii et al., 2002). DNA concentration and quality were assessed by agarose gel electrophoresis and with a Qubit 2.0 fluorometer (Life Technologies). A total of 9 µg of DNA was sent to the Beijing Genomics Institute, where short-size TruSeq DNA libraries were constructed and paired-end sequencing (2 × 100 bp) was performed on an Illumina HiSeq2000 Genome Analyser, resulting in a total of 7,223,910 reads (the raw sequence reads are deposited in the NCBI BioProject database under accession number PRJNA528838).
Chloroplast DNA was assembled from the whole genomic sequencing data in NOVOPlasty 2.6.3 (Dierckxsens, Mardulyn & Smits, 2016), a seed-and-extend algorithm designed specifically for assembling chloroplast genomes from whole genome sequencing data, starting from a chloroplast seed sequence (trnK-matK of haplotype A: GenBank accession AB046492). This resulted in nine chloroplast contigs varying in length from 2,748 to 43,982 bp, constructed from 230,360 chloroplast reads (3.19% of the total reads), with an average read coverage of the chloroplast genome of 145×. The nine contigs were ordered and oriented using the F. engleriana whole chloroplast genome (KX852398) as a reference, and the complete chloroplast sequence of F. crenata was constructed by connecting overlapping terminal sequences. Sanger sequencing was undertaken to check the accuracy of the assembly at the joins of the nine contigs, at the junctions between the inverted repeat and single copy regions, and at the most diverged sites between F. crenata and F. engleriana (see “Results and Discussion”). A total of 8,146 bp was sequenced using 15 primer pairs, and no differences from the assembled F. crenata genome were observed apart from those due to inaccurate sequence at the terminal ends of the Sanger reads.
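The read statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the procedure used by the assembler: it assumes every chloroplast read contributes its full 100 bp, and it takes the genome length from the final assembly reported in the Results. The function names are illustrative only.

```python
# Values reported in this study.
total_reads = 7_223_910
chloroplast_reads = 230_360
read_length_bp = 100          # 2 x 100 bp paired-end sequencing
genome_length_bp = 158_227    # assembled F. crenata chloroplast genome


def percent_of_total(subset: int, total: int) -> float:
    """Return subset as a percentage of total."""
    return 100.0 * subset / total


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Approximate mean depth, assuming all reads are full length (assumption)."""
    return n_reads * read_len / genome_len


chloroplast_percent = percent_of_total(chloroplast_reads, total_reads)
coverage = mean_coverage(chloroplast_reads, read_length_bp, genome_length_bp)
print(f"{chloroplast_percent:.2f}% chloroplast reads, approx. {coverage:.1f}x coverage")
```

Under these assumptions the estimate falls between 145 and 146, in line with the reported average coverage.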
Chloroplast genome annotation
The chloroplast genome was annotated using the online program Dual Organellar Genome Annotator (Wyman, Jansen & Boore, 2004). The initial annotation, putative starts, stops and intron positions were determined by comparison with homologous genes of the F. engleriana chloroplast genome using Geneious v9.0.5 (Biomatters, Auckland, New Zealand). A circular gene map was drawn with the OrganellarGenomeDRAW tool (OGDRAW) followed by manual modification (Lohse, Drechsel & Bock, 2007).
Phylogenetic analysis and assessment of divergent regions
A multiple sequence alignment of F. crenata, F. engleriana, representative whole chloroplast genomes of the Fagaceae family and outgroups from Betulaceae, Juglandaceae and Myricaceae obtained from GenBank was constructed in T-Coffee with default parameters (Notredame, Higgins & Heringa, 2000). Subsequently, Gblocks v0.91b (Castresana, 2000) was used to identify homologous blocks of DNA and remove poorly aligned and divergent regions of the chloroplast genomes. RAxML-NG (Kozlov et al., 2018) was then used to construct a maximum likelihood phylogenetic tree with 1,000 bootstrap replicates under the most appropriate DNA substitution model, TVM+I+G, as estimated in jModelTest 2.1.10 (Darriba et al., 2012).
Pairwise nucleotide differences (p-distance) between the sequences of the Gblocks alignment were calculated in MEGA7 (Kumar, Stecher & Tamura, 2016), excluding parts of the sequence alignment with gaps. Coding genes, non-coding regions and intron regions were compared in the alignment of the two Fagus chloroplast genomes to detect divergence hotspots. We examined 101 regions (39 coding genes, 52 intergenic spacers and 10 intron regions) of the two Fagus species for nucleotide variability (Pi), calculated in DnaSP v5.0 (Librado & Rozas, 2009).
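To illustrate the p-distance used above, the sketch below computes the proportion of differing sites between two aligned sequences while skipping gapped positions. It is an illustration of the definition only, not the MEGA implementation: the function name, the set of characters treated as gaps and the toy sequences are assumptions, and gaps are removed pair by pair rather than across the whole alignment. For a comparison of just two sequences, nucleotide diversity (Pi) reduces to this same proportion.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-N?") -> float:
    """Proportion of compared sites that differ between two aligned sequences.

    Sites where either sequence has a gap or ambiguous character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # exclude gapped or ambiguous sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical toy alignment: 9 comparable sites, 1 difference.
toy_a = "ATGCC-TTAGA"
toy_b = "ATGTCATT-GA"
print(p_distance(toy_a, toy_b))
```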
Identification of chloroplast microsatellites
Chloroplast microsatellite regions shared by F. crenata and F. engleriana were searched for in an alignment of the two full chloroplast genomes (constructed with MAFFT v7.308 (Katoh et al., 2002) under default settings) using Phobos Tandem Repeat Finder (Mayer, 2008) implemented in Geneious v9.0.5. Microsatellites in either of the sequences with a repeat unit length of 1–2 bp were searched for using a minimum length of 10 bp, while those with a repeat unit length of 3–6 bp were selected if they displayed three or more repeats.
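The selection criteria above can be sketched with a regular-expression search. This is a simplified stand-in for Phobos, not its algorithm: it detects perfect repeats only, works on a single ungapped sequence, and the function name and toy sequence are assumptions made for illustration.

```python
import re


def find_microsatellites(seq: str):
    """Find perfect microsatellites with repeat units of 1-6 bp.

    Units of 1-2 bp must span at least 10 bp; units of 3-6 bp need at
    least three repeats. Returns (start, end, unit, n_repeats) tuples
    with 0-based, end-exclusive coordinates.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, 7):
        # ceil(10 / unit_len) repeats for 1-2 bp units, otherwise 3 repeats
        min_repeats = -(-10 // unit_len) if unit_len <= 2 else 3
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA"),
            # so that each locus is reported under its shortest unit only.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            n_repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, n_repeats))
    return sorted(hits)


# Hypothetical toy sequence containing one mono-, one di- and one tri-nucleotide repeat.
toy_seq = "GG" + "A" * 12 + "CG" + "AT" * 5 + "GA" + "TTC" * 3 + "G"
for hit in find_microsatellites(toy_seq):
    print(hit)
```

Comparing the repeat counts reported for the same locus in two aligned genomes would then indicate whether a microsatellite varies in size between the species.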
Results and Discussion
The assembled whole chloroplast genome of F. crenata is 158,227 bp long (Fig. 1: GenBank accession number MH171101) and consists of an 87,557 bp large single copy region, an 18,928 bp small single copy region and two inverted repeats, each 25,871 bp in length. The genome contains 111 genes, including 76 protein-coding genes, 31 tRNA genes and 4 ribosomal RNA genes (see DatasetS1 for the GenBank file of the chloroplast genome). The Gblocks alignment consisted of 143,882 bp of non-gapped sequence, of which 11.48% of sites were variable (see DatasetS2 for the Gblocks alignment). The resulting best ML tree showed relationships similar to those found in previous studies, with Fagus as sister to all other Fagaceae (Manos & Steele, 1997) (Fig. 2). Fagus crenata and F. engleriana formed a strongly diverged clade, consistent with previous evidence of the large divergence of Fagus from all other Fagaceae genera (Heenan & Smissen, 2013). The proportion of nucleotide sites that differed (p-distance) between F. crenata and F. engleriana was 0.0018, which was lower than any other pairwise difference observed, including those between five Quercus species, which had values between 0.0035 and 0.0047 (average = 0.0042) (see DatasetS3 for a matrix of p-distances).
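As a quick consistency check on the genome structure reported above, the region lengths and gene categories can be summed and compared with the stated totals. This is a minimal sketch using only the figures given in the text; the variable names are illustrative.

```python
# Region lengths (bp) and gene counts as reported for F. crenata.
lsc_bp = 87_557   # large single copy region
ssc_bp = 18_928   # small single copy region
ir_bp = 25_871    # each of the two inverted repeat copies
reported_total_bp = 158_227

gene_counts = {"protein_coding": 76, "tRNA": 31, "rRNA": 4}
reported_gene_total = 111

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
computed_total_bp = lsc_bp + ssc_bp + 2 * ir_bp
computed_gene_total = sum(gene_counts.values())

print(computed_total_bp == reported_total_bp)
print(computed_gene_total == reported_gene_total)
```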
The two Fagus chloroplast genomes were relatively conserved (Fig. 3), with the IR region more conserved than both the large single copy (LSC) and small single copy (SSC) regions. We did not detect inversions or translocations between the two genome sequences, and no rearrangement in gene organization was found after verification (Fig. 4). Nucleotide diversity varied widely among the 101 regions of the two Fagus species, with values ranging from 0.0003 (ycf2 gene) to 0.0781 (ndhD-psaC) (Fig. 5). The six most variable regions were, in increasing order of variability, psbK-psbI, trnG-psbfM, rpl32, trnV, ndhI-ndh and ndhD-psaC, of which three are located in the LSC region and three in the SSC region (Fig. 5). The nucleotide diversities of these variable regions between F. crenata and F. engleriana were higher than those observed in some other studies of Fagaceae genera, including East Asian (Yan et al., 2018) and Mediterranean oaks (Vitelli et al., 2017).
A total of 160 chloroplast microsatellites with a repeat unit length between 1 and 6 bp were identified in the two species based on the selection criteria. Mono- and tri-nucleotide repeat microsatellites were the most abundant, with frequencies of 38.7% and 43.1%, respectively. This abundance of mono- and tri-nucleotide repeats in the chloroplast is similar to that in a range of other angiosperms (Melotto-Passarin et al., 2011). Of these microsatellites, 46 displayed size variation between F. crenata and F. engleriana (see DatasetS4 for a table with details of all 46 variable chloroplast microsatellites). The majority (66.1%) of mono-nucleotide repeats were variable, while 20% of di-nucleotide repeats and both of the two hexa-nucleotide repeats were variable. On the other hand, none of the tri-, tetra- and penta-nucleotide repeats showed size variation between the two species (Fig. 6). The length of variable versus non-variable chloroplast microsatellites was similar, but with greater length variation for variable microsatellites in both F. crenata and F. engleriana (Fig. 7).
Conclusion
Overall, the chloroplast genome of F. crenata will provide a useful genetic resource for future genetic studies of the foundation temperate tree genus Fagus. Specifically, the chloroplast genomes of both informal subgenera will provide useful references and sources of molecular markers to investigate phylogeographic patterns of the chloroplast within and between Fagus species. Some major questions remain unresolved in Fagus, including the taxonomic boundaries of western Eurasian Fagus populations, which have remained a recalcitrant problem due to low marker resolution and high within-species genetic diversity (Denk et al., 2002), and the non-monophyly of the chloroplast of East Asian species suggested by Sanger sequence-based data (Manos & Stanford, 2001; Okaura & Harada, 2002).
Supplemental Information
Acknowledgments
We would like to thank fellow lab members for their advice on this study and H. Kanehara for her assistance in the lab.
Funding Statement
This work was supported by the Japan Society for the Promotion of Science Grant-in-Aid for Young Scientists A (Grant number 16748931); and a Forestry and Forest Products Research Institute grant (Grant number 201430). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
James R. P. Worth conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Luxian Liu conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Fu-Jin Wei analyzed the data, prepared figures and/or tables, approved the final draft.
Nobuhiro Tomaru conceived and designed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
Data are available at GenBank, accession number: MH171101.
Data Availability
The following information was supplied regarding data availability:
Data are available at the BioProject database:
BioProject ID: PRJNA528838.
References
- Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution. 2000;17(4):540–552. doi: 10.1093/oxfordjournals.molbev.a026334.
- Csilléry K, Lalagüe H, Vendramin GG, González-Martínez SC, Fady B, Oddou-Muratorio S. Detecting short spatial scale local adaptation and epistatic selection in climate-related candidate genes in European beech (Fagus sylvatica) populations. Molecular Ecology. 2014;23(19):4696–4708. doi: 10.1111/mec.12902.
- Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Research. 2004;14(7):1394–1403. doi: 10.1101/gr.2289704.
- Darriba D, Taboada GL, Doallo R, Posada D. jModelTest2: more models, new heuristics and parallel computing. Nature Methods. 2012;9(8):772. doi: 10.1038/nmeth.2109.
- Denk T, Grimm GW, Hemleben V. Patterns of molecular and morphological differentiation in Fagus (Fagaceae): phylogenetic implications. American Journal of Botany. 2005;92(6):1006–1016. doi: 10.3732/ajb.92.6.1006.
- Denk T, Grimm G, Stögerer K, Langer M, Hemleben V. The evolutionary history of Fagus in western Eurasia: evidence from genes, morphology and the fossil record. Plant Systematics and Evolution. 2002;232(3–4):213–236. doi: 10.1007/s006060200044.
- Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research. 2016;45(4):e18. doi: 10.1093/nar/gkw955.
- Doyle JJ. Isolation of plant DNA from fresh tissue. Focus. 1990;12:13–15.
- Fujii N, Tomaru N, Okuyama K, Koike T, Mikami T, Ueda K. Chloroplast DNA phylogeography of Fagus crenata (Fagaceae) in Japan. Plant Systematics and Evolution. 2002;232(1–2):21–33. doi: 10.1007/s006060200024.
- Heenan PB, Smissen RD. Revised circumscription of Nothofagus and recognition of the segregate genera Fuscospora, Lophozonia, and Trisyngyne (Nothofagaceae). Phytotaxa. 2013;146(1):1–31.
- Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research. 2002;30(14):3059–3066. doi: 10.1093/nar/gkf436.
- Kozlov A, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. bioRxiv. 2018:447110. doi: 10.1093/bioinformatics/btz305.
- Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular Biology and Evolution. 2016;33(7):1870–1874. doi: 10.1093/molbev/msw054.
- Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–1452. doi: 10.1093/bioinformatics/btp187.
- Lohse M, Drechsel O, Bock R. OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Current Genetics. 2007;52(5–6):267–274. doi: 10.1007/s00294-007-0161-y.
- Magri D, Vendramin GG, Comps B, Dupanloup I, Geburek T, Gömöry D, Latałowa M, Litt T, Paule L, Roure JM, Tantau I, van der Knaap WO, Petit RJ, de Beaulieu J-L. A new scenario for the Quaternary history of European beech populations: palaeobotanical evidence and genetic consequences. New Phytologist. 2006;171(1):199–221. doi: 10.1111/j.1469-8137.2006.01740.x.
- Manos PS, Stanford AM. The historical biogeography of Fagaceae: tracking the tertiary history of temperate and subtropical forests of the northern hemisphere. International Journal of Plant Sciences. 2001;162(S6):S77–S93. doi: 10.1086/323280.
- Manos PS, Steele KP. Phylogenetic analyses of “higher” Hamamelididae based on plastid sequence data. American Journal of Botany. 1997;84(10):1407–1419. doi: 10.2307/2446139.
- Mayer C. Phobos, a tandem repeat search tool for complete genomes. Version 3:12. 2008. http://www.ruhr-uni-bochum.de/ecoevo/cm/cm_phobos.htm
- Melotto-Passarin DM, Tambarussi EV, Dressano K, De Martin VF, Carrer H. Characterization of chloroplast DNA microsatellites from Saccharum spp and related species. Genetics and Molecular Research. 2011;10(3):2024–2033. doi: 10.4238/vol10-3gmr1019.
- Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology. 2000;302(1):205–217. doi: 10.1006/jmbi.2000.4042.
- Oh S-H. Sea, wind, or bird: origin of Fagus multinervis (Fagaceae) inferred from chloroplast DNA sequences. Korean Journal of Plant Taxonomy. 2015;45(3):213–220. doi: 10.11110/kjpt.2015.45.3.213.
- Okaura T, Harada K. Phylogeographical structure revealed by chloroplast DNA variation in Japanese Beech (Fagus crenata Blume). Heredity. 2002;88(4):322–329. doi: 10.1038/sj.hdy.6800048.
- Renner SS, Grimm GW, Kapli P, Denk T. Species relationships and divergence times in beeches: new insights from the inclusion of 53 young and old fossils in a birth–death clock model. Philosophical Transactions of the Royal Society B: Biological Sciences. 2016;371(1699):20150135. doi: 10.1098/rstb.2015.0135.
- Shen C-F. A monograph of the genus Fagus Tourn. Ex L. (Fagaceae). PhD Dissertation. City University of New York. 1992.
- Tsukada M. Late-Quaternary development of the Fagus forest in the Japanese Archipelago. Japanese Journal of Ecology. 1982;32:113–118.
- Vitelli M, Vessella F, Cardoni S, Pollegioni P, Denk T, Grimm GW, Simeone MC. Phylogeographic structuring of plastome diversity in Mediterranean oaks (Quercus Group Ilex, Fagaceae). Tree Genetics & Genomes. 2017;13(1):3. doi: 10.1007/s11295-016-1086-8.
- Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20(17):3252–3255. doi: 10.1093/bioinformatics/bth352.
- Yan M, Xiong Y, Liu R, Deng M, Song J. The application and limitation of universal chloroplast markers in discriminating East Asian evergreen oaks. Frontiers in Plant Science. 2018;9:569. doi: 10.3389/fpls.2018.00569.
- Yang Y, Zhu J, Feng L, Zhou T, Bai G, Yang J, Zhao G. Plastid genome comparative and phylogenetic analyses of the key genera in Fagaceae: highlighting the effect of codon composition bias in phylogenetic inference. Frontiers in Plant Science. 2018;9:82. doi: 10.3389/fpls.2018.00082.