Abstract
Bactris gasipaes var. gasipaes (Arecaceae, Palmae) is an economically and socially important plant species for populations across tropical South and Central America. It has been domesticated from its wild variety, B. gasipaes var. chichagui, since pre-Columbian times. In this study, we sequenced the plastome of the cultivated variety, B. gasipaes Kunth var. gasipaes, and compared it with the published plastome of the wild variety. The assembled chloroplast sequence was 156,580 bp long. The cultivated and wild plastomes were highly conserved, sharing 99.8% nucleotide identity. We did, however, identify multiple single nucleotide variants (SNVs), insertions and microsatellites, as well as a region of missing nucleotides in the wild plastome that was resolved in our assembly. An SNV in one of the core barcode markers (matK) was detected between the wild and cultivated accessions. A phylogenetic analysis across the Arecaceae family recovered a topology identical to that of previous reports. This study is a step forward in understanding the genome evolution of this species.
Keywords: Chloroplast, plastome, NOVOPlasty, peach palm, phylogeny
Introduction
The palm family Arecaceae (Palmae) comprises more than 2,500 species (Dransfield et al. 2008), including taxa of major economic importance. Bactris gasipaes Kunth is the only palm of the Neotropics to have been fully domesticated, a process dating back to pre-Columbian times (Clement 1988). This species is cultivated from Brazil to Mexico, where it is important for local populations and a staple food for Amerindian peoples (Clement 1988; Graefe et al. 2013). Two varieties are recognized within the species: the cultivated or domesticated variety B. gasipaes var. gasipaes and the wild variety B. gasipaes var. chichagui (Henderson 2000; Couvreur et al. 2007). Both varieties are quite similar in their overall vegetative morphology. However, the fruits of the domesticated type are much larger (3–8 cm in diameter versus 1–2 cm in the wild type; Henderson 2000), have a thicker mesocarp and can be up to two hundred times heavier than the wild fruits, a clear domestication syndrome (Clement et al. 2021). The fruits of B. gasipaes have traditionally been consumed as a source of carbohydrates and lipids throughout the Neotropics, and are generally prepared as fermented drinks (e.g. chicha) or flours, or simply eaten after cooking (Clement and Urpí 1987). More recently, the commercial exploitation of B. gasipaes palm hearts has spread widely across the tropical lowlands of Central and South America, and its wood is also used for furniture and construction (Montúfar and Rosas 2013; Couvreur et al. 2006).
Molecular studies have characterized the local diversity and germplasm collections of B. gasipaes and explored the origins of its domestication using molecular markers (SSRs, RAPD) and chloroplast sequences (Hernández-Ugalde et al. 2011; Rodrigues et al. 2005; Galluzzi et al. 2015; Clement et al. 2017; Santos da Silva et al. 2021), while others have examined the genetic relationships and gene flow between the two varieties (Couvreur et al. 2006, 2007; Hernández-Ugalde et al. 2011). In this context, new genomic tools are needed to address evolutionary, ecological and agricultural questions, in particular to unravel the intriguing domestication history of the species across the Neotropics (Galluzzi et al. 2015; Clement et al. 2017). A complete chloroplast sequence of the wild variety, B. gasipaes Kunth var. chichagui, was recently published (Santos da Silva et al. 2021), opening the way to explore the origins of its domestication. The goals of this work were (i) to characterize the complete plastome of B. gasipaes var. gasipaes (domesticated variety), (ii) to compare it with the plastome of B. gasipaes var. chichagui (wild variety), and (iii) to reconstruct a phylogenetic tree including this newly acquired plastome and other species of the Arecaceae family.
Materials and methods
We sampled a domesticated individual of Bactris gasipaes Kunth var. gasipaes in northwestern Ecuador, in the Maship area (farm of Alejandro Solano, 0°10′54.1″N 78°54′37.1″W). Fruits of this specimen were also collected, but they were immature at the time and were therefore not measured. The young palm heart was collected in the field and immediately preserved in liquid nitrogen until total DNA was extracted using the protocol of Mariac et al. (1970). NGS library preparation followed Mariac et al. (2014). Total DNA extracted from leaves was sequenced (paired-end, 150 bp) on an Illumina NovaSeq 6000 platform at the Novogene Co., Ltd. facilities. Sequence data were submitted to the NCBI Sequence Read Archive (SRA) under BioSample accession SAMN27503645.
Reads were quality-filtered using fastp. Kraken2 (Wood et al. 2019) with the PlusPFP database was used to remove potentially contaminating reads from other organisms. NOVOPlasty (Dierckxsens et al. 2017) was used to assemble the chloroplast sequence of Bactris gasipaes var. gasipaes, with the Elaeis guineensis chloroplast genome (NC_017602.1) as reference. A subsample of ten million paired-end reads was used for the assembly.
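How the ten million read pairs were drawn is not specified. Purely as an illustration, a one-pass uniform subsample that keeps mates together can be obtained with reservoir sampling, as in the minimal sketch below. The function name, the in-memory (forward, reverse) tuple representation and the toy reads are assumptions; a real pipeline would stream reads from two FASTQ files.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical representation: a read pair is (forward_read, reverse_read).
# A real pipeline would stream these from paired FASTQ files instead.
ReadPair = Tuple[str, str]


def reservoir_sample_pairs(pairs: Iterable[ReadPair], k: int,
                           seed: int = 0) -> List[ReadPair]:
    """Draw a uniform random sample of k read pairs in a single pass.

    Implements reservoir sampling (Algorithm R). Both mates are kept
    together so the subsample remains properly paired.
    """
    rng = random.Random(seed)
    reservoir: List[ReadPair] = []
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)  # fill the reservoir with the first k pairs
        else:
            # Keep pair i with probability k / (i + 1), evicting a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = pair
    return reservoir


# Toy example: subsample 100 of 1,000 hypothetical read pairs
toy_pairs = [(f"F{i}", f"R{i}") for i in range(1000)]
subsample = reservoir_sample_pairs(toy_pairs, 100, seed=42)
print(len(subsample))  # 100
```

Reservoir sampling is used here because it needs only one pass and memory proportional to the sample size, which matters when the full run contains far more than ten million pairs.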
CPGAVAS2 (Shi et al. 2019) was used to annotate the chloroplast sequence, and a graphical map was produced with Chloroplot (Zheng et al. 2020). A dot-plot comparing the B. gasipaes var. gasipaes and B. gasipaes var. chichagui chloroplast sequences was constructed using Gepard (Krumsiek et al. 2007). Pairwise alignments against B. gasipaes Kunth var. chichagui were performed with BLASTn and diffseq (EMBOSS) to identify single nucleotide variants (SNVs), insertions and deletions (Altschul et al. 1990; Aggeli et al. 2018). IRscope was used to analyze the junctions between the inverted repeats and the single-copy regions (Amiryousefi et al. 2018). Finally, a phylogenetic tree was reconstructed including other species of the family Arecaceae.
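The logic behind classifying differences in a pairwise alignment can be sketched as follows. This is not the EMBOSS diffseq implementation; it is a minimal illustration that assumes the two plastomes are already aligned as equal-length strings with "-" for gaps, uses var. chichagui as the reference, merges consecutive gap columns into single insertion or deletion events, and skips columns containing an unresolved "N" (an assumed rule). The function name and toy alignment are hypothetical.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str, str]  # (type, ref_position, ref_allele, query_allele)


def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """Classify differences between two gapped, equal-length aligned sequences.

    Positions are 1-based on the ungapped reference. For an insertion, the
    position is the reference base after which the inserted bases occur;
    for SNVs and deletions it is the first affected reference base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, qry_aln = ref_aln.upper(), qry_aln.upper()
    events: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q != "-":
            # Insertion in the query: merge consecutive reference-gap columns
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            events.append(("INS", ref_pos, "-", qry_aln[i:j]))
            i = j
            continue
        if q == "-" and r != "-":
            # Deletion in the query: merge consecutive query-gap columns
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            events.append(("DEL", ref_pos + 1, ref_aln[i:j], "-"))
            ref_pos += j - i
            i = j
            continue
        if r != "-":
            ref_pos += 1
        # Substitution, ignoring columns with an unresolved 'N'
        if r != q and r not in "-N" and q not in "-N":
            events.append(("SNV", ref_pos, r, q))
        i += 1
    return events


# Toy alignment (hypothetical): wild reference vs cultivated query
print(call_variants("ACGT-ACGTA", "ACCTTACG-A"))
# [('SNV', 3, 'G', 'C'), ('INS', 4, '-', 'T'), ('DEL', 8, 'T', '-')]
```

Skipping "N" columns mirrors the situation described for the 20 unidentified nucleotides in the wild plastome: such positions cannot be scored as true substitutions.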
We sampled 17 outgroup palm species covering all subfamilies (Baker et al. 2009) and one species from Dasypogonaceae, the sister family of palms (Dasypogon bromeliifolius) (Givnish et al. 2018). We also included the recently sequenced plastome of the wild variety B. gasipaes var. chichagui. Dot-plot alignments revealed no large rearrangements between the sequences. Plastomes were aligned using MAFFT version 7 (Katoh et al. 2019). Maximum likelihood phylogenetic inference was carried out with RAxML version 7.2.7 under the GTRCAT substitution model, using all gap-free sites of the chloroplast alignment and 1,000 bootstrap replicates (Stamatakis 2015).
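Restricting the matrix to gap-free sites means discarding every alignment column in which at least one taxon has a gap. A minimal sketch of that filter is shown below, assuming the MAFFT alignment has already been read into a dictionary of equal-length strings and treating "-" and "?" as gap or missing-data characters (an assumption; whether ambiguous IUPAC codes were also excluded is not stated). The function name and taxon labels are hypothetical.

```python
from typing import Dict


def drop_gapped_columns(alignment: Dict[str, str],
                        gap_chars: str = "-?") -> Dict[str, str]:
    """Keep only columns in which no sequence has a gap or missing-data character."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    (length,) = lengths
    # Indices of columns where every taxon has a real character
    keep = [
        col for col in range(length)
        if all(seq[col] not in gap_chars for seq in alignment.values())
    ]
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


# Toy alignment with hypothetical taxon labels
toy = {"taxon_1": "AC-GT", "taxon_2": "ACTG-", "taxon_3": "AGTGT"}
print(drop_gapped_columns(toy))
# {'taxon_1': 'ACG', 'taxon_2': 'ACG', 'taxon_3': 'AGG'}
```

The length of any filtered sequence gives the number of retained sites, the quantity reported for the real matrix in the Results.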
The voucher specimen was deposited at the Herbario QCA (https://bioweb.puce.edu.ec/QCA; Pontificia Universidad Católica del Ecuador, Quito; thomas.couvreur@ird.fr) and WAG (Naturalis, Leiden, The Netherlands) herbaria under the number Couvreur & Tranbarger 1192 (collected 28 August 2018), and the DNA is deposited at the IRD center in Montpellier, France (http://www.ird.fr; UMR DIADE; thomas.couvreur@ird.fr).
Results and discussion
This work exploits the ability of next-generation sequencing (NGS) to produce large quantities of reads from total DNA in a very short time. These reads can be used to explore genome composition, identify variants and SNP markers, or assemble chloroplast genomes. Here, we obtained the chloroplast sequence of the domesticated variety of B. gasipaes, an important step toward understanding the evolution of this species.
Of the 10 million paired-end reads, 174,750 were retained for assembly, giving an average depth of coverage of 190X. The reconstructed chloroplast genome of B. gasipaes Kunth var. gasipaes was 156,580 bp long (Figure 1). A comparison between the B. gasipaes Kunth var. gasipaes and B. gasipaes Kunth var. chichagui genomes showed collinearity along the entire sequence and the presence of two inverted repeats, common to most plant chloroplast genomes (Supplementary Figure 1; Heinhorst and Cannon 1993). Even though the chloroplast sequences of the two varieties are highly similar, we observed in the B. gasipaes Kunth var. gasipaes sequence 20 single nucleotide variants (SNVs), 17 insertions of 1 base, 3 insertions of 2 bases, 2 insertions of 3 bases, 1 insertion of 4 bases and 2 insertions of 6 bases. We also observed seven regions with mismatches, including a region of 20 unidentified nucleotides in B. gasipaes Kunth var. chichagui that was resolved in the B. gasipaes Kunth var. gasipaes assembly. Ten of these mutations were located in coding sequences, including the matK, rpoB and psaA genes, among others (Supplementary Table 1). CPGAVAS2 identified 61 microsatellites in B. gasipaes Kunth var. chichagui and 68 in B. gasipaes Kunth var. gasipaes. Among these, four are specific to the var. chichagui plastome and nine to the var. gasipaes plastome, while eighteen are shared by both plastomes but differ in length (Supplementary Table 2).
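Microsatellites (simple sequence repeats, SSRs) such as those counted by CPGAVAS2 are short motifs repeated in tandem. The sketch below scans a sequence for perfect repeats of motif length 1 to 6. The thresholds in MIN_REPEATS are assumed illustrative values, not the parameters CPGAVAS2 applied, so this sketch would not necessarily reproduce the counts reported for either plastome; compound and imperfect repeats are also ignored. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed minimum number of repeat units per motif length (illustrative only,
# not the settings used by CPGAVAS2).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str,
              min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (1-based start, motif, repeat count) for perfect tandem repeats."""
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    for k, n_min in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Only score unambiguous motifs that are not repeats of a shorter unit
            if set(motif) <= set("ACGT") and is_primitive(motif):
                count = 1
                # Extend the run while the next k bases repeat the motif
                while seq[i + count * k:i + (count + 1) * k] == motif:
                    count += 1
                if count >= n_min:
                    hits.append((i + 1, motif, count))
                    i += count * k  # skip past this repeat
                    continue
            i += 1
    return sorted(hits)


# Toy sequence containing a poly-A run, an (AT)6 and a (CTG)4 repeat
toy = "GG" + "A" * 12 + "GC" + "AT" * 6 + "AA" + "CTG" * 4 + "AA"
print(find_ssrs(toy))  # [(3, 'A', 12), (17, 'AT', 6), (31, 'CTG', 4)]
```

Running such a scan on both plastomes and matching loci by position and motif is one way to separate variety-specific microsatellites from shared ones that only differ in repeat number.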
Based on this comparison between the B. gasipaes Kunth var. chichagui and B. gasipaes Kunth var. gasipaes plastomes, the two accessions can be discriminated at the molecular level. DNA barcoding relies on coding and non-coding plastid markers to identify species. The two plastid coding regions rbcL and matK (core markers) are generally recommended (CBOL Plant Working Group 2009), optionally supplemented with additional markers such as trnH-psbA, atpF-atpH, psbK-psbI or trnL (Hollingsworth et al. 2011). For this species, rbcL (NCBI accessions JQ590428, JQ590427, JQ590426) and matK (JQ586697, JQ586696, JQ586695) barcodes have been developed by the International Barcode of Life project (iBOL; http://ibol.org). Of these, only matK shows a single-base variation and could therefore potentially be used to discriminate between wild and cultivated B. gasipaes. However, our sample size is minimal, and more individuals should be sequenced to confirm this. A study using several accessions of both wild and cultivated B. gasipaes found no variation in two non-coding plastid markers, trnD-trnT and trnQ-rps16 (Couvreur et al. 2007). Alternatively, full plastomes could be used as ultra-barcodes to distinguish wild and cultivated accessions more reliably, as was done in cacao (Kane et al. 2012). Finally, the predicted chloroplast microsatellite markers could also be used for this purpose, but they still need to be tested and validated.
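Whether the matK variant is truly diagnostic depends on it being fixed within each variety, which one accession per variety cannot demonstrate. The minimal sketch below, with a hypothetical function name and toy sequences (not the iBOL records), flags aligned positions whose observed bases never overlap between two groups of samples. With a single sequence per group, every difference qualifies, which is why more accessions are needed.

```python
from typing import List


def diagnostic_sites(group_a: List[str], group_b: List[str]) -> List[int]:
    """Return 0-based alignment columns that separate two groups perfectly.

    A column is diagnostic if all bases in both groups are resolved
    (A, C, G or T) and no base observed in group_a is observed in group_b.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    seqs = group_a + group_b
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for col in range(length):
        bases_a = {s[col].upper() for s in group_a}
        bases_b = {s[col].upper() for s in group_b}
        if (bases_a | bases_b) <= set("ACGT") and not (bases_a & bases_b):
            sites.append(col)
    return sites


# Toy aligned fragments (hypothetical), two wild and two cultivated samples
wild = ["ACGTA", "ACGTA"]
cultivated = ["ACTTA", "ACTTG"]
print(diagnostic_sites(wild, cultivated))  # [2]
```

Column 4 differs in one cultivated sample but is shared with the wild group, so it is not retained; adding more samples can only shrink the set of diagnostic sites.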
We annotated 89 genes, 37 tRNAs and 8 rRNAs. When comparing the junctions between the inverted repeats and the single-copy regions, we observed differences in the distances between genes and junctions relative to B. gasipaes Kunth var. chichagui and other related species. These positions are key to understanding chloroplast genome evolution because they are associated with the expansion or contraction of the chloroplast sequence (Amiryousefi et al. 2018).
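IRscope displays these gene-to-junction distances graphically; the underlying computation is simple coordinate arithmetic. The sketch below assumes a linearized plastome ordered LSC, IRb, SSC, IRa with 1-based inclusive coordinates, uses the junction labels JLB, JSB, JSA and JLA, and treats the IRa/LSC junction as closing the circle. The region boundaries and gene coordinates in the example are hypothetical, not those of OM047178, and genes spanning the sequence origin are not handled.

```python
from typing import List, Optional, Tuple

Junction = Tuple[str, int]  # (label, J): the junction lies between bases J and J + 1


def quadripartite_junctions(lsc_end: int, irb_end: int, ssc_end: int,
                            genome_len: int) -> List[Junction]:
    """Junction positions for a plastome laid out as LSC, IRb, SSC, IRa."""
    if not 0 < lsc_end < irb_end < ssc_end < genome_len:
        raise ValueError("region boundaries must be increasing")
    # JLA appears twice: before base 1 and after the last base (circular genome)
    return [("JLA", 0), ("JLB", lsc_end), ("JSB", irb_end),
            ("JSA", ssc_end), ("JLA", genome_len)]


def nearest_junction(start: int, end: int,
                     junctions: List[Junction]) -> Tuple[str, int, bool]:
    """Return (label, distance in bp, crosses) for a gene spanning start..end.

    If the gene crosses a junction, distance is 0 and crosses is True;
    otherwise distance is the number of bases between gene and junction.
    """
    best: Optional[Tuple[str, int, bool]] = None
    for label, j in junctions:
        if start <= j < end:
            return (label, 0, True)
        dist = start - j - 1 if start > j else j - end
        if best is None or dist < best[1]:
            best = (label, dist, False)
    if best is None:
        raise ValueError("no junctions supplied")
    return best


# Hypothetical 1,000 bp plastome: LSC 1-500, IRb 501-650, SSC 651-850, IRa 851-1000
toy_junctions = quadripartite_junctions(500, 650, 850, 1000)
print(nearest_junction(480, 495, toy_junctions))  # ('JLB', 5, False)
print(nearest_junction(640, 660, toy_junctions))  # ('JSB', 0, True)
```

Comparing such distances for the same genes in two plastomes indicates whether an inverted repeat boundary has shifted, which is how expansion or contraction of the IRs is usually described.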
The phylogenetic analysis was based on 138,382 aligned gap-free sites and recovered previously described relationships (Figure 2), congruent with earlier studies of the family (Baker et al. 2009). Relationships between subfamilies were well supported (bootstrap support > 94). Both varieties of B. gasipaes were recovered as sisters with maximum support within subtribe Bactridinae, as found in previous phylogenetic studies based on short plastid markers (Couvreur et al. 2007).
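Bootstrap values express how often a group is recovered across the replicate trees. RAxML maps its replicates onto the best tree internally; conceptually, the support of a group is the percentage of replicate trees that contain it. The sketch below assumes each replicate tree has already been rooted (for example on Dasypogon bromeliifolius) and reduced to a set of clades, each clade being a frozenset of taxon names. Function name and taxon labels are hypothetical placeholders.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Iterable[str],
                      replicate_clades: Iterable[Set[Clade]]) -> float:
    """Percentage of bootstrap replicate trees that contain the given clade."""
    target = frozenset(clade)
    total = 0
    hits = 0
    for clades in replicate_clades:
        total += 1
        if target in clades:
            hits += 1
    if total == 0:
        raise ValueError("no replicate trees supplied")
    return 100.0 * hits / total


# Four toy replicates; "g" and "c" stand for the two B. gasipaes varieties,
# "x" for a hypothetical close relative
pair = frozenset({"g", "c"})
triple = frozenset({"g", "c", "x"})
replicates = [{pair, triple}, {pair}, {pair, triple}, {pair, triple}]
print(bootstrap_support({"g", "c"}, replicates))       # 100.0
print(bootstrap_support({"g", "c", "x"}, replicates))  # 75.0
```

With 1,000 replicates, a support of 100 means the group appeared in every replicate tree.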
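This resource will be useful for unraveling the domestication history of the cultivated variety (Clement et al. 2017), in particular from the perspective of seed dispersal, since plastomes are maternally inherited in most angiosperms and therefore track seed-mediated gene flow.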
Supplementary Material
Funding Statement
The authors thank the LMI BIO_INCA (http://bioinca.org) and labex CEBA (http://www.labex-ceba.fr) for financial support.
Ethics permission
We are grateful to Alejandro Solano for allowing us access to the cultivated individual sequenced in this study. This research was done with the authorization of the Ecuadorian Ministry of Environment (MAE-DNB-CM-2018-0082).
Authors’ contributions
Maria Camila Buitrago: Bioinformatic analysis and paper drafting; Rommel Montúfar: Field work, supervision of the project and manuscript writing; Romain Guyot: Interpretation of data and manuscript writing; Cedric Mariac: Technical support, DNA extraction; Timothy J. Tranbarger: Field work, DNA extraction, manuscript writing; Silvia Restrepo: Manuscript writing; Thomas L.P. Couvreur: Supervision of the project, field work, DNA extraction and manuscript writing.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The genome sequence data that support the findings of this study are openly available in Zenodo (doi: 10.5281/zenodo.6337604) and in NCBI GenBank under accession number OM047178. The associated BioProject, BioSample and SRA accession numbers are PRJNA825158, SAMN27503645 and SRR18697342, respectively.
References
- Aggeli D, Karas VO, Sinnott-Armstrong NA, Varghese V, Shafer RW, Greenleaf WJ, Sherlock G. 2018. Diff-seq: a high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery. Nucleic Acids Res. 46(7):e42.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215(3):403–410.
- Amiryousefi A, Hyvönen J, Poczai P. 2018. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 34(17):3030–3031.
- Baker WJ, Savolainen V, Asmussen-Lange CB, Chase MW, Dransfield J, Forest F, Harley MM, Uhl NW, Wilkinson M. 2009. Complete generic-level phylogenetic analyses of palms (Arecaceae) with comparisons of supertree and supermatrix approaches. Syst Biol. 58(2):240–256.
- Barrett CF, Baker WJ, Comer JR, Conran JG, Lahmeyer SC, Leebens-Mack JH, Li J, Lim GS, Mayfield-Jones DR, Perez L, et al. 2016. Plastid genomes reveal support for deep phylogenetic relationships and extensive rate variation among palms and other commelinid monocots. New Phytol. 209(2):855–870.
- Barrett CF, Davis JI, Leebens-Mack J, Conran JG, Stevenson DW. 2013. Plastid genomes and deep relationships among the commelinid monocot angiosperms. Cladistics. 29(1):65–87.
- Bethune K, Mariac C, Couderc M, Scarcelli N, Santoni S, Ardisson M, Martin JF, Montúfar R, Klein V, Sabot F, et al. 2019. Long-fragment targeted capture for long-read sequencing of plastomes. Appl Plant Sci. 7(5):e1243.
- CBOL Plant Working Group. 2009. A DNA barcode for land plants. Proc Natl Acad Sci USA. 106:12794–12797.
- Clement CR. 1988. Domestication of the pejibaye palm (Bactris gasipaes): past and present. Adv Econ Bot. 155–174.
- Clement CR, Casas A, Parra-Rondinel FA, Levis C, Peroni N, Hanazaki N, Cortés-Zárraga L, Rangel-Landa S, Alves RP, Ferreira MJ. 2021. Disentangling domestication from food production systems in the neotropics. Quaternary. 4(1):4.
- Clement CR, Cristo-Araújo MD, Coppens d'Eeckenbrugge G, Reis VMD, Lehnebach R, Picanço-Rodrigues D. 2017. Origin and dispersal of domesticated peach palm. Front Ecol Evol. 5:148.
- Clement CR, Urpí JEM. 1987. Pejibaye palm (Bactris gasipaes, Arecaceae): multi-use potential for the lowland humid tropics. Econ Bot. 41(2):302–311.
- Couvreur TLP, Billotte N, Risterucci A-M, Lara C, Vigouroux Y, Ludeña B, Pham J-L, Pintaud J-C. 2006. Close genetic proximity between cultivated and wild Bactris gasipaes Kunth revealed by microsatellite markers in Western Ecuador. Genet Resour Crop Evol. 53(7):1361–1373.
- Couvreur TLP, Hahn WJ, Granville JJD, Pham JL, Ludena B, Pintaud JC. 2007. Phylogenetic relationships of the cultivated Neotropical palm Bactris gasipaes (Arecaceae) with its wild relatives inferred from chloroplast and nuclear DNA polymorphisms. Syst Bot. 32(3):519–530.
- de Santana Lopes A, Gomes Pacheco T, Nimz T, do Nascimento Vieira L, Guerra MP, Nodari RO, de Souza EM, de Oliveira Pedrosa F, Rogalski M. 2018. The complete plastome of macaw palm [Acrocomia aculeata (Jacq.) Lodd. ex Mart.] and extensive molecular analyses of the evolution of plastid genes in Arecaceae. Planta. 247(4):1011–1030.
- Dierckxsens N, Mardulyn P, Smits G. 2017. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45(4):e18.
- Dransfield J, Uhl NW, Asmussen CB, Baker WJ, Harley MM, Lewis CE. 2008. Genera Palmarum: the evolution and classification of palms. Kew: Kew Publishing. 732 pp.
- Galluzzi G, Dufour D, Thomas E, Zonneveld M van, Salamanca AFE, Toro AG, Rivera A, Duque HS, Baron HS, Gallego G, Scheldeman X, Mejia AG. 2015. An integrated hypothesis on the domestication of Bactris gasipaes. PLoS One. 10:e0144644. doi:10.1371/journal.pone.0144644.
- Givnish TJ, Zuluaga A, Spalink D, Soto Gomez M, Lam VKY, Saarela JM, Sass C, Iles WJD, de Sousa DJL, Leebens-Mack J. 2018. Monocot plastid phylogenomics, timeline, net rates of species diversification, the power of multi-gene analyses, and a functional model for the origin of monocots. Am J Bot. 105(11):1888–1910.
- Graefe S, Dufour D, Van Zonneveld M, Rodriguez F, Gonzalez A. 2013. Peach palm (Bactris gasipaes) in tropical Latin America: implications for biodiversity conservation, natural resource management and human nutrition. Biodivers Conserv. 22(2):269–300.
- Heinhorst S, Cannon GC. 1993. DNA replication in chloroplasts. J Cell Sci. 104(1):1–9.
- Henderson A. 2000. Bactris (Palmae). Bronx: Organization for Flora Neotropica.
- Hernández-Ugalde JA, Mora-Urpí J, Rocha OJ. 2011. Genetic relationships among wild and cultivated populations of peach palm (Bactris gasipaes Kunth, Palmae): evidence for multiple independent domestication events. Genet Resour Crop Evol. 58(4):571–583. doi:10.1007/s10722-010-9600-6.
- Hollingsworth PM, Graham SW, Little DP. 2011. Choosing and using a plant DNA barcode. PLoS One. 6(5):e19254.
- Huang YY, Matzke AJ, Matzke M. 2013. Complete sequence and comparative analysis of the chloroplast genome of coconut palm (Cocos nucifera). PLoS One. 8(8):e74736.
- Kane N, Sveinsson S, Dempewolf H, Yang JY, Zhang D, Engels JMM, Cronk Q. 2012. Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA. Am J Bot. 99(2):320–329.
- Katoh K, Rozewicki J, Yamada KD. 2019. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 20(4):1160–1166.
- Krumsiek J, Arnold R, Rattei T. 2007. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 23(8):1026–1028.
- Liu YY, Zhang ZJ, Jiang SF, Wang DL, Gui LJ. 2020. The chloroplast genome of Archontophoenix alexandrae (Arecaceae): an important landscape tree for the subtropics. Mitochondrial DNA B Resour. 5(1):746–747.
- Mariac C, Blanc O, Zekraoui L. 1970. High molecular weight DNA extraction from plant nuclei isolation optimized for long-read sequencing. protocols.io.
- Mariac C, Scarcelli N, Pouzadou J, Barnaud A, Billot C, Faye A, Kougbeadjo A, Maillol V, Martin G, Sabot F, et al. 2014. Cost-effective enrichment hybridization capture of chloroplast genomes at deep multiplexing levels for population genetics and phylogeography studies. Mol Ecol Resour. 14(6):1103–1113.
- Montúfar R, Rosas J. 2013. Chontaduro, Chontilla. In: Valencia R, Montúfar R, Navarrete H, Balslev H, editors. Bactris gasipaes. Palmeras ecuatorianas: biología y uso sostenible. Quito: Publicaciones del Herbario QCA, Pontificia Universidad Católica del Ecuador.
- Rajesh MK, Gangaraj KP, Prabhudas SK, Prasad TSK. 2020. The complete chloroplast genome data of Areca catechu (Arecaceae). Data Brief. 33:106444.
- Rodrigues DP, Astolfi Filho S, Clement CR. 2005. Molecular marker-mediated validation of morphologically defined landraces of pejibaye (Bactris gasipaes) and their phylogenetic relationships. Genet Resour Crop Evol. 51(8):871–882. doi:10.1007/s10722-005-0774-2.
- Santos da Silva R, Clement CR, Balsanelli E, Baura VA de, Souza EM de, Fraga HP de F, Vieira L do N. 2021. The plastome sequence of Bactris gasipaes and evolutionary analysis in tribe Cocoseae (Arecaceae). PLoS One. 16:e0256373. doi:10.1371/journal.pone.0256373.
- Shi L, Chen H, Jiang M, Wang L, Wu X, Huang L, Liu C. 2019. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res. 47(W1):W65–W73.
- Stamatakis A. 2015. Using RAxML to infer phylogenies. Curr Protoc Bioinformatics. 51(1):6–14.
- Uthaipaisanwong P, Chanprasert J, Shearman JR, Sangsrakru D, Yoocha T, Jomchai N, Jantasuriyarat C, Tragoonrung S, Tangphatsornruang S. 2012. Characterization of the chloroplast genome sequence of oil palm (Elaeis guineensis Jacq.). Gene. 500(2):172–180.
- Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol. 20(1):1–13.
- Zheng S, Poczai P, Hyvönen J, Tang J, Amiryousefi A. 2020. Chloroplot: an online program for the versatile plotting of organelle genomes. Front Genet. 11:1123.