Reference-based assembly of chloroplast genome from leaf transcriptome data of Pterocarpus santalinus

Shanmugavel Senthilkumar; Kandasamy Ulaganathan; Modhumita Ghosh Dasgupta

doi:10.1007/s13205-021-02943-0

2021 Aug 2;11(8):393. doi: 10.1007/s13205-021-02943-0


PMCID: PMC8329147 PMID: 34458062

Abstract

Chloroplast genome sequencing is an essential tool to understand genome evolution and phylogenetic relationship. The available methods for constructing chloroplast genome include chloroplast enrichment followed by long overlapping PCR or extraction and assembly of chloroplast-specific reads from whole-genome datasets. In the present study, we propose an alternate strategy of extraction and assembly of chloroplast-specific reads from leaf transcriptome data of Pterocarpus santalinus using bowtie2 aligner program. The assembled genome was compared with the published chloroplast genome of P. santalinus for genome size, number of predicted genes, microsatellite repeat motifs, and nucleotide repeats. A near-complete chloroplast genome was assembled from the transcriptome reads. The proposed method requires less computational time and know-how, limited virtual memory, and is cost-effective when compared to whole-genome sequencing. Assembly of Cp genome from transcriptome data will enhance the resolution of phylogenetic studies through comparative plastome analysis, facilitate accurate species/genotype discrimination and accelerate the development of transplastomic plants with enhanced biotic and abiotic tolerance.

Supplementary Information

The online version contains supplementary material available at 10.1007/s13205-021-02943-0.

Keywords: Assembly, Chloroplast genome, Phylogeny, Repeat analysis, Transcriptome data

Introduction

Chloroplasts (Cp) are semi-autonomous organelles and the metabolic centres of plant cells. Chloroplast genomes were among the first genomes to be sequenced and have been a valuable resource for deciphering phylogenetic relatedness between taxa and resolving evolutionary relationships (Daniell et al. 2016). By 2020, 4717 Cp genomes had been sequenced from plants (Zhong 2020). The Cp genome is predominantly a circular molecule, although recent studies have shown multi-branched linear structures of Cp DNA in a few angiosperms (Mower et al. 2018). Apart from conducting photosynthesis, the chloroplast regulates other crucial biochemical processes, including the synthesis of biomolecules such as fatty acids, nucleotides, amino acids, phytohormones and vitamins, and plays a major role in plant responses to biotic and abiotic stresses (Daniell et al. 2016). The use of Cp genomes in timber forensics, crop improvement, and production of biopharmaceuticals is extensively documented (Bansal and Saha 2012; Daniell et al. 2016; Yu et al. 2020; Teske et al. 2020; Li et al. 2021).

The conventional method for Cp genome sequencing involves chloroplast enrichment using a sucrose or Percoll gradient, the high-salt method or proprietary kits (Chloroplast Isolation Kit from Sigma Aldrich, USA or Abcam, MA, USA). Subsequently, long overlapping PCR is conducted to sequence the genome (reviewed by Twyford and Ness 2017). The major limitations of this strategy are the cost of chloroplast isolation and the large quantity of starting material required, which is a constraint for samples sourced from herbaria or endangered species (Vieira Ldo et al. 2014). Another challenge is primer design for long-range PCR, which depends on sequence conservation across species. Differences in gene organization can severely hamper amplification success, thus affecting genome assembly (Atherton et al. 2010). Alternatively, screening of bacterial artificial chromosome (BAC) or fosmid libraries using chloroplast-specific probes has been reported (Daniell et al. 2006; Jansen et al. 2011), but these procedures are technically demanding. With the advent of next-generation sequencing (NGS) platforms, sequencing of enriched chloroplast DNA (Atherton et al. 2010) or use of whole-genome sequence datasets has emerged as a viable method for assembling Cp genomes (reviewed by Twyford and Ness 2017). The Illumina platform is considered the most suitable NGS platform for sequencing Cp genomes, since it allows the use of rolling circle amplification products (Atherton et al. 2010). Software tools such as IOGA (Baker et al. 2010), Fast-Plast (McKain and Wilson 2017), GetOrganelle (Jin et al. 2020), NOVOPlasty (Dierckxsens et al. 2017) and ChloroExtractor (Ankenbrand et al. 2018) were developed for extracting organellar reads from whole-genome datasets and are used for assembling chloroplast genomes. A comparison of different tools for assembling complete chloroplast genomes revealed that GetOrganelle performed best on both simulated and real data, followed by Fast-Plast (Freudenthal et al. 2020). However, sequencing the whole genome is cost-intensive, and assembling organellar genomes from these datasets requires high-end computational knowledge and infrastructure.

In the present study, an alternate approach of extracting and assembling chloroplast reads from transcriptome dataset was attempted and the method was demonstrated using the leaf transcriptome of Pterocarpus santalinus. This methodology demands less computational time and limited virtual memory and can be executed by researchers with limited knowledge in computational biology.

Total RNA was isolated from young leaves of P. santalinus using the RNAqueous®-Micro Total RNA Isolation Kit (Thermo Scientific, USA). The concentration of RNA was quantified using a Qubit fluorometer (Thermo Fisher Scientific, MA, USA) and TapeStation (Agilent Technologies Inc., Santa Clara, CA). The RNA integrity number equivalent (RINe) was determined on the TapeStation. Five hundred ng of total RNA was used to enrich mRNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module, and the enriched mRNA was chemically fragmented, reverse transcribed, and cleaned. The cDNA was end-repaired, adapter-ligated, size selected, PCR amplified (12 cycles) and cleaned prior to library construction. The library was constructed using the NEBNext® Ultra™ II RNA Library Prep Kit following the manufacturer’s protocol, quantified using the Qubit fluorometer, validated on the TapeStation and sequenced on an Illumina HiSeq 2000 (Illumina Inc., San Diego, CA, USA) using 150 bp paired-end chemistry.

The raw RNA-seq data were quality checked using FastQC, and low-quality and adapter sequences were removed using the Trimmomatic tool (Bolger et al. 2014). The processed reads were subsequently used as input for the Bowtie2 aligner program, with the P. santalinus chloroplast sequence (Acc. No. MT249117.1; Hong et al. 2020) as the reference. The reference Cp genome used in the present study was assembled from a whole-genome dataset, which was generated using a hybrid strategy of short-read sequencing on Illumina HiSeq 4000 and long-read sequencing on PacBio Sequel (Hong et al. 2020).

The SAM file from the Bowtie2 program was then converted to a coordinate-sorted BAM file, followed by the generation of a consensus FASTA sequence using SAMtools (Li et al. 2009) and VCFtools (Danecek et al. 2011).

The commands used for constructing the Cp genome are given below:

Command for building reference index

$ bowtie2-build -f /path/to/reference.fasta/directory /path/to/write/reference/index/

Command for building Cp genome with reference sequence

$ bowtie2 --local -p 10 -x /path/to/reference/index/directory -1 /path/to/transcriptome/raw/reads/forward.fastq.gz -2 /path/to/transcriptome/raw/reads/reverse.fastq.gz -S output.sam

Conversion of SAM to sorted BAM file

$ samtools view -bS output.sam | samtools sort -o output.bam

Generation of consensus FASTA file from BAM file

$ samtools mpileup -uf /path/to/reference.fasta output.bam | bcftools call -c | vcfutils.pl vcf2fq > output.fastq
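The last command writes the consensus in FASTQ format (output.fastq), whereas the text refers to a consensus FASTA sequence. The conversion step is not shown in the source. The Python sketch below is one possible way to perform it and is not the authors' script; the function name `fastq_to_fasta` and the 60-column line width are assumptions made for illustration. It handles FASTQ records whose sequence and quality strings are wrapped over several lines and leaves lower-case and `n` characters unchanged.

```python
def fastq_to_fasta(fastq_text, width=60):
    """Convert (possibly multi-line) FASTQ text to FASTA text.

    Hypothetical helper for illustration; not part of the published pipeline.
    """
    lines = iter(fastq_text.splitlines())
    records = []
    for line in lines:
        if not line.startswith("@"):
            continue  # skip blank lines between records
        name = line[1:].strip()
        # Collect sequence lines up to the '+' separator.
        seq_parts = []
        for seq_line in lines:
            if seq_line.startswith("+"):
                break
            seq_parts.append(seq_line.strip())
        seq = "".join(seq_parts)
        # Skip quality lines by length, because a quality line
        # may itself begin with '@'.
        remaining = len(seq)
        while remaining > 0:
            qual_line = next(lines, None)
            if qual_line is None:
                raise ValueError(f"truncated quality block for record {name!r}")
            remaining -= len(qual_line.strip())
        records.append((name, seq))

    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "@cp\nACGTn\nacgt\n+\n@IIII\nIIII\n"
    print(fastq_to_fasta(example), end="")
```

In practice the function would be applied to the contents of output.fastq and the result written to a FASTA file before annotation.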

All analyses were carried out on a Dell Precision 3630 workstation (i7-8700K 3.2 GHz processor, 6 cores, 12 threads, 32 GB RAM) running Linux Ubuntu 20.10 LTS.

The consensus sequence thus generated was annotated using the GeSeq online tool (Tillich et al. 2017). The number of genes in the assembled and the reference Cp genomes was predicted using the same tool. REPuter (Kurtz et al. 2001) and MISA (Beier et al. 2017) were used with default parameters to identify the nucleotide repeats and microsatellite repeats, respectively, in both the assembled and reference Cp genomes. The number of each nucleotide was determined using a Python script. mVISTA (available at http://genome.lbl.gov/vista/index.shtml) (Mayor et al. 2000) was used to visualize the alignment of the reference and assembled Cp genomes of P. santalinus and to identify sequence variations between the two assemblies.
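The Python script used for counting nucleotides is not given in the source. A minimal sketch of such a count, assuming FASTA input and a hypothetical function name `count_nucleotides`, could look as follows. Whether lower-case bases are pooled with upper-case ones is an assumption, so it is exposed as a parameter; every character other than A, C, G and T is reported under `other`.

```python
from collections import Counter


def count_nucleotides(fasta_text, case_sensitive=False):
    """Count A, C, G and T in FASTA text; everything else goes to 'other'.

    Hypothetical helper for illustration; the source does not describe
    how its own script treats lower-case or ambiguous bases.
    """
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )
    if not case_sensitive:
        seq = seq.upper()
    counts = Counter(seq)
    result = {base: counts.get(base, 0) for base in "ACGT"}
    result["other"] = len(seq) - sum(result.values())
    return result


if __name__ == "__main__":
    example = ">cp\nACGTn\nacgt\n"
    print(count_nucleotides(example))
    print(count_nucleotides(example, case_sensitive=True))
```

Running the same function on the assembled and the reference sequence gives directly comparable counts, provided both are counted with the same case setting.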

The assembled and reference Cp genomes of P. santalinus along with 27 members from Fabaceae were used to construct the phylogenetic tree. Pterocarpus species including P. indicus, P. macrocarpus, P. marsupium, P. tinctorius and P. pedatus were included in the study to document their phylogenetic relatedness. Multiple sequence alignment was conducted using BioEdit (Hall 1999) and phylogenetic analysis was carried out in MEGA X (Kumar et al. 2018). A Neighbor-Joining (NJ) tree was constructed using the p-distance model with 1000 bootstrap iterations, and pairwise deletion was selected for gap treatment.
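The p-distance used for the NJ tree is the proportion of compared sites at which two aligned sequences differ; with pairwise deletion, a site is skipped for a given pair only when one of the two sequences has a gap or missing base there. The sketch below illustrates that definition for one pair of sequences. It is not MEGA X's implementation, and the function name `p_distance` and the set of characters treated as missing (`-`, `?`, `N`) are assumptions.

```python
def p_distance(seq_a, seq_b, missing="-?N"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap or missing symbol are
    dropped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # pairwise deletion: ignore this site for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


if __name__ == "__main__":
    # Five comparable sites (one gapped site is skipped), one mismatch.
    print(p_distance("ACGT-A", "ACTTAA"))
```

A distance matrix for tree building would be obtained by applying the function to every pair of sequences in the alignment.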

The concentration of total RNA isolated from the leaf tissues was 43.2 ng/µl using the Qubit fluorometer, and the RNA integrity number equivalent (RINe) value was 7.8. The enriched sequencing library was quantified using both the Qubit fluorometer and TapeStation, and the concentration was 18.7 and 15.2 ng/µl, respectively.

A total of 35,861,326 raw reads were generated with a read length of 150 bp and the percent of reads above Q30 was 89.04%. The reference-based assembly of P. santalinus from leaf RNA-seq raw reads generated a Cp genome of 158,966 bp (Fig. 1), similar to the genome reported by Hong et al. (2020). A total of 158 genes were identified in the assembled genome when compared to 159 genes predicted from the reference genome (Hong et al. 2020) (Table 1). The genome sequences were annotated using GeSeq and the list of genes annotated in both the Cp genomes, gene position, and gene length is presented in Table 1. The predicted genes and their numbers were comparable except for trnI-CAU, which was not predicted in the assembled genome. The comparative analysis indicated that a near-complete assembly of P. santalinus Cp genome was achievable using the present method.
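The comparison of the two annotations can be reproduced from the rows of Table 1. The sketch below uses a handful of rows copied from that table and flags entries whose position or length differs between the assembled and reference genomes, as well as entries annotated only in the reference. The tuple layout and the function names are assumptions made for illustration; this is not the procedure used by GeSeq.

```python
from collections import Counter

# (gene, position_assembled, position_reference, length_assembled, length_reference)
# A few rows copied from Table 1; None marks an empty cell.
ROWS = [
    ("atpA", 52185, 52185, 1533, 1533),
    ("rpl22", 70941, 71243, 113, 327),
    ("rps18", 71243, 71254, 327, 84),
    ("ycf2", 3124, 2458, 6195, 6861),
    ("trnI-CAU", 68139, 68139, 74, 74),
    ("trnI-CAU", None, 156598, None, 74),
]


def differing_rows(rows):
    """Return rows whose position or length differs between the two genomes."""
    return [r for r in rows if r[1] != r[2] or r[3] != r[4]]


def missing_in_assembled(rows):
    """Count rows that have a reference entry but no assembled entry."""
    return Counter(r[0] for r in rows if r[1] is None and r[2] is not None)


if __name__ == "__main__":
    for row in differing_rows(ROWS):
        print(row)
    print(dict(missing_in_assembled(ROWS)))
```

Applied to the full table, the same two functions would list every annotation that is not identical in the two genomes.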

Fig. 1 — Chloroplast genome of *Pterocarpus santalinus* assembled from leaf transcriptome data. The genes drawn outside and inside of the circle are transcribed in clockwise and counter clockwise directions, respectively. Genes are colored based on their functional groups

Table 1.

Comparative analysis of genes predicted from the assembled and reference chloroplast genomes of Pterocarpus santalinus using GeSeq

Group	Gene name	Gene position (assembled)	Gene position (reference)	Gene length, bp (assembled)	Gene length, bp (reference)	Total no. of genes (assembled)	Total no. of genes (reference)
ATP synthase	atpA	52,185	52,185	1533	1533	7	7
	atpB	7331	7331	1488	1488
	atpE	8815	8815	402	402
	atpF	50,786	50,786	145	145
	atpF	51,703	51,703	407	407
	atpH	50,123	50,123	246	246
	atpI	48,266	48,266	744	744
NADH dehydrogenase	ndhA	32,511	32,511	553	553	15	15
	ndhA	34,305	34,305	542	542
	ndhB	57,733	57,733	777	777
	ndhB	59,195	59,195	756	756
	ndhB	146,192	146,192	777	777
	ndhB	147,654	147,654	756	756
	ndhC	10,821	10,821	363	363
	ndhD	37,775	37,775	1497	1497
	ndhE	36,826	36,826	306	306
	ndhF	42,562	42,562	2256	2256
	ndhG	36,060	36,060	531	531
	ndhH	31,328	31,328	1182	1182
	ndhI	34,926	34,926	486	486
	ndhJ	12,053	12,053	477	477
	ndhK	11,153	11,153	744	744
Cytochrome b/f complex	petA	64,436	64,436	963	963	8	8
	petB	78,320	78,320	6	6
	petB	79,143	79,143	642	642
	petD	79,997	79,997	8	8
	petD	80,714	80,714	475	475
	petG	68,950	68,950	114	114
	petL	68,689	68,689	96	96
	petN	124,655	124,655	90	90
Photosystem I	psaA	20,016	20,016	2253	2253	5	5
	psaB	22,294	22,294	2205	2205
	psaC	37,398	37,398	246	246
	psaI	62,223	62,223	105	105
	psaJ	70,142	70,142	135	135
Photosystem II	psbA	157,594	157,594	1062	1062	14	14
	psbB	75,816	75,816	1527	1527
	psbC	130,754	130,754	1386	1386
	psbD	129,709	129,709	1062	1062
	psbE	91,561	91,561	252	252
	psbF	91,822	91,822	120	120
	psbH	77,953	77,953	222	222
	psbI	102,726	102,726	111	111
	psbJ	92,216	92,216	123	123
	psbK	102,027	102,027	186	186
	psbL	91,964	91,964	117	117
	psbM	32,839	32,839	105	105
	psbT	77,538	77,538	108	108
	psbZ	132,836	132,836	189	189
Large subunit of ribosome	rpl14	73,842	73,842	369	369	14	14
	rpl16	72,140	72,140	9	9
	rpl16	73,310	73,310	399	399
	rpl2	68,957	68,957	391	391
	rpl2	70,013	70,013	434	434
	rpl2	157,416	157,416	391	391
	rpl2	158,472	158,472	434	434
	rpl20	86,804	86,804	360	360
	rpl22	70,941	71,243	113	327
	rpl23	68,657	68,657	276	276
	rpl23	157,116	157,116	276	276
	rpl32	117,190	117,190	147	147
	rpl33	70,781	70,781	201	201
	rpl36	75,419	75,419	114	114
Small subunit of ribosome	rps11	76,054	76,054	417	417	18	18
	rps12	56,100	56,100	232	232
	rps12	56,864	56,864	26	26
	rps12	144,559	144,559	232	232
	rps12	145,323	145,323	26	26
	rps12-fragment	85,828	85,828	114	114
	rps14	24,622	24,622	303	303
	rps15	30,944	30,944	273	273
	rps16	58,227	58,227	40	40
	rps16	59,166	59,166	230	230
	rps18	71,243	71,254	327	84
	rps19	70,507	70,507	279	279
	rps2	47,307	47,307	711	711
	rps3	71,322	71,322	657	657
	rps4	15,958	15,958	606	606
	rps7	56,947	56,947	468	468
	rps7	145,406	145,406	468	468
	rps8	74,578	74,578	405	405
RNA polymerase subunits	rpoA	76,553	76,553	996	996	5	5
	rpoB	36,680	36,680	3213	3213
	rpoC1	39,919	39,919	432	432
	rpoC1	41,095	41,095	1623	1623
	rpoC2	42,888	42,888	4167	4167
Ribosomal RNA	rrn16	16,558	16,558	1491	1491	10	10
	rrn16	105,017	105,017	1491	1491
	rrn23	20,448	20,448	2617	2617
	rrn23	23,065	23,065	199	199
	rrn23	108,907	108,907	2617	2617
	rrn23	111,524	111,524	199	199
	rrn4.5	23,362	23,362	104	104
	rrn4.5	111,821	111,821	104	104
	rrn5	23,690	23,690	121	121
	rrn5	112,149	112,149	121	121
Transfer RNA genes	trnA-UGC	19,418	19,418	38	38	44	45
	trnA-UGC	20,256	20,256	35	35
	trnA-UGC	107,877	107,877	38	38
	trnA-UGC	108,715	108,715	35	35
	trnC-GCA	123,449	123,449	71	71
	trnD-GUC	32,339	32,339	74	74
	trnE-UUC	31,613	31,613	73	73
	trnF-GAA	145,553	145,553	73	73
	trnfM-CAU	25,092	25,092	74	74
	trnG-GCC	133,681	133,681	71	71
	trnG-UCC	103,910	103,910	23	23
	trnG-UCC	104,640	104,640	48	48
	trnH-GUG	158,851	158,851	75	75
	trnI-CAU	68,139	68,139	74	74
	trnI-CAU		156,598		74
	trnI-GAU	18,335	18,335	37	37
	trnI-GAU	19,324	19,324	35	35
	trnI-GAU	106,794	106,794	37	37
	trnI-GAU	107,783	107,783	35	35
	trnK-UUU	154,633	154,633	37	37
	trnK-UUU	157,243	157,243	35	35
	trnL-CAA	60,528	60,528	81	81
	trnL-CAA	148,987	148,987	81	81
	trnL-UAA	144,543	144,543	35	35
	trnL-UAA	145,116	145,116	50	50
	trnL-UAG	118,203	118,203	80	80
	trnM-CAU	149,521	149,521	73	73
	trnN-GUU	45,663	45,663	72	72
	trnN-GUU	134,122	134,122	72	72
	trnP-UGG	89,449	89,449	74	74
	trnQ-UUG	57,577	57,577	72	72
	trnR-ACG	24,071	24,071	74	74
	trnR-ACG	112,530	112,530	74	74
	trnR-UCU	104,944	104,944	72	72
	trnS-GCU	55,883	55,883	87	87
	trnS-GGA	142,090	142,090	88	88
	trnS-UGA	26,492	26,492	93	93
	trnT-GGU	128,163	128,163	72	72
	trnT-UGU	15,609	15,609	73	73
	trnV-GAC	16,264	16,264	72	72
	trnV-GAC	104,723	104,723	72	72
	trnV-UAC	9629	9629	39	39
	trnV-UAC	10,261	10,261	35	35
	trnW-CCA	89,698	89,698	74	74
	trnY-GUA	31,746	31,746	84	84
Miscellaneous group	accD	60,176	60,176	1506	1506	10	10
	ccsA	118,409	118,409	972	972
	cemA	63,530	63,530	690	690
	clpP1	83,619	83,619	71	71
	clpP1	84,501	84,501	292	292
	clpP1	85,385	85,385	228	228
	infA	75,165	75,165	168	168
	matK	155,389	155,389	1326	1326
	pbf1	81,125	81,125	132	132
	rbcL	152,413	152,413	1428	1428
Hypothetical chloroplast reading frames	ycf1	25,227	25,227	5334	5334	8	8
	ycf1	113,686	113,686	468	468
	ycf2	3124	2458	6195	6861
	ycf2	90,917	90,917	6861	6861
	ycf3	17,138	17,138	124	124
	ycf3	17,984	17,984	230	230
	ycf3	18,995	18,995	153	153
	ycf4	62,512	62,512	555	555
				Total		158	159


Comparison of the assembled and reference Cp genome with mVISTA showed significant sequence similarity except for variability in the ycf genes (Supplementary Fig. 1). The sequence variability in this gene is well documented and is a target for Pterocarpus barcode development (Jiao et al. 2019).

Repeat analysis using REPuter predicted a total of 25 repeat regions in the forward vs forward comparison in the assembled genome, with 23 repeats between 22 and 65 bp and 2 repeats between 244 and 287 bp (Supplementary Fig. 2a). In the reference genome, the forward vs forward comparison predicted 11 repeats between 24 and 67 bp, one repeat between 68 and 111 bp and 2 repeats between 244 and 287 bp (Supplementary Fig. 2b). Similarly, in the forward vs reverse complement comparison, 34 repeats were identified between 26 and 1409 bp, one repeat between 1410 and 2792 bp, and two repeats between 5560–6943 and 6944–8326 bp in the assembled genome (Supplementary Fig. 2c). In the reference genome, the forward vs reverse complement comparison identified 17 repeats between 24 and 4301 bp and one repeat between 21,416 and 25,693 bp, totalling 32 repeat regions (Supplementary Fig. 2d).

The number of nucleotides in the assembled genome was A = 35,355, G = 17,851, T = 35,573, C = 17,608, while in the reference genome it was A = 50,633, G = 29,013, T = 50,615, C = 28,705. Microsatellite repeat analysis using MISA predicted 344 repeats (Fig. 2a) in the assembled Cp genome, with 268 mono-nucleotide (77.90%), 52 di-nucleotide (15.12%), 15 tri-nucleotide (4.36%), 5 tetra-nucleotide (1.45%), 3 penta-nucleotide (0.87%) and 1 hexa-nucleotide (0.29%) repeats. In comparison, a total of 349 microsatellite repeats were identified in the reference genome, with 272 mono-nucleotide (77.93%), 51 di-nucleotide (14.61%), 15 tri-nucleotide (4.29%), 5 tetra-nucleotide (1.43%), 5 penta-nucleotide (1.43%) and 1 hexa-nucleotide (0.28%) repeats. A total of 10 and 12 repeat types were predicted in the assembled and reference genomes, respectively, and AT/AT was the predominant repeat class in both Cp genomes (Fig. 2b).
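The microsatellite totals and class shares quoted above follow from the per-class counts. The short sketch below recomputes them from the numbers in the text; the dictionary keys and the function name `class_shares` are assumptions made for illustration. Shares computed with standard rounding can differ from the printed percentages in the last digit.

```python
# Per-class microsatellite counts as reported in the text.
ASSEMBLED = {"mono": 268, "di": 52, "tri": 15, "tetra": 5, "penta": 3, "hexa": 1}
REFERENCE = {"mono": 272, "di": 51, "tri": 15, "tetra": 5, "penta": 5, "hexa": 1}


def class_shares(counts):
    """Return the total count and each class's share in percent (2 decimals)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    for label, counts in (("assembled", ASSEMBLED), ("reference", REFERENCE)):
        total, shares = class_shares(counts)
        print(label, total, shares)
```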

Fig. 2 — a Number of microsatellite repeat motifs predicted in genic and intergenic regions of assembled and reference chloroplast genome of *Pterocarpus santalinus.* b Number of repeat types predicted in genic and intergenic regions of assembled and reference chloroplast genome of *Pterocarpus santalinus*

The phylogenetic tree grouped both the Cp genomes of P. santalinus with 100% confidence (Fig. 3). The other Pterocarpus species including P. pedatus, P. indicus, P. marsupium and P. macrocarpus grouped into a single clade, while P. tinctorius formed a separate clade (Fig. 3). The phylogenetic grouping of the Pterocarpus species is consistent with the previous report by Hong et al. (2020). Hence, the comparative analysis of the two genomes indicates that the methodology proposed in the present study can effectively assemble a near-complete Cp genome from transcriptome datasets. Phylogenetic grouping of the reference and assembled genomes with 100% confidence reiterates the feasibility of the method developed in the study.

In land plants, the Cp DNA is highly conserved in structure, content, and gene order (Shaw et al. 2007). The genome size varies from 15,553 to 521,168 bp (Dobrogojski et al. 2020) and the total number of genes encoded by Cp genomes ranges from 120 to 140 (Rogalski et al. 2015). A typical Cp genome is arranged in a quadripartite structure, consisting of a large single copy (LSC 80–90 kbp) region and a small single copy (SSC 16–27 kbp) region separated by a pair of inverted repeats (IRs 20–30 kbp) (Wicke et al. 2011). Comparative chloroplast genomics revealed that the Cp DNA is highly variable at genome-scale (Whittall et al. 2010; Besnard et al. 2011) specifically in the non-coding intergenic spacer region (Daniell et al. 2006, 2016). Hence, recent studies have utilized the entire plastomes as ‘super barcodes’ enabling identification of hypervariable loci and lineage-specific InDels for efficient discrimination of plant species (Niu et al. 2017; Fu et al. 2019).

The use of Cp genome in evolutionary analysis, phylogenomics, barcoding, and meta-barcoding is well established (Li et al. 2015; Hollingsworth et al. 2016; Dormontt et al. 2018). In crop breeding, it has been used in the identification of cultivars, assessing hybrid purity, and understanding domestication history (Daniell et al. 2016; Teske et al. 2020). The translational application of chloroplast transformation in conferring biotic and abiotic stress tolerance in plants and production of biopharmaceuticals, biomaterials, enzymes, biofuels, and vaccines is also reported (reviewed by Bansal and Saha 2012; Daniell et al. 2016; Yu et al. 2020; Li et al. 2021). These transplastomic plants can integrate and express up to 10,000 copies of transgenes in contrast to nuclear genome, facilitating an extremely high level of transgene expression (Oey et al. 2009; Jin and Daniell 2015). Due to its maternal inheritance, it also minimizes the transgene escape, alleviating biosafety concerns (Daniell 2007; Boehm and Bock 2019).

RNA editing is a post-transcriptional process that generates RNA and protein diversity and regulates gene expression (Okuda et al. 2007). Land plants typically have 20–60 editing sites in chloroplast RNA (Ichinose and Sugita 2016) and the key editing target is the rbcL gene encoding the large subunit of ribulose bisphosphate carboxylase/oxygenase (RuBisCO). Transplastomic plants have facilitated the understanding of RNA editing and have been extensively used in the mapping of cis-acting elements, the introduction of heterologous editing sites to characterize trans-acting specificity factors and the expression of synthetic sequences (Ruf and Bock 2011; Avila et al. 2016). In a recent study, transplastomic tobacco expressing synthetic glycolate metabolic pathways was reported, and field evaluation of the transgenic lines revealed a 20% improvement in photosynthesis and up to a 37% increase in biomass. These lines were also tolerant to photorespiration stress (South et al. 2019). This study opens up a new vista in chloroplast genomics, indicating that gene editing in conjunction with synthetic biology can enhance the photosynthetic efficiency of crop plants, thereby enhancing productivity.

Cp genome sequencing has been successfully conducted either by chloroplast enrichment and sequencing or by assembly from whole-genome datasets (reviewed by Twyford and Ness 2017). Computational pipelines like Fast-Plast, GetOrganelle, NOVOPlasty and ChloroExtractor have been evaluated for their efficiency in assembling Cp genomes (Freudenthal et al. 2020). These tools vary in their hardware requirements and utilization, efficiency, repeatability and time consumption in processing WGS reads. We used the NOVOPlasty and GetOrganelle programs to assemble the Cp genome of P. santalinus from transcriptome data. Both programs generated fragmented contigs in the range of ~500 bp to ~21 kb (data not shown) and successful assembly could not be achieved. Hence, a pipeline was developed to construct a near-complete Cp genome from the leaf RNA-seq reads. This alternate approach is more cost-effective and less labour intensive when compared to chloroplast enrichment followed by NGS or whole-genome sequencing. An indicative cost of chloroplast enrichment and sequencing using the Illumina platform in P. santalinus is ~335 USD, while WGS with 30× coverage would cost ~1272 USD. Genome skimming at 1.5× depth would cost ~536 USD. Transcriptome sequencing, which would cost ~340 USD, can be used for both expression studies and retrieval of Cp-specific reads for genome assembly.

The pipeline developed in the present study offers several advantages, including the limited requirement of computing time and know-how and cost-effectiveness when compared to WGS. One major benefit of using transcriptome data is the reduced size of the dataset, which is less than 5% of the entire genome (Pertea 2012). Further, the presence of fewer tandem repeat elements in transcriptome data reduces errors in sequence assembly when compared to WGS data (Tørresen et al. 2019). The near-complete Cp genome of P. santalinus assembled using the present method is highly encouraging, considering that the reference genome used for comparison was assembled from high-depth whole-genome sequencing. The minor gaps observed in the present assembly could be minimized by increasing the depth of RNA-seq or bridged using amplicon sequencing. This method can accelerate evolutionary and phylogenomic studies, enable species discrimination and hybrid validation in breeding programs, delineate cryptic species, assist timber forensics and advance chloroplast genomics in plants.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1032 KB)

Acknowledgements

The authors acknowledge the National Biodiversity Authority, Government of India for funding support.

Author contributions

SS conducted Cp genome assembly, annotation, analysis and drafted the manuscript; KU conceptualized the pipeline; MGD conceptualized the research, obtained funding, conducted transcriptome sequencing, prepared and finalized the manuscript. All authors have approved the manuscript.

Funding

This study was funded by the National Biodiversity Authority, Government of India.

Availability of data and material

Not applicable.

Code availability

Not applicable.

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

References

Ankenbrand MJ, Pfaff S, Terhoeven N, et al. chloroExtractor: extraction and assembly of the chloroplast genome from whole genome shotgun data. J Open Source Softw. 2018;3:464. doi: 10.21105/joss.00464.
Atherton RA, McComish BJ, Shepherd LD, Berry LA, Albert NW, Lockhart PJ. Whole genome sequencing of enriched chloroplast DNA using the illumina GAII platform. Plant Methods. 2010;6:22. doi: 10.1186/1746-4811-6-22.
Avila ME, Gisby MF, Day A. Seamless editing of the chloroplast genome in plants. BMC Plant Biol. 2016;16:168. doi: 10.1186/s12870-016-0857-6.
Baker P, Jackson P, Aitken K. Bayesian estimation of marker dosage in sugarcane and other autopolyploids. Theor Appl Genet. 2010;120:1653–1672. doi: 10.1007/s00122-010-1283-z.
Bansal KC, Saha D. Chloroplast genomics and genetic engineering for crop improvement. Agric Res. 2012;1:53–66. doi: 10.1007/s40003-011-0010-6.
Beier S, Thiel T, Münch T, et al. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–2585. doi: 10.1093/bioinformatics/btx198.
Besnard G, Hernández P, Khadari B, Dorado G, Savolainen V. Genomic profiling of plastid DNA variation in the mediterranean olive tree. BMC Plant Biol. 2011;11:80. doi: 10.1186/1471-2229-11-80.
Boehm CR, Bock R. Recent advances and current challenges in synthetic biology of the plastid genetic system and metabolism. Plant Physiol. 2019;179:794–802. doi: 10.1104/pp.18.00767.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
Daniell H. Transgene containment by maternal inheritance: effective or elusive? Proc Natl Acad Sci USA. 2007;104:6879–6880. doi: 10.1073/pnas.0702219104.
Daniell H, Lee SB, Grevich J, Saski C, Quesada-Vargas T, Guda C, et al. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes. Theor Appl Genet. 2006;112:1503–1518. doi: 10.1007/s00122-006-0254-x.
Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016;17(1):134. doi: 10.1186/s13059-016-1004-2.
Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017;45:18. doi: 10.1093/nar/gkw955.
Dobrogojski J, Adamiec M, Lucinski R. The chloroplast genome: a review. Acta Physiol Plant. 2020;42:98. doi: 10.1007/s11738-020-03089-x.
Dormontt EE, van Dijk K, Bell KL, Biffin E, Breed MF, Byrne M, Caddy-Retalic S, Encinas-Viso F, Nevill PG, Shapcott A, Young JM, Waycott M, Lowe AJ. Advancing DNA barcoding and metabarcoding applications for plants requires systematic analysis of herbarium collections—an Australian perspective. Front Ecol Evol. 2018;6:134. doi: 10.3389/fevo.2018.00134.
Freudenthal JA, Pfaff S, Terhoeven N, et al. A systematic comparison of chloroplast genome assembly tools. Genome Biol. 2020;21:254. doi: 10.1186/s13059-020-02153-6.
Fu CN, Wu CS, Ye LJ, Mo ZQ, Liu J, Chang YW, Li DZ, Chaw SM, Gao LM. Prevalence of isomeric plastomes and effectiveness of plastome super-barcodes in yews (Taxus) worldwide. Sci Rep. 2019;9(1):1–11. doi: 10.1038/s41598-019-39161-x.
Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–98.
Hollingsworth PM, Li D-Z, Van Der Bank M, Twyford AD. Telling plant species apart with DNA: from barcodes to genomes. Philos Trans R Soc B. 2016;371:20150338. doi: 10.1098/rstb.2015.0338.
Hong Z, Wu Z, Zhao K, Yang Z, Zhang N, Guo J, Tembrock LR, Xu D. Comparative analyses of five complete chloroplast genomes from the genus Pterocarpus (Fabacaeae). Int J Mol Sci. 2020;21(11):3758. doi: 10.3390/ijms21113758.
Ichinose M, Sugita M. RNA editing and its molecular mechanism in plant organelles. Genes. 2016;8(1):5. doi: 10.3390/genes8010005.
Jansen RK, Saski C, Lee SB, Hansen AK, Daniell H. Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol Biol Evol. 2011;28:835–847. doi: 10.1093/molbev/msq261.
Jiao L, Lu Y, He T, Li J, Yin Y. A strategy for developing high-resolution DNA barcodes for species discrimination of wood specimens using the complete chloroplast genome of three Pterocarpus species. Planta. 2019;250(1):95–104. doi: 10.1007/s00425-019-03150-1.
Jin S, Daniell H. The engineered chloroplast genome just got smarter. Trends Plant Sci. 2015;20:622–640. doi: 10.1016/j.tplants.2015.07.004.
Jin JJ, Yu WB, Yang JB, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21:241. doi: 10.1186/s13059-020-02154-5.
Kumar S, Stecher G, Li M, et al. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35:1547–1549. doi: 10.1093/molbev/msy096.
Kurtz S, Choudhuri JV, Ohlebusch E, et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29:4633–4642. doi: 10.1093/nar/29.22.4633.
Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:207–209. doi: 10.1093/bioinformatics/btp352.
Li XW, Yang Y, Henry RJ, Rossetto M, Wang YT, Chen SL. Plant DNA barcoding: from gene to genome. Biol Rev. 2015;90:157–166. doi: 10.1111/brv.12104.
Li S, Chang L, Zhang J. Advancing organelle genome transformation and editing for crop improvement. Plant Commun. 2021;2:100141. doi: 10.1016/j.xplc.2021.100141.
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I. VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000;16:1046–1047. doi: 10.1093/bioinformatics/16.11.1046.
McKain M, Wilson M (2017) Fast-Plast: rapid de novo assembly and finishing for whole chloroplast genomes. https://github.com/mrmckain/Fast-Plast
Mower JP, Vickrey TL. Structural diversity among plastid genomes of land plants. In: Chaw S-M, Jansen RK, editors. Advances in botanical research. Cambridge: Academic Press; 2018. pp. 263–292.
Niu Z, Xue Q, Zhu S, Sun J, Liu W, Ding X. The complete plastome sequences of four orchid species: insights into the evolution of the Orchidaceae and the utility of plastomic mutational hotspots. Front Plant Sci. 2017;8:715. doi: 10.3389/fpls.2017.00715.
Oey M, Lohse M, Kreikemeyer B, Bock R. Exhaustion of the chloroplast protein synthesis capacity by massive expression of a highly stable protein antibiotic. Plant J. 2009;57:436–445. doi: 10.1111/j.1365-313X.2008.03702.x.
Okuda K, Myouga F, Motohashi R, Shinozaki K, Shikanai T. Conserved domain structure of pentatricopeptide repeat proteins involved in chloroplast RNA editing. Proc Natl Acad Sci USA. 2007;104:8178–8183. doi: 10.1073/pnas.0700865104.
Pertea M. The human transcriptome: an unfinished story. Genes (Basel). 2012;3(3):344–360. doi: 10.3390/genes3030344.
Rogalski M, do Vieira NL, Fraga HP, Guerra MP. Plastid genomics in horticultural species: importance and applications for plant population genetics, evolution, and biotechnology. Front Plant Sci. 2015;6:586. doi: 10.3389/fpls.2015.00586.
Ruf S, Bock R (2011) In vivo analysis of RNA editing in plastids. In: Aphasizhev R (ed) RNA and DNA editing. Methods in molecular biology, vol 718. Humana Press, Totowa. doi: 10.1007/978-1-61779-018-8_8.
Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am J Bot. 2007;94(3):275–288. doi: 10.3732/ajb.94.3.275.
South PF, Cavanagh AP, Liu HW, Ort DR. Synthetic glycolate metabolism pathways stimulate crop growth and productivity in the field. Science. 2019;363:aat9077. doi: 10.1126/science.aat9077.
Teske D, Peters A, Möllers A, Fischer M. Genomic profiling: the strengths and limitations of chloroplast genome-based plant variety authentication. J Agric Food Chem. 2020;68(49):14323–14333. doi: 10.1021/acs.jafc.0c03001.
Tillich M, Lehwark P, Pellizzer T, et al. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45:W6–W11. doi: 10.1093/nar/gkx391.
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res. 2019;47(21):10994–11006. doi: 10.1093/nar/gkz841.
Twyford AD, Ness RW. Strategies for complete plastid genome sequencing. Mol Ecol Resour. 2017;17:858–868. doi: 10.1111/1755-0998.12626.
Vieira Ldo LN, Faoro H, Rogalski M, Fraga HP, Cardoso RL, de Souza EM, et al. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection. PLoS ONE. 2014. doi: 10.1371/journal.pone.0090618.
Whittall JB, Syring J, Parks M, Buenrostro J, Dick C, Liston A, et al. Finding a (pine) needle in a haystack: chloroplast genome sequence divergence in rare and widespread pines. Mol Ecol. 2010;19(Suppl 1):100–114. doi: 10.1111/j.1365-294X.2009.04474.x.
Wicke S, Schneeweiss GM, de Pamphilis CW, Müller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol Biol. 2011;76(3):273–297. doi: 10.1007/s11103-011-9762-4.
Yu Y, Yu PC, Chang WJ, Yu K, Lin CS. Plastid transformation: how does it work? Can it be applied to crops? What can it offer? Int J Mol Sci. 2020;21(14):4854. doi: 10.3390/ijms21144854.
Zhong X (2020) Assembly, annotation and analysis of chloroplast genomes. Doctoral thesis, The University of Western Australia

Associated Data

Supplementary Materials

Supplementary file1 (DOCX 1032 KB)

Data Availability Statement

Not applicable.
