Graphical abstract
Keywords: Polyploid sugarcane, PacBio long reads, Allele, Alternative splicing, Sucrose accumulation
Highlights
• There is no high-quality reference genome for sugarcane cultivars at the current stage.
• A full-length transcriptome dataset constructed at the allelic level could serve as a good reference.
• Differential analysis uncovered the potential contribution of subgenome Saccharum spontaneum to sugar accumulation.
• The different or even opposite expression patterns of alleles reflected the complexity of gene regulation in sugarcane.
Abstract
Introduction
Modern sugarcane cultivars (Saccharum spp. hybrids) are derived from crosses between S. officinarum and S. spontaneum, inheriting high-sugar traits from the former and excellent stress tolerance from the latter. However, the contribution of the S. spontaneum subgenome to sucrose accumulation is still unclear.
Objective
To compensate for the absence of a high-quality reference genome, a transcriptome analysis method is needed to analyze the molecular basis of differential sucrose accumulation in sugarcane hybrids and to find clues to the contribution of the S. spontaneum subgenome to sucrose accumulation.
Methods
PacBio full-length sequencing was used to complement genome annotation, followed by the identification of differential genes between the high and low sugar groups using differential alternative splicing analysis and differential expression analysis. At the subgenomic level, the factors responsible for differential sucrose accumulation were investigated from the perspective of transcriptional and post-transcriptional regulation.
Results
A full-length transcriptome annotated at the subgenomic level was provided, complemented by 263,378 allele-defined transcript isoforms and 139,405 alternative splicing (AS) events. Differential alternative splicing (DA) analysis and differential expression (DE) analysis identified differential genes between high and low sugar groups and explained differential sucrose accumulation factors by the KEGG pathways. In some gene models, different or even opposite expression patterns of alleles from the same gene were observed, reflecting the potential evolution of these alleles toward novel functions in polyploid sugarcane. Among DA and DE genes in the sucrose source-sink complex pathway, we found some alleles encoding sucrose accumulation-related enzymes derived from the S. spontaneum subgenome were differentially expressed or had DA events between the two contrasting sugarcane hybrids.
Conclusion
Full-length transcriptomes annotated at the subgenomic level could better characterize sugarcane hybrids, and the S. spontaneum subgenome was found to contribute to sucrose accumulation.
Introduction
Sugarcane (Saccharum spp.) belongs to the Gramineae family and has the high photosynthetic efficiency of a C4 plant [1]. The Saccharum genus consists of the wild species S. spontaneum and S. robustum, the ancient cultivars S. sinense and S. barberi, and the noble type S. officinarum [2]. Modern sugarcane cultivars are mainly the hybrid progeny of crosses between S. officinarum and S. spontaneum. For example, the sugarcane variety R570 (2n = 115) carries genetic information from both S. officinarum and S. spontaneum; during hybridization, S. officinarum, as the recurrent parent, contributes about 80% of the genetic information in hybrids, while S. spontaneum accounts for only 10%, and the remaining 10% consists of recombinant chromosomes between the two species [3]. Sugarcane hybrids inherit their high-sugar traits from S. officinarum and their excellent stress tolerance from S. spontaneum [4]. It is unclear whether the S. spontaneum subgenome is involved in regulating sugar accumulation. Because high-quality reference genomes are lacking, the effect of the expression patterns of S. spontaneum alleles on sugarcane hybrids has rarely been investigated.
So far, several sugarcane reference genomes have been published, two of which belong to the sugarcane hybrids R570 and SP80-3280. The R570 monoploid reference genome contains 3,965 high-quality contigs [5], and the assembly of the SP80-3280 genome produced 199,028 contigs that do not carry allelic information for genes [6]. However, modern sugarcane hybrids contain about 130 chromosomes, which are distributed in > 10 homologous or heterologous chromosomal groups [7]. Therefore, neither the R570 nor the SP80-3280 reference genome can satisfy the needs of current omics analysis of sugarcane hybrids. In the absence of a suitable reference genome, de novo assembly has often been used for transcriptomic analysis of sugarcane hybrids [8], but de novo assembly usually does not provide well-defined allelic information. The S. officinarum LA-purple (octoploid) and S. spontaneum AP85-441 (tetraploid) reference genomes have been assembled to the allelic level [4]. Compared with the R570 genome (monoploid) and the SP80-3280 genome (contig-level, without complete chromosome sequences), the allelic information in the S. officinarum and S. spontaneum genomes makes it possible to investigate gene regulation in polyploid sugarcane at the subgenome level. Since sugarcane hybrids are derived from crosses between S. officinarum and S. spontaneum, the combined genome of S. officinarum and S. spontaneum can serve as a reference genome for sugarcane hybrids with better integrity and allelic information. In addition, transcriptomic study of sugarcane at the allelic level requires high-quality sequences. In recent years, the sequencing error rate of PacBio full-length transcriptome sequencing (ISOseq) has decreased significantly [9], and the combination of ISOseq and next-generation RNA-seq can improve the accuracy of transcriptome analysis [10].
High-throughput short reads generated by RNA-seq can be used to quantify the expression of genes and alternative splicing (AS) events. Unlike regulation of gene expression levels, AS produces different transcript isoforms, which might alter the efficiency or even the function of mRNA. There are five main types of AS events: retained intron, alternative 3′ splice site, alternative 5′ splice site, skipped exon, and mutually exclusive exons. The main AS type in mammals is skipped exon; in contrast, the main AS type in plants is retained intron [11]. In sugarcane, AS responds actively to environmental factors, and both abiotic and biotic stresses can affect AS regulation. For example, in smut-infected sugarcane, some genes were subject to AS regulation, and these genes were enriched in cell wall modifications, transcription factors, and defense signaling pathways [12]. Through AS, the ScMYB2 gene in sugarcane produces two transcript isoforms, ScMYB2S1 and ScMYB2S2, and may be involved in drought-induced sugarcane senescence by participating in the ABA-mediated leaf senescence signaling pathway [13]. After cold acclimation, AS genes were closely related to oxidoreductase activity and sugar metabolism pathways [14]. However, the AS mechanism linked to sucrose accumulation in sugarcane remains unknown.
In this study, PacBio long reads were aligned to the reference genome combined from S. officinarum LA-purple (2n = 8x = 80) and S. spontaneum AP85-441 (2n = 4x = 32), providing a dataset of representative sequences. Illumina short reads were used to identify differentially alternatively spliced (DA) and differentially expressed (DE) genes between high- and low-sugar sugarcane hybrids, and their associated metabolic pathways were explored by KEGG enrichment analysis. This study aimed to (1) construct a high-quality transcriptome that enriches the genomic data available for sugarcane hybrids; (2) investigate the factors responsible for differential sucrose accumulation from the perspective of transcriptional and post-transcriptional regulation; (3) provide candidate genes for genomics-assisted breeding and data support for sugar productivity enhancement in sugarcane.
Materials and methods
Phenotyping and sample collection
The experiment was conducted at Guangxi University (Guangxi Zhuang Autonomous Region, China) (108°19′ E, 22°51′ N). Fully grown, disease-free, 12-month-old field-grown plants were selected for analysis. Brix (a measure of soluble solids in sugarcane juice) was measured in the 3rd internode from the top. Sucrose content (fresh weight) was measured in whole stems by ion chromatography. Based on these assay results, the sugarcane hybrids were classified as high-sugar or low-sugar. For each sugarcane hybrid, we selected the 3rd internode from the top and the corresponding healthy leaf tissue for sequencing, with three biological replicates. Leaf and stem tissues from six sugarcane hybrids were rapidly frozen in liquid nitrogen and stored at −80 °C for subsequent RNA-seq.
Library construction and quality control of sequencing data
RNA was extracted by grinding tissue in TRIzol reagent. mRNA was enriched with Oligo-dT magnetic beads. The enriched mRNA was then reverse transcribed into cDNA using the Clontech SMARTer PCR cDNA Synthesis Kit. An optimized cycle number was used to generate double-stranded cDNA. In addition, > 5 kb size selection was performed using the BluePippin™ Size-Selection System, and the size-selected cDNA was mixed equally with the non-size-selected cDNA. Large-scale PCR was then performed for the subsequent SMRTbell library construction. cDNAs were DNA damage repaired, end-repaired, and ligated to sequencing adapters. The SMRTbell template was annealed to a sequencing primer, bound to polymerase, and sequenced on the PacBio Sequel II platform by Gene Denovo Biotechnology Co. (Guangzhou, China).
High-fidelity circular consensus sequencing (CCS) reads with a sequencing accuracy of 99% were obtained using pbccs (v6.0.0) with the parameters: min-passes 1, min-length 50, min-rq 0.99. Lima (v2.0.0) was used to remove primers from CCS reads, samtools (v1.12) was used to merge the multiple output files, and isoseq3 (v3.4.0) was used to remove poly(A) tails and then extract the full-length non-chimeric (FLNC) sequences. For all RNA-seq data, fastp (v0.20.1) was used for quality control, adapter trimming, quality filtering, and per-read quality pruning [15].
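The min-rq threshold of 0.99 can be related to the more familiar Phred scale. The sketch below is illustrative only: the function names are hypothetical, and the 2,700 bp read length is a round number chosen for the example rather than a value taken from the pipeline.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a predicted read accuracy (0 < accuracy < 1) to a Phred-scaled quality."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * (1.0 - accuracy)


# A predicted accuracy of 0.99 corresponds to roughly Q20.
print(round(accuracy_to_phred(0.99), 1))
# Illustrative read length (assumption): about 27 expected errors at 99% accuracy.
print(round(expected_errors(2700, 0.99), 1))
```

Under these assumptions, a read that just passes the filter can still carry a few dozen errors, which is one reason to tolerate mismatches in the downstream alignment step.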
Construction of the updated transcriptome for sugarcane hybrids (UTSH)
The genomes of the progenitor species of sugarcane hybrids, S. officinarum LA-purple (2n = 8x = 80) and S. spontaneum AP85-441 (2n = 4x = 32) (https://sugarcane.zhangjisenlab.cn/sgd/html/download.html, accessed on 10 January 2022), were integrated into a combined genome. Both genomes provide the relationships between their respective alleles and Sorghum gene models, so the alleles of S. officinarum and S. spontaneum were first linked through these shared gene models; the remaining alleles of S. officinarum and S. spontaneum were matched according to the highest-scoring mutual BLAST (v2.12) hits.
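One way to read "highest value of mutual BLAST" is as a reciprocal best hit criterion. The sketch below works under that assumption; the function names and the allele identifiers in the example are hypothetical, and the hits are assumed to have been parsed already into (query, subject, bitscore) tuples.

```python
def best_hits(hits):
    """Return {query: (subject, bitscore)}, keeping the top-scoring subject per query."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Pair sequences that are each other's best hit in both search directions."""
    fwd = best_hits(forward_hits)
    rev = best_hits(reverse_hits)
    pairs = []
    for query, (subject, _score) in fwd.items():
        if subject in rev and rev[subject][0] == query:
            pairs.append((query, subject))
    return sorted(pairs)


# Hypothetical identifiers and scores, for illustration only.
soff_vs_sspon = [("SoffA", "SsponX", 950.0), ("SoffA", "SsponY", 400.0),
                 ("SoffB", "SsponY", 700.0)]
sspon_vs_soff = [("SsponX", "SoffA", 940.0), ("SsponY", "SoffA", 710.0),
                 ("SsponY", "SoffB", 690.0)]
print(reciprocal_best_hits(soff_vs_sspon, sspon_vs_soff))
```

In this toy input only SoffA and SsponX are paired, because the best hit of SsponY points back to SoffA rather than to SoffB.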
For each PacBio sample, the FLNC reads were uniquely aligned to the combined genome using GMAP (v2021.8.25) with parameters: -n 1, min-trimmed-coverage 0.5, min-identity 0.99, cross-species. Because the CCS reads were filtered to a predicted accuracy of 0.99, we used the same value as the identity threshold to tolerate sequencing errors.
For aligned reads, the script collapse-isoforms-by-sam.py from cDNA-cupcake (v28.0.0) was used to merge redundant FLNC reads separately for each PacBio sample, with parameters: -min-coverage 0.99, -min-identity 0.99. gffread (v0.12.7) and gffcompare (v0.11.2) were used to merge the four PacBio samples into a combined GTF annotation file; then, the scripts sqanti3-qc.py and Rulesfilter.py from SQANTI3 [16] were used to filter and optimize the annotation file, retaining reliable sequences and generating the final transcript isoforms. The longest transcript of each allele was extracted as its representative sequence and assigned to Group A, which is the first part of UTSH.
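Selecting the longest transcript per allele can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name, the tuple layout, and the identifiers in the example are assumptions.

```python
def longest_per_allele(transcripts):
    """Pick the longest transcript for each allele.

    transcripts: iterable of (allele_id, transcript_id, sequence) tuples.
    Returns {allele_id: (transcript_id, sequence)}; ties keep the first seen.
    """
    best = {}
    for allele_id, transcript_id, sequence in transcripts:
        current = best.get(allele_id)
        if current is None or len(sequence) > len(current[1]):
            best[allele_id] = (transcript_id, sequence)
    return best


# Hypothetical identifiers and toy sequences.
records = [
    ("allele1", "tx1.1", "ATGGCC"),
    ("allele1", "tx1.2", "ATGGCCGTTA"),
    ("allele2", "tx2.1", "ATGA"),
]
for allele, (tx, seq) in longest_per_allele(records).items():
    print(allele, tx, len(seq))
```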
For unaligned reads, the FLNC reads with similarity < 0.99 were first clustered into unigenes using cd-hit-est (v4.8) with a similarity of 0.95 and then uniquely aligned to the combined genome again with parameters: -n 1, min-trimmed-coverage 0.5, min-identity 0.95, cross-species. Unigenes that aligned in this second round were defined as potential alleles and assigned to Group B, while unigenes that still failed to align were assigned to Group C. The thresholds followed a similar full-length transcriptome workflow [17]. In Group C, potential microbial unigenes were predicted using Kaiju [18].
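The three-way grouping described above can be summarized as a simple decision rule. The sketch below is a simplification under stated assumptions: it takes a single best alignment identity per sequence (or None when nothing aligns), ignores the coverage filter, and skips the intermediate clustering step; the function name is hypothetical.

```python
from typing import Optional


def assign_group(identity: Optional[float]) -> str:
    """Assign a sequence to UTSH Group A, B, or C from its best alignment identity.

    identity is None when the sequence does not align to the combined genome.
    """
    if identity is not None and identity >= 0.99:
        return "A"  # allele-level resolution
    if identity is not None and identity >= 0.95:
        return "B"  # potential alleles, unigene-level resolution
    return "C"      # no acceptable alignment, unigene-level resolution


for value in (0.995, 0.97, 0.90, None):
    print(value, assign_group(value))
```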
Assessment of the UTSH dataset
The completeness of the UTSH dataset was quantified by BUSCO (v5.0.0) [19]. RNA-seq data of ZZ1 and SP80-3280 were aligned to the UTSH dataset using Hisat2 (v2.2.1) [20] with default parameters to assess the alignment rate. To obtain functional information from homologous sequences, the UTSH dataset was searched against the NCBI Non-Redundant Protein Sequence Database (NR, ftp://ftp.ncbi.nlm.nih.gov/blast/db/), the Swiss-Prot Database (https://www.uniprot.org/downloads/), and EuKaryotic Orthologous Groups (KOG, https://ftp.ncbi.nih.gov/pub/COG/KOG/) using BLAST (v2.12) with an E-value threshold of 10⁻⁵. Sequenceserver (v2.0.0) was used for the online BLAST of the UTSH dataset [21].
Transcriptomic analysis in sugarcane hybrids
AS events were identified using SUPPA2 (v2.3) with default parameters [22]. RNA-seq clean reads were aligned to the combined genome using Hisat2 (v2.2.1), and DA events were then identified using rMATS (v4.1.1) with a false discovery rate (FDR) < 0.05 and ΔPSI > 0.1. Differential expression analysis was carried out using edgeR (v3.36.0), and the screening criteria for differential genes were FDR < 0.05 and |log2FoldChange| > 1.
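The two screening criteria can be expressed as simple predicates. The sketch below only illustrates the thresholds stated above; the function names are hypothetical, and the inputs are assumed to be values already exported from edgeR or rMATS.

```python
def is_de(log2_fold_change: float, fdr: float,
          fdr_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> bool:
    """Differentially expressed: FDR < 0.05 and |log2FoldChange| > 1."""
    return fdr < fdr_cutoff and abs(log2_fold_change) > lfc_cutoff


def is_da(delta_psi: float, fdr: float,
          fdr_cutoff: float = 0.05, psi_cutoff: float = 0.1) -> bool:
    """Differentially spliced: FDR < 0.05 and absolute PSI difference > 0.1."""
    return fdr < fdr_cutoff and abs(delta_psi) > psi_cutoff


print(is_de(-1.8, 0.01))   # strong down-regulation, significant
print(is_de(0.6, 0.001))   # significant but small effect
print(is_da(0.25, 0.03))   # sizeable splicing shift, significant
print(is_da(0.25, 0.20))   # sizeable shift, not significant
```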
KAAS (https://www.genome.jp/tools/kaas/) was used to generate KEGG annotation [23]. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed using the OmicShare tools (https://www.omicshare.com/tools), which invoked the corresponding R package.
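The statistical test behind the OmicShare enrichment tool is not stated here. Pathway enrichment is commonly assessed with a one-sided hypergeometric test, and the sketch below works under that assumption; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background: int, in_pathway: int,
                           selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of background genes (e.g. all expressed genes)
    in_pathway: background genes annotated to the pathway
    selected:   number of DE or DA genes tested
    overlap:    selected genes annotated to the pathway
    """
    upper = min(in_pathway, selected)
    total = comb(background, selected)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 3 in the pathway, 4 selected, 2 overlapping.
print(hypergeom_enrichment_p(10, 3, 4, 2))
```

In practice the resulting p-values would still need multiple-testing correction across pathways, for example to obtain the FDR values reported in the figures.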
Validation of the transcriptome assembly and alternative splicing events
Forward and reverse primers for PCR amplification were designed using the NCBI Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/, accessed on 20 March 2022). PCR was carried out in a 20 μl reaction mixture containing 10 μl of 10 × reaction buffer, 5 pmol of each primer, 1.25 units of Taq DNA polymerase, and 20 ng of cDNA template. The PCR was performed in thermocyclers using the following cycling parameters: 94 °C (3 min); 35 cycles of 94 °C (30 s), 58 °C (30 s), and 72 °C (30 s); then 72 °C (5 min). PCR products were visualized on agarose gels (2%).
Results
RNA sequencing of sugarcane hybrids with contrasting sugar content
According to Brix value (the 3rd internode from the top) and sucrose content (whole stem), six sugarcane hybrids were classified into a low-sugar content group (12–17204, 15–701, and 16–104), and a high sugar content group (YT93-159, GT35, and ROC22). The two groups exhibited a significant difference in the Brix value and sucrose content (P < 0.001) (Fig. 1A). Leaf and stem tissues from the six sugarcane hybrids were utilized for transcriptomic study using the PacBio and Illumina sequencing platforms. A total of 36 samples were sequenced using the Illumina platform, including three biological replicates of leaf and stem tissues for six sugarcane hybrids. The PacBio platform was used to sequence four pooled samples with nine from each group (LL, leaf of low-sugar hybrids; LS, stem of low-sugar hybrids; HL, leaf of high-sugar hybrids; HS, stem of high-sugar hybrids) (Table S1).
Fig. 1.
Experimental design of this study. (A) Brix and sucrose content of selected sugarcane hybrids. At the late growth stage, the tissue sampled for Brix measurement was the 3rd internode from the top, and the tissue sampled for sucrose content determination was the whole stem. (B) The pipeline to construct the updated transcriptome of sugarcane hybrids (UTSH) dataset using PacBio long reads. “sim” represents “similarity”.
Full-length non-chimeric (FLNC) sequences were extracted from the PacBio data after removing sequencing primers (Table S2) and data processing. A total of 3.7 million FLNC reads were obtained from 3.9 million CCS reads, with an average length of 2,602 to 2,707 bp and an N50 length of 2,739 to 3,034 bp across the four pooled samples (Table 1). The 36 Illumina RNA-seq samples yielded an average of 68.3 million clean reads per sample, with a Q20 rate of 97.93% and a GC content of 55.25% (Table S3).
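For readers unfamiliar with the N50 statistic, a minimal sketch of its usual definition is given below: the length L such that reads of length L or longer contain at least half of all bases. The function name and the toy lengths are illustrative and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy read lengths (illustrative only).
print(n50([2, 3, 4, 5, 6]))
```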
Table 1.
PacBio ISOseq output statistics.
| Category | Metric | LL | LS | HL | HS | Average |
|---|---|---|---|---|---|---|
| Summary of CCSᵃ | ZMWᵇ reads number | 1,458,258 | 1,151,319 | 1,436,305 | 1,191,697 | 1,309,395 |
| | CCS reads number | 1,089,235 | 858,242 | 1,065,366 | 889,934 | 975,694 |
| | Read bases of CCS (bp) | 2,953,712,234 | 2,404,447,142 | 2,973,206,987 | 2,412,441,622 | 2,685,951,996 |
| | Mean read length of CCS (bp) | 2,712 | 2,802 | 2,791 | 2,711 | 2,754 |
| Summary of FLNCᶜ | FLNC reads number | 1,002,196 | 804,915 | 1,012,454 | 837,039 | 914,151 |
| | Read bases of FLNC (bp) | 2,641,851,236 | 2,178,718,162 | 2,739,795,345 | 2,177,595,795 | 2,434,490,135 |
| | Mean read length of FLNC (bp) | 2,636 | 2,707 | 2,706 | 2,602 | 2,663 |
| | GC content of FLNC (%) | 50.13% | 49.57% | 50.36% | 50.00% | 50.02% |
| | N50 of FLNC (bp) | 2,922 | 2,869 | 3,034 | 2,739 | 2,891 |

ᵃ Circular consensus sequencing (CCS) reads.
ᵇ Zero-mode waveguide (ZMW) holes.
ᶜ Full-length non-chimeric (FLNC) reads.
Based on subgenomic features of sugarcane hybrids, we used a genome-guided strategy to separate transcript sequences at the allelic level (Fig. 1B). The combined genomes were composed of two progenitor species genomes, S. officinarum LA-purple (2n = 8x = 80) and S. spontaneum AP85-441 (2n = 4x = 32), and their corresponding alleles were correlated according to genome annotations.
After alignment to the combined genome, the FLNC reads were divided into three categories based on their sequence similarity to the genome (Table 2). In the first category (Group A), FLNC reads with high sequence similarity to the reference genome (> 0.99) were clustered, and the longest transcript from each cluster was extracted as a representative sequence. The remaining FLNC reads were assigned to Group B (0.99 > sequence similarity > 0.95) and Group C (sequence similarity < 0.95). The transcript sequences in Groups B and C could not be distinguished at the allelic level and were retained to complement the sugarcane hybrid unigenes. In total, 369,698 sequences were included in the UTSH dataset, of which 92,149 sequences were derived from Group A at the allelic level, and 187,069 sequences from Group B and 90,480 sequences from Group C at the unigene level. Of the alleles with transcript sequences in Group A, 45,733 were annotated in the original genome annotations, while 46,416 were novel. A total of 16,141 known gene models were characterized in Group A, with an average of 5.7 alleles per gene model. In Group B, 10,891 unigenes corresponded to 6,824 known gene models, and 176,178 unigenes were not annotated. In Group C, all 90,480 unigenes were novel, of which 31,584 unigenes were likely derived from microbial genomes according to their sequence annotations.
Table 2.
Statistics of the UTSH dataset of sugarcane hybrids.
| UTSH dataset | Group A | Group B | Group C |
|---|---|---|---|
| Number of sequences | 92,149 | 187,069 | 90,480 |
| Similarity to genome | > 0.99 | Between 0.99 and 0.95 | < 0.95 |
| Resolution level | Allele | Unigene | Unigene |
| Number of sequences matching known annotations | 45,733 | 10,891 | – |
| Number of gene models | 16,141 | 6,824 | – |
| Number of novel genes | 46,416 | 176,178 | 90,480 |
Evaluation of the UTSH dataset
PacBio sequencing provided an average of 10.6 CCS reads and 9.9 FLNC reads per transcript sequence, indicating sufficient read support for constructing the UTSH dataset. To verify the integrity of the UTSH dataset, 1,614 homologous single-copy conserved genes in plants were used. The integrity of the UTSH dataset was 95.7%, only 3.9 percentage points lower than that of the combined genome (99.6%), and even the sequences from Group A alone had 80.6% BUSCO integrity (Fig. 2A). All of the datasets showed very high proportions of duplicated genes (dark blue regions) because alleles were counted as duplicates. All sequences in the UTSH dataset were then annotated using the NR, Swiss-Prot, and KOG public databases, and 93.7% of the sequences had target hits (Fig. 2B). Overall, the comprehensiveness of the UTSH dataset constructed using PacBio SMRT sequencing was sufficient for the subsequent analysis.
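The per-sequence read support quoted above can be reproduced from Table 1 and Table 2, assuming the averages were obtained by dividing the total read counts of the four pooled samples by the total number of UTSH sequences. That derivation is an assumption, although it matches the reported values; the variable and function names are illustrative.

```python
# Read counts per pooled sample, copied from Table 1.
ccs_reads = {"LL": 1_089_235, "LS": 858_242, "HL": 1_065_366, "HS": 889_934}
flnc_reads = {"LL": 1_002_196, "LS": 804_915, "HL": 1_012_454, "HS": 837_039}

# Total UTSH sequences (Groups A + B + C), copied from Table 2.
utsh_sequences = 92_149 + 187_069 + 90_480


def reads_per_sequence(read_counts, n_sequences):
    """Average number of reads supporting each sequence in the dataset."""
    return sum(read_counts.values()) / n_sequences


print(utsh_sequences)                                            # 369698
print(round(reads_per_sequence(ccs_reads, utsh_sequences), 1))   # 10.6
print(round(reads_per_sequence(flnc_reads, utsh_sequences), 1))  # 9.9
```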
Fig. 2.
Evaluation and annotation of the UTSH dataset. (A) Assessment of assembly integrity using homologous single-copy conserved genes, light blue and dark blue regions are counted for integrity. Code “C” on the bar graph represents complete sequences, “S” represents single-copy sequences, “D” represents duplicated sequences, “F” represents fragmented sequences, “M” represents unassembled sequences, and “n” represents total number of homologous single-copy conserved genes. (B) Annotation of UTSH dataset using NR, Swiss-Prot, and KOG databases.
To evaluate the UTSH dataset as a reference for transcriptomic analyses, we aligned RNA-seq data from two sugarcane cultivars, ZZ1 and SP80-3280 [24], to multiple references (Table S5). The alignment rate to the UTSH dataset (78.3%–92.0%) was very close to that of the S. officinarum LA-purple genome (83.5%–93.2%) and exceeded that of the S. spontaneum AP85-441 genome (78.0%–87.4%) (Figure S1). The alignment rate to the UTSH dataset was significantly higher than the rate to the CDS sequences of the S. officinarum LA-purple genome (54.2%–58.4%) or the S. spontaneum AP85-441 genome (50.7%–55.5%), indicating that many transcripts from sugarcane hybrids may not have been annotated in the two progenitor genomes. These results supported the UTSH dataset as a more suitable reference for sugarcane transcriptomic study in terms of alignment rate and efficiency.
To improve its accessibility, we constructed an online database for the UTSH dataset and key results (https://120.48.69.188:8081/, Figure S2). The online database provides two functions: (1) a BLAST function that allows users to search the results of our study using their sequences of interest (Figure S3); (2) the UTSH dataset sequences, expression matrix files, AS event files, and allele association files, which were uploaded and can be searched and downloaded freely by users (Figure S4).
Alternative splicing analysis of sugarcane hybrids
Since the sequences in Group A of the UTSH dataset were resolved at the allelic level, 263,378 transcript isoforms from 92,149 alleles were used for AS analysis of the sugarcane transcriptome. Compared with the genome annotations, the 263,378 transcript isoforms corresponded to 179,388 known splice junctions and 358,926 new splice junctions. Around 20% of the genes had more than one transcript isoform (Fig. 3A), which was in line with a previous study [25]. As shown in Table S6, the full splice match (FSM) and incomplete splice match (ISM) transcripts, which matched known splice junctions, accounted for only 12.21% of the total transcripts captured, indicating that PacBio sequencing could enrich transcript isoform identification in sugarcane hybrids. Among the eight structural categories, four important types of transcripts were plotted according to the features of their splice junctions (Fig. 3B). Transcripts in the eight categories were around 2,000 bp long, and novel in catalog (NIC) and novel not in catalog (NNC) transcripts were longer than FSM or ISM transcripts (Fig. 3C), implying that these new transcript isoforms retained whole or partial introns. Both SQANTI and SpliceGraph detected that GT-AG accounted for up to 98% of the total splice junctions (Fig. 3D). In addition, GC-AG and AT-AC splice junctions occupied a relatively small proportion of the junctions, as reported previously in other species [26], [27].
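Splice-junction motifs such as GT-AG are read from the first two and last two bases of an intron. The sketch below illustrates that idea only; it assumes the intron sequence is already given on the transcribed strand, and the function names and toy introns are hypothetical.

```python
KNOWN_MOTIFS = {"GT-AG", "GC-AG", "AT-AC"}


def junction_motif(intron_seq: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence must be at least 4 bases long")
    return f"{seq[:2]}-{seq[-2:]}"


def classify_junction(intron_seq: str) -> str:
    """Classify an intron as one of the three well-known motifs or 'other'."""
    motif = junction_motif(intron_seq)
    return motif if motif in KNOWN_MOTIFS else "other"


# Toy intron sequences (illustrative only).
for intron in ("GTAAGTCTTTTCAG", "gcaagtttcag", "ATATCCTTTAC", "CTAAGGTTTAC"):
    print(intron, classify_junction(intron))
```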
Fig. 3.
Transcript isoform statistics of sugarcane hybrids. (A) Number of isoforms per gene. (B) Structural diagram of four kinds of transcript isoforms based on splice junctions. The gray triangles represented novel splice junctions. (C) Length distribution of transcript isoforms in eight structural categories. (D) The percentage of canonical splice junctions in all alternative splicing events captured by SQANTI and SpliceGraph software.
We counted the five major types of AS events (Fig. 4A). A total of 139,405 AS events were found, among which 73,224 (52.5%) were retained intron (RI), followed by 31,123 (22.3%) alternative 3′ splice site (A3), 19,109 (13.7%) alternative 5′ splice site (A5), 15,061 (10.8%) skipped exon (SE), and 978 (0.7%) mutually exclusive exons (MX) (Fig. 4B), showing that RI accounted for the largest proportion of AS events in sugarcane hybrids. These proportions were consistent with studies of Arabidopsis thaliana [28] and Sorghum bicolor [29]. Interestingly, of the 263,378 transcripts, 223,177 (84.7%) were from S. officinarum and 40,201 (15.3%) from S. spontaneum. Of the 139,405 AS events, 122,536 (87.9%) were generated by transcripts from S. officinarum, while 16,868 (12.1%) were generated by transcripts from S. spontaneum, indicating that most AS occurred in the S. officinarum subgenome. Previous studies have shown that about 80% of the genome of sugarcane hybrids is derived from S. officinarum and about 10%–20% from S. spontaneum [30], [31]. Our analysis supports this estimate from the transcriptomic perspective.
Fig. 4.
Alternative splicing analyses of sugarcane transcriptome. (A) Five major types of AS events. Deep gray represents exon, and light gray represents intron. (B) AS events identified in sugarcane hybrids. (C) Statistics for DA analysis in LL vs. HL (low-sugar vs. high-sugar in leaf tissue) and LS vs. HS (low-sugar vs. high-sugar in stem tissue). The significant thresholds were FDR < 0.05 and ΔPSI > 0.1; ΔPSI is a value in the range of 0–1 and represents the difference between the alternative splice junctions in contrasting groups. (D) KEGG enrichment analysis of DA genes in leaf tissue (LL vs. HL) and stem tissues (LS vs. HS).
The Illumina sequencing data showed good reproducibility among biological replicates (Figure S5) and were used to characterize DA events. DA events are AS events with statistically significant differences in exon inclusion levels between the two contrasting groups. We identified 793 DA events across 575 coding genes in leaf tissue, and 688 DA events across 520 coding genes in stem tissue (Table S7 and Fig. 4C). With all expressed genes used as background, the KEGG enrichment analysis revealed significant enrichment of nitrogen metabolism, pyruvate metabolism, and carbon fixation in photosynthetic organisms in leaf tissue (Fig. 4D, LL vs. HL), while the spliceosome pathway and amino acid biosynthesis pathway were significantly enriched in stem tissue (Fig. 4D, LS vs. HS).
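To make the notion of an exon inclusion level concrete, the sketch below uses a common length-normalised formulation: inclusion reads and skipping reads are each divided by an effective length before taking the ratio. This formulation is an assumption for illustration and is not taken from the study; the function names, effective lengths, and read counts are hypothetical.

```python
from statistics import mean


def inclusion_level(inc_reads: float, skip_reads: float,
                    inc_len: float, skip_len: float) -> float:
    """Length-normalised inclusion level (PSI), a value between 0 and 1."""
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        raise ValueError("no reads support either isoform")
    return inc / (inc + skip)


def delta_psi(group1_levels, group2_levels) -> float:
    """Absolute difference between the mean inclusion levels of two groups."""
    return abs(mean(group1_levels) - mean(group2_levels))


# Toy example: three replicates per group (illustrative numbers only).
low_sugar = [inclusion_level(30, 10, 2, 1), 0.7, 0.8]
high_sugar = [0.2, 0.3, 0.4]
print(round(low_sugar[0], 2))
print(round(delta_psi(low_sugar, high_sugar), 2))
```

With these toy values the difference is well above the 0.1 threshold, so the event would be a DA candidate provided its FDR is also below 0.05.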
In the spliceosome pathway mentioned above, SR coding genes have already been reported to undergo active alternative splicing [32], so they are well suited for viewing the alternative splicing landscape. We identified five SR coding genes and selected one of them, Soff.10G0010700-3D, to demonstrate its splicing landscape. The SR gene Soff.10G0010700-3D had 10 transcript isoforms (Fig. 5A) and 3 DA events between high- and low-sugar sugarcane hybrids (Fig. 5B). Notably, the splice junctions indicated by the ISOseq full-length isoforms and the RNA-seq short reads showed consistent exon boundaries, supporting the reliability of the AS events characterized in this study. In addition, we verified the reliability of the analysis through molecular experiments. Three genes (XLOC003085, XLOC000116 and XLOC000693) were randomly selected from the allele-defined UTSH Group A for primer design and amplification. The expected PCR products were successfully amplified from the DNA templates of all six sugarcane hybrids (Figure S6). Next, we validated AS events by reverse transcription PCR (RT-PCR) on a novel gene, XLOC000693, predicted to contain one RI event. The RT-PCR products from the six sugarcane hybrids confirmed the existence of the RI event (Fig. 5C, Table S4). The results of both PCR and RT-PCR demonstrated the accuracy of the UTSH dataset construction and of the detection of AS events. Detailed information on these validated genes is provided in Figure S6 and Table S4.
Fig. 5.
AS landscape of the serine/arginine-rich (SR) protein-coding gene Soff.10G0010700-3D and the novel gene XLOC000693. (A) Visualization of the 10 transcript isoforms of the SR protein-coding gene. Arrows indicated the negative strand; black rectangles represented exons, and thin black lines represented introns. Red, blue, and green rectangles represented models of alternative splicing events. (B) Visualization of three significant DA candidates in the SR protein-coding gene, including one A3 and two RI events. Purple and red stacked sections represented the reads per kilobase per million (RPKM) expression of the AS region, and the top right corner of each stacked section recorded the gene name, group name, and IncLevel. IncLevel represented the exon inclusion level, which was the percentage of exon inclusion isoforms in the total (exon inclusion isoforms plus exon skipping isoforms). A higher IncLevel value means that more introns were retained, and the absolute value of the difference between the IncLevel values of the two samples was denoted ΔPSI. The arcs and their numbers represented the number of short Illumina reads spanning splice junctions (known as exon skipping isoforms). The scale at the bottom indicated the chromosome coordinates where the AS event occurred. (C) Validation of the RI alternative splicing event inside the novel gene XLOC000693. The labels above the gel electrophoresis lanes were the marker (M) and the name of each sugarcane hybrid. The arrows labeled “F” and “R” on the full-length transcript in the diagram represent the forward and reverse primers.
Differential expression analysis of sugarcane hybrids
We also performed differential expression analysis between high- and low-sugar sugarcane hybrids (Fig. 6A, Table S7). In leaf tissue, a total of 2,166 DE genes involved in 879 known gene models were identified, of which 947 (43.72%) were up-regulated and 1,219 (56.28%) were down-regulated in the high-sugar hybrids. Interestingly, opposite expression patterns were observed among alleles of 26 genes (Table S8). Among them, alleles of six genes derived from S. officinarum showed up-regulation (high-sugar vs. low-sugar), accompanied by down-regulation of alleles derived from S. spontaneum. However, one gene showed the opposite situation: alleles derived from S. spontaneum were up-regulated (high-sugar vs. low-sugar), accompanied by down-regulation of alleles derived from S. officinarum. In stem tissue, a total of 2,987 DE genes involved in 1,253 known gene models were found, of which 1,365 (45.70%) were up-regulated and 1,622 (54.30%) were down-regulated in the high-sugar hybrids. Different expression patterns among alleles of 41 genes were also identified (Table S9). Among them, alleles of three genes derived from S. officinarum showed up-regulation (high-sugar vs. low-sugar), accompanied by down-regulation of alleles derived from S. spontaneum. However, seven genes showed the opposite situation: alleles derived from S. spontaneum were up-regulated (high-sugar vs. low-sugar), accompanied by down-regulation of alleles derived from S. officinarum. The different allelic expression patterns of the same genes probably reflected coordination between subgenomes or the evolution of these alleles toward new functions in allopolyploid sugarcane.
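The between-subgenome comparison described above can be sketched as a small grouping step. The code below is a simplified illustration, not the authors' procedure: it only reports gene models whose significant DE alleles are uniformly up-regulated in one subgenome and uniformly down-regulated in the other, and the function name, labels, gene names, and fold changes are all hypothetical.

```python
from collections import defaultdict


def opposite_pattern_genes(de_alleles):
    """Find gene models whose Soff and Sspon alleles change in opposite directions.

    de_alleles: iterable of (gene_model, subgenome, log2_fold_change) for
    alleles that already passed the DE thresholds; subgenome is 'Soff' or 'Sspon'.
    """
    directions = defaultdict(lambda: defaultdict(set))
    for gene, subgenome, log2_fc in de_alleles:
        directions[gene][subgenome].add("up" if log2_fc > 0 else "down")

    result = {}
    for gene, by_subgenome in directions.items():
        soff = by_subgenome.get("Soff", set())
        sspon = by_subgenome.get("Sspon", set())
        if soff == {"up"} and sspon == {"down"}:
            result[gene] = "Soff_up_Sspon_down"
        elif soff == {"down"} and sspon == {"up"}:
            result[gene] = "Soff_down_Sspon_up"
    return result


# Hypothetical gene models and fold changes (high-sugar vs. low-sugar).
alleles = [
    ("geneA", "Soff", 1.6), ("geneA", "Sspon", -2.1),
    ("geneB", "Soff", -1.4), ("geneB", "Sspon", 1.9),
    ("geneC", "Soff", 1.2), ("geneC", "Soff", 2.3),
]
print(opposite_pattern_genes(alleles))
```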
Fig. 6.
Differential expression analysis between high and low sugar sugarcane hybrids. (A) Venn diagram showing DE genes in LL vs. HL (low-sugar vs. high-sugar in leaf tissue) and LS vs. HS (low-sugar vs. high-sugar in stem tissue), “Upregulated” indicated higher expression in high-sugar varieties. (B) KEGG pathway enrichment analysis of DE genes. “up” indicated higher expression in high-sugar varieties. The significance of the most represented pathway in each comparison is shown by log-transformed FDR (red).
KEGG enrichment analysis showed that DE genes were significantly enriched in various pathways (Fig. 6B). DE genes up-regulated in both leaf and stem tissue were enriched in glutathione metabolism and cyanoamino acid metabolism. DE genes up-regulated in leaf tissue but down-regulated in stem tissue were enriched in arginine and proline metabolism. DE genes down-regulated in both leaf and stem tissue were enriched in the proteasome and biosynthesis of amino acids. A few pathways were enriched under a single regulatory condition, including starch and sucrose metabolism, biosynthesis of secondary metabolites, peroxisome, nitrogen metabolism, and spliceosome. We also evaluated the genes identified by both differential alternative splicing analysis and differential expression analysis (Table S10). These genes probably play a critical role in sugar metabolism, although further validation experiments are needed.
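For readers unfamiliar with pathway enrichment, the following Python sketch shows one common way such a test is computed: a one-sided hypergeometric test per pathway, followed by Benjamini-Hochberg FDR adjustment. This is an assumption made for illustration. This section does not state which test or software produced Fig. 6B, and the function names and all counts below are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, pathway_size, n_de, n_background):
    """P(X >= k) when n_de genes are drawn from n_background genes,
    of which pathway_size belong to the pathway of interest."""
    upper = min(pathway_size, n_de)
    total = comb(n_background, n_de)
    hits = sum(
        comb(pathway_size, i) * comb(n_background - pathway_size, n_de - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (DE genes in pathway, pathway size).
pathways = {
    "pathway_A": (12, 40),
    "pathway_B": (3, 60),
}
n_de, n_background = 200, 5000  # hypothetical totals

names = list(pathways)
pvals = [hypergeom_pvalue(k, size, n_de, n_background)
         for k, size in (pathways[n] for n in names)]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p={p:.3g}, FDR={q:.3g}")
```

Exact integer arithmetic via `math.comb` avoids external dependencies; with real annotation tables a statistics library would be the more usual choice.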
Integrative gene regulation in source-sink pathways in sugarcane hybrids
During sugarcane growth and sucrose accumulation, sucrose synthesized in the leaf (source) is transferred to the internode (sink) through the symplast and apoplast pathways, and complicated biological processes are involved [33]. Here, we mapped the identified DA and DE genes to sucrose-related metabolic pathways (Fig. 7, Table S11, Table S12), including carbon fixation in photosynthetic organisms and starch and sucrose metabolism. For clarity, genes prefixed with “Soff” derive from S. officinarum and genes prefixed with “Sspon” derive from S. spontaneum, while genes prefixed with “XLOC” were not annotated in the combined genome.
Fig. 7.
Sucrose accumulation in source-sink of sugarcane. Values show the log2-transformed TPM expression of DE genes and DA genes. Metabolic directions are indicated with black arrows. In the pathways, enzymes marked in color contain DE genes, and enzymes marked with “*” and in color contain DA genes. In the heat map, genes marked with arrows are DE genes, and genes marked with “*” are DA genes.
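As a reminder of what the heat-map values represent, the sketch below computes TPM (transcripts per million) from read counts and transcript lengths using the standard definition, then applies a log2 transform. The pseudocount of 1 is an assumption, since the caption does not say how zero values were handled, and the function names and input numbers are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return [r / total * 1e6 for r in rates]


def log2_tpm(values, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount keeps zeros finite (assumed)."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical read counts and transcript lengths for three alleles.
counts = [100, 400, 0]
lengths_bp = [1000, 2000, 1500]

values = tpm(counts, lengths_bp)
print(values)
print(log2_tpm(values))
```

Because TPM values within a sample always sum to one million, they are comparable across alleles of different lengths within that sample.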
In leaf tissue (source), the carbon fixation in the photosynthetic pathway involves the C4-dicarboxylic acid cycle, including three subtype cycles (NADP-ME, NAD-ME, and PEPCK subtypes). PEPC enzyme is involved in the three decarboxylation pathways simultaneously and plays a role in fixing the carbon source to oxaloacetate (OAA) [34], [35]. We identified two DA alleles (Soff.10G0005450-2C and Soff.10G0005450-5F) and one DE allele (XLOC048421, down-regulated in high sugar sugarcane hybrids) coding PEPC. Then we further analyzed alleles in the three subtypes separately. The NADP-ME subtype is sugarcane's most important decarboxylation pathway [36], which occurs mainly in the chloroplast. From the NADP-ME subtype, we identified five DA alleles coding PPDK (XLOC013286, Soff.09G0003270-3D, Soff.09G0003270-5G, Sspon.07G0008260-1A, and Sspon.07G0008260-2B), which converts pyruvate into phosphoenolpyruvate (PEP). The PEPCK subtype occurs mainly in the cytosol, where AspAT converts aspartate (Asp) to OAA. We identified one DA allele (Soff.04G0006150-3D) coding AspAT, one DA allele (XLOC017087) coding PEPCK, and one DE allele (Soff.04G0000170-6F, up-regulated in high sugar sugarcane hybrids) coding AspAT. The NAD-ME subtype mainly occurs in the mitochondria. One DA allele (XLOC000545) and two DE alleles (Soff.09G0005730-5F down-regulated, Sspon.06G0021590-2C up-regulated in high sugar sugarcane hybrids) coding NAD-MDH were identified. Although sugarcane is predominantly a NADP-ME carbon fixation type, we noted that the differential allelic expression pattern involving PEPCK and NAD-ME subcycles suggested differences between these two subcycles in high-sugar and low-sugar sugarcane hybrids, and it was likely that DA and DE alleles of S. spontaneum played a role in decarboxylation pathways. 
Additionally, in leaf tissue (source), alleles coding sucrose synthase (SuSy), trehalose-6-phosphate synthase (TPS), and sucrose invertase (INV) in the sucrose metabolism pathway were identified as differentially expressed. Specifically, one DE allele (XLOC045207) coding SuSy was up-regulated in the high-sugar hybrids, while two alleles (Soff.03G0013730-5F and Sspon.02G0015500-4D) coding TPS were down-regulated. Opposite expression patterns were identified for the two INV alleles: Soff.04G0008300-4D was up-regulated and Sspon.03G0025770-2C was down-regulated in the high-sugar hybrids.
In stem tissue (sink), the starch and sucrose metabolism pathway was enriched in the KEGG analysis. One allele (Soff.03G0017710-5G) coding SPS and two alleles (XLOC008117 and XLOC010274) coding SuSy showed DA patterns. One allele (XLOC045207) coding SuSy and one allele (Soff.04G0008300-5E) coding INV were significantly up-regulated in the high-sugar hybrids, while one allele (Soff.09G0002770-6F) coding SPS, one allele (Soff.04G0000540-7G) coding INV, and three alleles (Soff.03G0013730-5F, Sspon.02G0015500-4D, and XLOC048176) coding TPS were significantly down-regulated. Our results were consistent with the related report that genes coding SuSy were up-regulated while genes coding TPS were down-regulated in immature internodes of low sucrose sugarcane [37].
Discussion
As a global sugar and energy crop, sugarcane has long attracted the attention of researchers interested in the molecular mechanisms behind important traits such as high sugar content, high yield, and stress resistance. Modern sugarcane, with its complex allopolyploid genome, has so far posed great difficulties for genome assembly [5]. Given the incomplete genome, de novo assembled transcripts have been considered the best available representation of samples in transcriptome analysis; however, unigenes clustered from de novo assembled transcripts usually lack allelic information, and even the accuracy of the de novo assembled transcripts themselves has been questioned [38]. So far, few studies in sugarcane have investigated molecular mechanisms at the allelic level, and this study set out to address these issues. First, we constructed a full-length sugarcane transcriptome at the allelic level. This transcriptome performed well from both the “heterologous” and “polyploid” perspectives and can serve as a good reference for studies of modern sugarcane. The full-length transcripts also included a large number of novel loci that were not present in the original genome annotation, which further complements the omics resources for sugarcane hybrids. Second, we identified different expression patterns of alleles in multiple gene models and screened for DA and DE alleles associated with sucrose content. Although the exact effect on sugar content of the DA genes identified in the sucrose source-sink process remains to be verified, they extend our observations to the post-transcriptional regulation of these known genes. The DE genes identified in the sucrose source-sink process revealed potential causes of high sucrose accumulation. The full-length sugarcane transcripts and related information are summarized on a website named “Updated Transcriptome of Sugarcane Hybrids Database”. We expect that the transcriptomic information in this study will be an important resource for future omics research on modern sugarcane.
The UTSH dataset, a high-quality transcriptomic reference for sugarcane
Transcriptomic studies based on next-generation sequencing encounter several difficulties in sugarcane. First, no high-quality reference genome is available for sugarcane hybrids because of their complicated genetic background and allopolyploid nature, which greatly restricts sugarcane omics research. Currently available sugarcane reference sequences are limited to the genomes of the hybrid R570 [5], S. spontaneum AP85-441 [4], and the hybrid SP80-3280 [39]. However, the genomes of R570 and SP80-3280 have not been assembled to a haplotype-resolved chromosomal level, and only a low proportion of the S. spontaneum genome was inherited by sugarcane hybrids, so none of these genomes is well suited as a reference for modern sugarcane hybrids. Second, genome-guided or de novo assembly from short reads usually yields incomplete and false transcripts [40]. Therefore, it is necessary to construct a high-quality, allele-resolved transcript reference dataset for sugarcane hybrids.
Full-length transcriptome sequencing has already been applied in studies of sugarcane hybrids. It has been used to study tiller development and the regulation of internode elongation by mepiquat chloride, yielding high-quality non-redundant transcript isoforms [41], [42]. It has also been compared with RNA-seq, demonstrating the advantages of full-length sequencing in recovering full-length sequences [43]. However, these studies did not provide accurate allelic information and did not mine subgenomic information in sugarcane hybrids. In the current project, to characterize the transcriptome landscape of sugarcane hybrids, we used PacBio ISOseq technology to obtain full-length transcript sequences. PacBio ISOseq overcomes the assembly difficulties mentioned above by directly sequencing transcripts along their whole length, and it has proven powerful in sugarcane transcriptomic studies [43]. To assign transcript isoforms to their corresponding alleles, we combined the genomes of the two progenitor species, S. officinarum LA-purple and S. spontaneum AP85-441, into an integrated genome. Further analyses revealed that 84.7% of transcripts in sugarcane hybrids were derived from the S. officinarum genome and 15.3% from the S. spontaneum genome, which is close to the genome proportions estimated for R570 by cytogenetic research [3].
The UTSH dataset provides a comprehensive sugarcane transcriptome reference that recovers transcripts at the allelic level. In the UTSH dataset, the allele-defined Group A sequences added 46,416 new gene loci in intergenic regions of the genome annotation, indicating that the original genome annotation was incomplete and that the UTSH dataset is more suitable as a transcriptome reference for sugarcane hybrids. The advantage of the UTSH dataset is even more pronounced when transcript isoforms are considered. Approximately 88% of the identified transcript isoforms were novel compared to genome-annotated transcripts, benefiting from the high-fidelity sequences obtained from full-length circular reads using PacBio ISOseq technology. In addition to serving as a high-quality reference at the allelic level, the UTSH dataset showed a very high alignment rate when evaluated with additional RNA-seq data (ZZ1 and SP80-3280). Specifically, for both ZZ1 and SP80-3280, the alignment rate against the UTSH dataset was similar to that against the S. officinarum or S. spontaneum genome and approximately 20% higher than that against the CDS sequences of either genome. These transcripts, defined by high-resolution alleles, provide valuable ready-to-use resources for gene discovery and molecular breeding research in modern sugarcane.
Alternative splicing landscapes in sugarcane hybrids
Alternative splicing is an important form of post-transcriptional regulation in which multiple transcript isoforms are generated from a single gene to fulfill diverse biological functions. Exploring AS landscapes in sugarcane hybrids helps to uncover gene functions at the transcript level. From 263,378 non-redundant sugarcane transcript isoforms, we identified 139,405 AS events, 52.5% of which were of the RI type. Several studies have shown that RI is the most common splicing form in plants such as A. thaliana and Zea mays [44], [45]. Our result indicates that sugarcane shares this feature.
DA genes in leaf tissue were mainly associated with carbon and nitrogen metabolism, and some of them were distributed across the three decarboxylation subcycles (NADP-ME, NAD-ME, and PEPCK). This indicates that post-transcriptional regulation of decarboxylation differs between high-sugar and low-sugar sugarcane hybrids, with each type of hybrid tending to express specific transcript isoforms. Because carbon fixation is upstream of sucrose accumulation, the three decarboxylation subcycles may influence the synthesis of sucrose at the source. DA genes in stem tissue were mainly related to the spliceosome pathway, and differences in the post-transcriptional regulation of these genes may affect the mRNA splicing process in which the spliceosome participates.
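The idea that each hybrid tends to express specific transcript isoforms can be made concrete with the percent-spliced-in (PSI) measure, which tools such as SUPPA2 [22] derive from transcript abundances. The Python sketch below is an illustration under stated assumptions. This section does not give the formula or thresholds used for DA calling, and the function names and TPM values are hypothetical.

```python
def psi(inclusion_tpms, event_tpms):
    """Percent spliced in: abundance of isoforms carrying one form of the
    event divided by the abundance of all isoforms that define the event.

    Returns None when the event is not expressed at all.
    """
    total = sum(event_tpms)
    if total == 0:
        return None
    return sum(inclusion_tpms) / total


def delta_psi(psi_high, psi_low):
    """Difference in PSI between the high-sugar and low-sugar samples."""
    return psi_high - psi_low


# Hypothetical TPMs for one retained-intron event with two isoforms:
# the first retains the intron, the second splices it out.
high_sugar = {"retained": 30.0, "spliced": 10.0}
low_sugar = {"retained": 10.0, "spliced": 30.0}

psi_high = psi([high_sugar["retained"]], list(high_sugar.values()))
psi_low = psi([low_sugar["retained"]], list(low_sugar.values()))
print(psi_high, psi_low, delta_psi(psi_high, psi_low))
```

In this toy case the total expression of the gene is identical in both samples, so a DE test would find nothing, yet the isoform usage is reversed. This is the kind of difference that DA analysis is designed to capture.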
Source-sink regulation analysis of sucrose accumulation at the allelic level
Sugar accumulation in sugarcane involves multiple layers of complex regulation. For example, Grof et al. classified the potential rate-limiting steps of sucrose accumulation as (1) leaf reactions (photosynthesis rate; sucrose biosynthesis; carbon partitioning); (2) rates of phloem sucrose loading in leaves and transport to maturing stalks; (3) unloading rate of sucrose into storage parenchyma vacuoles [46]. In this study, the factors affecting source-sink accumulation of sucrose were examined through the genes that differed between high- and low-sugar hybrids, and the identified differential genes partially overlapped with the rate-limiting steps described above. Since DA genes tend to produce transcripts with different structures whose functions remain to be verified experimentally, we retained the DA genes as a data resource and focused on the DE genes related to sucrose. The allele-level analysis allowed us to observe that alleles in some gene models exhibited different expression patterns: in most cases, the DE alleles up-regulated in high-sugar sugarcane within a gene model were from the S. officinarum subgenome, while in some gene models they were from the S. spontaneum subgenome. Specifically, 26 genes had oppositely regulated alleles in leaf tissue, and 41 genes had oppositely regulated alleles in stem tissue. In some of these genes, alleles from S. spontaneum were up-regulated (high-sugar vs. low-sugar) while alleles from S. officinarum were down-regulated, suggesting that alleles from S. spontaneum are likely to play an indispensable role despite the relatively small portion of the S. spontaneum genome inherited in modern hybrids. A similar phenomenon of allele-specific expression has been reported in modern sugarcane hybrids. Margarido et al. found that allele-specific expression may occur in certain sugarcane genotypes: genes involved in the biosynthesis of lignin showed significant allele-specific expression in the fiber-rich SRA5 genotype but not in the sugar-rich KQ228, suggesting that the specifically expressed alleles may be related to cellulose or sugar content [47].
Sucrose accumulation requires a large amount of carbon, and photosynthesis is the first step in accumulating carbon compounds in plants. C4 plants can be divided into three subtypes according to the decarboxylase used in bundle sheath cells, namely NADP-ME (chloroplast activity), NAD-ME (mitochondrial activity), and PEPCK (cytoplasmic activity). Sugarcane is classified as the NADP-ME subtype, and recent studies have reported that light limitation leads to increased decarboxylation by PEPCK, an adjustment that maintains C4 photosynthetic efficiency [48]. Regulation among the three subtypes may lead to differences in carbon fixation capacity among sugarcane hybrids, affecting the supply of carbon. Sucrose content in sugarcane is closely related to the coordination of the “source-sink” relationship and is regulated by the activity of several enzymes. For example, sucrose phosphate synthase (SPS) is a unidirectional enzyme acting from UDP-glucose toward sucrose and a key regulator of the distribution of photosynthetic products between sucrose and starch [49], [50]. Sucrose synthase (SuSy) is a reversible enzyme that catalyzes both sucrose synthesis and sucrose catabolism and is expressed at higher levels in immature sinks than in mature sinks [51]. Sucrose invertase (INV) catalyzes the irreversible hydrolysis of sucrose to glucose and fructose and is considered a key enzyme in the regulation of sucrose metabolism [52]. Trehalose-6-phosphate (T6P) is synthesized from UDP-glucose (UDPG) and glucose-6-phosphate (Glc6P) by trehalose-6-phosphate synthase (TPS), and T6P levels are strongly correlated with sucrose, acting in source-sink communication [53].
With the help of the allelic relationships of the differential genes, we explored source-sink regulation of sucrose accumulation. In the leaf, we identified DA genes coding PEPCK, AspAT, PPDK, NAD-MDH, and PEPC, and DE genes coding PEPC, NAD-MDH, and AspAT. PEPCK and AspAT are the key enzymes in the PEPCK subcycle. Our analysis showed that the gene coding AspAT in the PEPCK subcycle was up-regulated in the high-sugar sugarcane hybrids relative to low-sugar sugarcane hybrids. Although sugarcane belongs to the NADP-ME subtype and primarily utilizes the NADP-ME subcycle while utilizing the NAD-ME and PEPCK subcycles to a lesser extent, we speculated that the PEPCK subcycle also plays a role in the high-sugar accumulation in sugarcane hybrids.
The sucrose metabolic pathway directly affects sucrose content. SuSy breaks down sucrose to UDP-glucose, and the expression of SuSy is negatively correlated with maturation level [54]. In our study, SuSy coding genes were up-regulated in the immature stem and the corresponding leaf of high-sugar sugarcane hybrids; high expression of SuSy increases carbon distribution and may enhance the construction of the sink. SPS plays a substantial role in the direction of sucrose synthesis, and its expression level is positively correlated with sucrose accumulation in sugarcane internodes [55]. However, SPS may not be the most critical factor directly promoting sucrose accumulation, because over-expression of SPS alone in transgenic sugarcane plants did not improve sucrose yields [56]. SPS coding genes of high-sugar sugarcane hybrids were slightly down-regulated in immature internodes compared to low-sugar hybrids, indicating that SPS may not be the main factor for sucrose synthesis and accumulation in immature tissues.
TPS is considered to be closely related to sugar partitioning in sugarcane leaves and is a potential target for sugar signaling mechanisms [57]. Trehalose-6-phosphate (T6P) is the product of TPS, and previous studies have shown that elevated levels of sucrose appear to compensate for the constant decrease of T6P to achieve a balance of endogenous regulatory mechanisms [53]. Down-regulation of TPS coding genes in both sink and source may be an important signal of high sucrose accumulation, which is consistent with the above research.
Among the genes encoding the above-mentioned enzymes closely related to sucrose accumulation, several were derived from S. spontaneum: Sspon.06G0021590-2C (up-regulated in high-sugar sugarcane) encoding the key enzyme NAD-MDH, Sspon.03G0025770-2C (down-regulated in high-sugar sugarcane) encoding the key enzyme INV, Sspon.02G0015500-4D (down-regulated in high-sugar sugarcane) encoding the key enzyme TPS, and the DA genes Sspon.07G0008260-1A and Sspon.07G0008260-2B encoding PPDK. All of these are well-characterized genes functioning in sugar accumulation and were inherited from the S. spontaneum subgenome, suggesting that the S. spontaneum subgenome has a non-negligible role in both the upstream carbon fixation pathway and the downstream sucrose-related pathway.
Conclusion
Here, we generated PacBio ISOseq and RNA-seq data from leaf and stem tissues of high and low sugar content sugarcane hybrids to reconstruct and characterize the transcriptome. A high-quality transcriptomic dataset at the allelic level was constructed, which will contribute to future transcriptomic studies of sugarcane cultivars. We identified 575 genes in leaf and 520 genes in stem containing DA events, and 879 genes in leaf and 1,253 genes in stem that were differentially expressed. Some of these DA and DE genes were mapped to classic source-sink pathways such as “carbon fixation in the photosynthetic pathway” and “starch and sucrose metabolism pathway”. Interestingly, some alleles encoding sugar accumulation-related enzymes derived from the S. spontaneum subgenome were up-regulated or had DA events in high-sugar hybrids, implying that alleles from S. spontaneum are likely indispensable for high sugar accumulation. Many alleles of the same genes showed different expression patterns, indicating coordination between subgenomes or the evolution of these alleles toward new functions in polyploid sugarcane. In summary, we provide a representative transcriptome sequence dataset of sugarcane hybrids at the allelic level and an investigation of the sucrose accumulation mechanism, which together offer new insights into the contribution of S. spontaneum and the complex allelic regulation of expression in sucrose accumulation, as well as candidate genes for genomics-assisted breeding towards sugar enhancement in sugarcane.
Compliance with ethics requirements
This article does not contain any studies with human or animal subjects.
CRediT authorship contribution statement
Jihan Zhao: Conceptualization, Validation, Data curation, Writing – original draft. Sicheng Li: Validation, Data curation, Writing – original draft. Yuzhi Xu: Data curation. Nazir Ahmad: Writing – original draft. Bowen Kuang: Validation. Mengfan Feng: Validation. Ni Wei: Validation. Xiping Yang: Conceptualization, Writing – original draft, Writing – review & editing, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We acknowledge funding support for this research from the National Key R&D Program of China (2022YFD2301100), the Guangxi Natural Science Foundation (GK AD20297064), the National Natural Science Foundation of China (31901591), and the ‘One Hundred Person’ Project of Guangxi Province.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jare.2023.02.001.
References
- 1.Calsa T., Figueira A. Serial analysis of gene expression in sugarcane (Saccharum spp.) leaves revealed alternative C4 metabolism and putative antisense transcripts. Plant Mol Biol. 2007;63(6):745–762. doi: 10.1007/s11103-006-9121-z. [DOI] [PubMed] [Google Scholar]
- 2.Evans D.L., Joshi S.V. Complete chloroplast genomes of Saccharum spontaneum, Saccharum officinarum and Miscanthus floridulus (Panicoideae: Andropogoneae) reveal the plastid view on sugarcane origins. Syst Biodivers. 2016;14(6):548–571. [Google Scholar]
- 3.D’Hont A., Grivet L., Feldmann P., Glaszmann J., Rao S., Berding N. Characterisation of the double genome structure of modern sugarcane cultivars (Saccharum spp.) by molecular cytogenetics. Mol Gen Genet. 1996;250(4):405–413. doi: 10.1007/BF02174028. [DOI] [PubMed] [Google Scholar]
- 4.Zhang J., Zhang X., Tang H., Zhang Q., Hua X., Ma X., et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat Genet. 2018;50(11):1565–1573. doi: 10.1038/s41588-018-0237-2. [DOI] [PubMed] [Google Scholar]
- 5.Garsmeur O., Droc G., Antonise R., Grimwood J., Potier B., Aitken K., et al. A mosaic monoploid reference sequence for the highly complex genome of sugarcane. Nat Commun. 2018;9(1):1–10. doi: 10.1038/s41467-018-05051-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Riaño-Pachón D.M., Mattiello L. Draft genome sequencing of the sugarcane hybrid SP80-3280. F1000 Res. 2017;6 doi: 10.12688/f1000research.11859.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Grivet L., D’Hont A., Roques D., Feldmann P., Lanaud C., Glaszmann J.C. RFLP mapping in cultivated sugarcane (Saccharum spp.): genome organization in a highly polyploid and aneuploid interspecific hybrid. Genetics. 1996;142(3):987–1000. doi: 10.1093/genetics/142.3.987. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wang M., Li A.M., Liao F., Qin C.X., Chen Z.L., Zhou L., et al. Control of sucrose accumulation in sugarcane (Saccharum spp. hybrids) involves miRNA-mediated regulation of genes and transcription factors associated with sugar metabolism. GCB Bioenergy. 2022;14(2):173–191. [Google Scholar]
- 9.Rhoads A., Au K.F. PacBio sequencing and its applications. Genom Proteomics Bioinform. 2015;13(5):278–289. doi: 10.1016/j.gpb.2015.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Diao J., Yu X., Wang X., Fan Y., Wang S., Li L., et al. Full-length transcriptome sequencing combined with RNA-seq analysis revealed the immune response of fat greenling (Hexagrammos otakii) to Vibrio harveyi in early infection. Microb Pathog. 2020;149 doi: 10.1016/j.micpath.2020.104527. [DOI] [PubMed] [Google Scholar]
- 11.Kim E., Magen A., Ast G. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 2007;35(1):125–131. doi: 10.1093/nar/gkl924. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Bedre R., Irigoyen S., Schaker P.D., Monteiro-Vitorello C.B., Da Silva J.A., Mandadi K.K. Genome-wide alternative splicing landscapes modulated by biotrophic sugarcane smut pathogen. Sci Rep. 2019;9(1):1–12. doi: 10.1038/s41598-019-45184-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Guo J., Ling H., Ma J., Chen Y., Su Y., Lin Q., et al. A sugarcane R2R3-MYB transcription factor gene is alternatively spliced during drought stress. Sci Rep. 2017;7(1):1–11. doi: 10.1038/srep41922. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Li Y., Mi X., Zhao S., Zhu J., Guo R., Xia X., et al. Comprehensive profiling of alternative splicing landscape during cold acclimation in tea plant. BMC Genomics. 2020;21(1):1–16. doi: 10.1186/s12864-020-6491-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Tardaguila M., De La Fuente L., Marti C., Pereira C., Pardo-Palacios F.J., Del Risco H., et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018;28(3):396–411. doi: 10.1101/gr.222976.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.He W., Zhang X., Lv P., Wang W., Wang J., He Y., et al. Full-length Transcriptome Reconstruction Reveals Genetic Differences in Hybrids of Oryza Sativa and Oryza Punctata With Different Ploidy and Genome Compositions. BMC Plant Biol. 2022;22(1):131. doi: 10.1186/s12870-022-03502-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Menzel P., Ng K.L., Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7(1):1–9. doi: 10.1038/ncomms11257. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]
- 20.Kim D., Langmead B., Salzberg S.L. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–360. doi: 10.1038/nmeth.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Priyam A., Woodcroft B.J., Rai V., Moghul I., Munagala A., Ter F., et al. Sequenceserver: a modern graphical user interface for custom BLAST databases. Mol Biol Evol. 2019;36(12):2922–2924. doi: 10.1093/molbev/msz185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Trincado J.L., Entizne J.C., Hysenaj G., Singh B., Skalic M., Elliott D.J., et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018;19(1):1–11. doi: 10.1186/s13059-018-1417-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Moriya Y., Itoh M., Okuda S., Yoshizawa A.C., Kanehisa M.K.A.A.S. an automatic genome annotation and pathway reconstruction server. Nucl Acids Res. 2007;35(suppl_2):W182–W185. doi: 10.1093/nar/gkm321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Simões M.S., Ferreira S.S., Grandis A., Rencoret J., Persson S., Floh E.I.S., et al. Differentiation of tracheary elements in sugarcane suspension cells involves changes in secondary wall deposition and extensive transcriptional reprogramming. Front Plant Sci. 2020;11 doi: 10.3389/fpls.2020.617020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Barbazuk W.B., Fu Y., McGinnis K.M. Genome-wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res. 2008;18(9):1381–1392. doi: 10.1101/gr.053678.106. [DOI] [PubMed] [Google Scholar]
- 26.Drummond I.A., Rohwer-Nutter P., Sukhatme V.P. The zebrafish egr1 gene encodes a highly conserved, zinc-finger transcriptional regulator. DNA Cell Biol. 1994;13(10):1047–1055. doi: 10.1089/dna.1994.13.1047. [DOI] [PubMed] [Google Scholar]
- 27.Wyman D., Mortazavi A. TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts. Bioinformatics. 2019;35(2):340. doi: 10.1093/bioinformatics/bty483. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Filichkin S.A., Priest H.D., Givan S.A., Shen R., Bryant D.W., Fox S.E., et al. Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 2010;20(1):45–58. doi: 10.1101/gr.093302.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Abdel-Ghany S.E., Hamilton M., Jacobi J.L., Ngam P., Devitt N., Schilkey F., et al. A survey of the sorghum transcriptome using single-molecule long reads. Nat Commun. 2016;7(1):1–11. doi: 10.1038/ncomms11706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Cuadrado A., Acevedo R., Moreno Díaz de la Espina S., Jouve N., De La Torre C. Genome remodelling in three modern S. officinarum×S. spontaneum sugarcane cultivars. J Exp Botany. 2004;55(398):847–854. doi: 10.1093/jxb/erh093. [DOI] [PubMed] [Google Scholar]
- 31.D’Hont A. Unraveling the genome structure of polyploids using FISH and GISH; examples of sugarcane and banana. Cytogenet Genome Res. 2005;109(1–3):27–33. doi: 10.1159/000082378. [DOI] [PubMed] [Google Scholar]
- 32.Duque P. A role for SR proteins in plant stress responses. Plant Signal Behav. 2011;6(1):49–54. doi: 10.4161/psb.6.1.14063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Wang J., Nayak S., Koch K., Ming R. Carbon partitioning in sugarcane (Saccharum species) Front Plant Sci. 2013;4:201. doi: 10.3389/fpls.2013.00201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Wang Y., Bräutigam A., Weber A.P., Zhu X.-G. Three distinct biochemical subtypes of C4 photosynthesis? A modelling analysis. J Exp Bot. 2014;65(13):3567–3578. doi: 10.1093/jxb/eru058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Hatch M., Kagawa T., Craig S. Subdivision of C4-pathway species based on differing C4 acid decarboxylating systems and ultrastructural features. Funct Plant Biol. 1975;2(2):111–128. [Google Scholar]
- 36.Tsuchida H., Tamai T., Fukayama H., Agarie S., Nomura M., Onodera H., et al. High level expression of C4-specific NADP-malic enzyme in leaves and impairment of photoautotrophic growth in a C3 plant, rice. Plant Cell Physiol. 2001;42(2):138–145. doi: 10.1093/pcp/pce013. [DOI] [PubMed] [Google Scholar]
- 37.Wang M., Li A.M., Liao F., Qin C.X., Chen Z.L., Zhou L., et al. Control of sucrose accumulation in sugarcane (Saccharum spp. hybrids) involves miRNA-mediated regulation of genes and transcription factors associated with sugar metabolism. GCB Bioenergy. 2022.
- 38.Hoang N.V., Furtado A., Thirugnanasambandam P.P., Botha F.C., Henry R.J. De novo assembly and characterizing of the culm-derived meta-transcriptome from the polyploid sugarcane genome based on coding transcripts. Heliyon. 2018;4(3):e00583. doi: 10.1016/j.heliyon.2018.e00583.
- 39.Souza G.M., Van Sluys M.-A., Lembke C.G., Lee H., Margarido G.R.A., Hotta C.T., et al. Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world’s leading biomass crop. GigaScience. 2019;8(12):giz129. doi: 10.1093/gigascience/giz129.
- 40.Haas B.J., Dobin A., Li B., Stransky N., Pochet N., Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 2019;20(1):1–16. doi: 10.1186/s13059-019-1842-9.
- 41.Yan H., Zhou H., Luo H., Fan Y., Zhou Z., Chen R., et al. Characterization of full-length transcriptome in Saccharum officinarum and molecular insights into tiller development. BMC Plant Biol. 2021;21(1):1–12. doi: 10.1186/s12870-021-02989-5.
- 42.Chen R., Fan Y., Zhou H., Mo S., Zhou Z., Yan H., et al. Global transcriptome changes of elongating internode of sugarcane in response to mepiquat chloride. BMC Genomics. 2021;22(1):1–15. doi: 10.1186/s12864-020-07352-w.
- 43.Hoang N.V., Furtado A., Mason P.J., Marquardt A., Kasirajan L., Thirugnanasambandam P.P., et al. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics. 2017;18(1):1–22. doi: 10.1186/s12864-017-3757-8.
- 44.Marquez Y., Brown J.W., Simpson C., Barta A., Kalyna M. Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 2012;22(6):1184–1195. doi: 10.1101/gr.134106.111.
- 45.Wang G., Zhong M., Wang J., Zhang J., Tang Y., Wang G., et al. Genome-wide identification, splicing, and expression analysis of the myosin gene family in maize (Zea mays). J Exp Bot. 2014;65(4):923–938. doi: 10.1093/jxb/ert437.
- 46.Grof C.P., Campbell J.A. Sugarcane sucrose metabolism: scope for molecular manipulation. Funct Plant Biol. 2001;28(1):1–12.
- 47.Margarido G.R.A., Correr F.H., Furtado A., Botha F.C., Henry R.J. Limited allele-specific gene expression in highly polyploid sugarcane. Genome Res. 2022;32(2):297–308. doi: 10.1101/gr.275904.121.
- 48.Sales C.R., Ribeiro R.V., Hayashi A.H., Marchiori P.E., Silva K.I., Martins M.O., et al. Flexibility of C4 decarboxylation and photosynthetic plasticity in sugarcane plants under shading. Environ Exp Bot. 2018;149:34–42.
- 49.Sawitri W.D., Afidah S.N., Nakagawa A., Hase T., Sugiharto B. Identification of UDP-glucose binding site in glycosyltransferase domain of sucrose phosphate synthase from sugarcane (Saccharum officinarum) by structure-based site-directed mutagenesis. Biophys Rev. 2018;10(2):293–298. doi: 10.1007/s12551-017-0360-9.
- 50.Worrell A.C., Bruneau J.-M., Summerfelt K., Boersig M., Voelker T.A. Expression of a maize sucrose phosphate synthase in tomato alters leaf carbohydrate partitioning. Plant Cell. 1991;3(10):1121–1130. doi: 10.1105/tpc.3.10.1121.
- 51.Schäfer W.E., Rohwer J.M., Botha F.C. Protein-level expression and localization of sucrose synthase in the sugarcane culm. Physiol Plant. 2004;121(2):187–195. doi: 10.1111/j.0031-9317.2004.00316.x.
- 52.Bocock P.N., Morse A.M., Dervinis C., Davis J.M. Evolution and diversity of invertase genes in Populus trichocarpa. Planta. 2008;227(3):565–576. doi: 10.1007/s00425-007-0639-3.
- 53.Yadav U.P., Ivakov A., Feil R., Duan G.Y., Walther D., Giavalisco P., et al. The sucrose–trehalose 6-phosphate (Tre6P) nexus: specificity and mechanisms of sucrose signalling by Tre6P. J Exp Bot. 2014;65(4):1051–1068. doi: 10.1093/jxb/ert457.
- 54.Koch K. Sucrose metabolism: regulatory mechanisms and pivotal roles in sugar sensing and plant development. Curr Opin Plant Biol. 2004;7(3):235–246. doi: 10.1016/j.pbi.2004.03.014.
- 55.Botha F.C., Black K.G. Sucrose phosphate synthase and sucrose synthase activity during maturation of internodal tissue in sugarcane. Funct Plant Biol. 2000;27(1):81–85.
- 56.Vickers J., Grof C., Bonnett G., Jackson P., Morgan T. Effects of tissue culture, biolistic transformation, and introduction of PPO and SPS gene constructs on performance of sugarcane clones in the field. Aust J Agr Res. 2005;56(1):57–68.
- 57.McCormick A., Cramer M., Watt D. Differential expression of genes in the leaves of sugarcane in response to sugar accumulation. Trop Plant Biol. 2008;1(2):142–158.