Journal of Advanced Research. 2023 May 18;58:13–30. doi: 10.1016/j.jare.2023.05.004

Transcriptional and translational landscape fine-tune genome annotation and explores translation control in cotton

Ghulam Qanmber a,b,1, Qi You c,1, Zhaoen Yang a,b,1, Liqiang Fan b, Zhibin Zhang b, Mao Chai b, Baibai Gao a, Fuguang Li a,b, Zuoren Yang a,b
PMCID: PMC10982868  PMID: 37207930


Keywords: Translatome, Transcriptome, ORFs, LncRNA, Translational regulation, Cotton fiber

Highlights

  • The transcriptome and translatome of 8 tissues of cotton generated large datasets.

  • De novo transcriptome assembly and ribosome profiling fine-tune cotton genome annotation.

  • Unannotated translational events, including sORFs (uORFs and dORFs) and lncRNAs, were identified.

  • Omics integrated analysis of normal fiber ZM24 and short fiber pag1 explored fiber-specific genes.

  • GhKCS6 overexpression showed greater fiber length than the control and validated our approach.

Abstract

Introduction

The unavailability of intergenic region annotation in whole genome sequencing and pan-genomics hinders efforts to enhance crop improvement.

Objectives

Despite advances in research, the impact of post-transcriptional regulation on fiber development and translatome profiling at different stages of fiber growth in cotton (G. hirsutum) remains unexplored.

Methods

We utilized a combination of reference-guided de novo transcriptome assembly and ribosome profiling techniques to uncover the hidden mechanisms of translational control in eight distinct tissues of upland cotton.

Results

Our study identified P-site distribution at three-nucleotide periodicity and dominant ribosome footprint at 27 nucleotides. Specifically, we have detected 1,589 small open reading frames (sORFs), including 1,376 upstream ORFs (uORFs) and 213 downstream ORFs (dORFs), as well as 552 long non-coding RNAs (lncRNAs) with potential coding functions, which fine-tune the annotation of the cotton genome. Further, we have identified novel genes and lncRNAs with strong translation efficiency (TE), while sORFs were found to affect mRNA transcription levels during fiber elongation. The reliability of these findings was confirmed by the high consistency in correlation and synergetic fold change between RNA-sequencing (RNA-seq) and Ribosome-sequencing (Ribo-seq) analyses. Additionally, integrated omics analysis of the normal fiber ZM24 and short fiber pag1 cotton mutant revealed several differentially expressed genes (DEGs), and fiber-specific expressed (high/low) genes associated with sORFs (uORFs and dORFs). These findings were further supported by the overexpression and knockdown of GhKCS6, a gene associated with sORFs in cotton, and demonstrated the potential regulation of the mechanism governing fiber elongation on both the transcriptional and post-transcriptional levels.

Conclusion

Reference-guided transcriptome assembly and the identification of novel transcripts fine-tuned the annotation of the cotton genome and predicted the landscape of fiber development. Our approach provides a high-throughput, multi-omics-based method for discovering unannotated ORFs, hidden translational control, and complex regulatory mechanisms in crop plants.

Introduction

Proteins play a central role in several biological processes [1] with mRNA translation serving as the ultimate executor of the proteome. Besides gene expression, translational regulation plays a crucial role in determining cell structure, activity, and functions across all living organisms [2], although our understanding of its role in plant cells is limited. The process of genome-transcribed transcriptome-translated proteome involves a series of complex steps, including mRNA splicing [3], localization [4], translation [5], and protein modification [6]. Despite the crucial role of the proteome, research in proteomics lags behind that of nucleic acids. While the transcriptome serves as the link between nucleotide sequences and phenotype, the proteome is linked directly to phenotype. However, proteomic variations are not necessarily linked to transcriptomic variation due to the complex regulatory mechanisms that buffer against such variation [7], [8], [9]. Thus, accurate quantification and high-resolution mapping of thousands of expressed transcripts are necessary. To this end, ribosome profiling, which involves the deep sequencing of mRNA fragments, allows for the quantification of abundance and monitoring of proteomic variations at a constant rate [10], [11].

High-throughput techniques such as ribosome sequencing (Ribo-seq) can be used to analyze the translatome, enabling the exploration of mRNA fragments known as ribosome footprints [10], [12]. The sequencing of these footprints allows for the inference of ribosome movement codon by codon, which provides information on the position and quantity of ribosomes on a specific transcript and allows for the identification of actively translated ORFs [13]. Among these newly identified ORFs, sORFs, including uORFs and dORFs, can be detected [14], [15]. Furthermore, ribosomes translate mRNA at a three-nucleotide periodicity, which facilitates the identification of previously unannotated regions [16]. Numerous regulatory uORFs have been discovered in protein-coding genes across a variety of organisms, including plants, yeast, mice, and humans [16], [17], [18]. Previous research has suggested that uORFs regulate downstream ORF translation [19]. lncRNAs, which are at least 200 base pairs (bp) in length, perform various regulatory functions related to chromatin modification, transcriptional and post-transcriptional regulation, and de novo protein evolution [20], [21], [22]. Therefore, ribosome sequencing represents a highly attractive means of detecting translated ORFs in an unbiased fashion across the entire genome.

Translatome profiling has previously been utilized to understand translational events, plant development, hormone responses, stress responses, chloroplast differentiation, and the biogenesis of small interfering RNAs in plants [11], [23], [24], [25], [26]. Notably, newly identified translation events, such as P-sites (peptidyl), ORFs, and uORFs in annotated non-coding RNAs, have been detected in Arabidopsis [16]. Of the three tRNA (transfer RNA) binding sites (A, P, and E) within ribosomes, the P-site is the second binding site for tRNA, with the A-site (aminoacyl) and E-site (exit) being the first and third binding sites, respectively. During translation, the P-site holds the tRNA that is linked to the growing polypeptide chain, and upon encountering a stop codon, the peptidyl-tRNA bond of the tRNA located in the P-site is cleaved, releasing the newly synthesized protein [27]. Thus far, comprehensive identification of translated ORFs has been performed in Arabidopsis [16], [24], tomato [2], and maize [11], [26].

Cotton is the most important cash crop worldwide and serves as a model organism for investigating polyploidization and cell elongation. Many genomic resources have been developed for cotton, which has greatly facilitated functional genomics research [28], [29], [30]. Despite significant progress resulting from whole genome sequencing and pan-genomics, current annotations of the cotton genome only include putative protein-encoding genes. The intergenic region accounts for a significant portion of the genome and contains uORFs and lncRNAs. This missing information has limited our ability to enhance cotton crop improvement and productivity. However, unlike whole genome sequencing, high-throughput translatome and transcriptome analyses can reveal the hidden translational regulatory mechanisms present in both the genomic and intergenic regions, allowing for the discovery of novel transcripts, including ORFs, sORFs (uORFs and dORFs), and non-coding ORFs (ncORFs/lncRNAs), which can serve to fine-tune genome annotation. Transcriptome and translatome analyses provide unprecedented opportunities to gain insight into the mechanisms underlying plant growth, development, and resistance to stresses. In cotton, the fiber is a natural single cell that can grow up to 60 mm in length in G. barbadense. Nonetheless, the mechanisms underlying fiber development are highly complex. In addition to plant growth and development, numerous phytohormonal pathways and signaling mechanisms have been proposed as key factors in cotton fiber development. Nevertheless, nearly all existing studies have focused on the post-transcriptional contributions to cotton plant growth and fiber development. Increasing studies suggest that the translation processes that decode genetic information into functional proteins are also critical in plant development. However, the translational profile of cotton has remained largely unexplored.

GhPAG1 (PAGODA1 gene) is a homolog of CYP734A1, a gene involved in brassinosteroid biosynthesis, and is responsible for regulating endogenous brassinosteroids. In the pag1 mutant, plant architecture is stunted and fiber length is reduced. Nevertheless, applying exogenous brassinosteroids partially restores plant height and fiber length, making the pag1 mutant an excellent genetic material for investigating cotton fiber growth and development [31], [32], [33]. In cotton, several genes have been identified that regulate fiber elongation and ultimately enhance fiber length. The suppression of the brassinosteroid biosynthesis gene STEROID 5α-REDUCTASE DE-ETIOLATED2 (GhDET2) significantly reduces fiber length, while overexpression of genes related to brassinosteroid biosynthesis or signaling leads to longer fibers [34], [35]. GhBES1.4 regulates the expression of GhKCS10_At (a 3-ketoacyl-CoA synthase 10), which regulates the biosynthesis of endogenous very-long-chain fatty acids (VLCFAs) and promotes fiber elongation [36]. The overexpression of GhBES1.4 promoted fiber elongation in cotton. Conversely, silencing GhBES1.4 resulted in a decrease in fiber length [37]. GhBES1 directly activates the transcription of GhCERP, a protein involved in cell elongation. GhCERP serves as a downstream target of GhBES1 and plays a crucial role in transmitting the brassinosteroid signaling mediated by GhBES1 to its target gene, GhEXPA3-1 [38].

In this study, we have characterized the transcriptome and translatome landscape of the cotton plant. We performed strand-specific RNA-seq and Ribo-seq in parallel, examining root, stem, leaf, and flower tissues from ZM24 cotton, as well as ovule (0 DPA), 5, 10, and 20 DPA (days post anthesis) fiber tissues from both ZM24 and pag1 plants (a short fiber BR deficient mutant in the ZM24 background). By using an integrated approach combining ribosome profiling and de novo transcriptome assembly, we were able to discover newly identified (unannotated) transcripts and map the translational landscape across various tissues (particularly fiber tissues). Our quantification and mapping of ribosome footprints allowed us to explore unannotated transcripts, identify new regulatory elements, improve the annotation of the entire genome, and shed light on the contributions of translational regulation during cotton fiber development.

Materials and methods

Plant materials and lysate preparation for RNA-seq and Ribo-seq analysis

Seeds of ZM24 cotton plants (normal fiber) and pag1 plants (a short fiber cotton mutant) were sown in soil in a glasshouse under standard cultural practices, with a light/dark cycle of 14/10 h and temperatures ranging from 28 to 34 °C and 24–27 °C, respectively. For ZM24 plants, we collected two samples with three biological replicates each for root, stem, leaf, and flower tissues. Additionally, we collected three samples with three biological replicates each for ovule (0 DPA), 5, 10, and 20 DPA (days post-anthesis) fiber tissues from both ZM24 and pag1 cotton plants. Fiber tissues from both ZM24 and pag1 plants were excised from developing flower buds or cotton bolls at the specified days post-anthesis, with ovules collected on the day of anthesis denoted as ovule (0 DPA). All harvested samples were immediately frozen in liquid nitrogen.

The frozen tissues were ground, and approximately 0.4 g of tissue powder from each sample was suspended in 1.2 ml of lysis buffer, which contained 100 mM Tris-HCl [pH 8], 20 mM MgCl2, 40 mM KCl, 2% [v/v] polyoxyethylene [10] tridecyl ether [Sigma, P2393], 1 mM dithiothreitol, 100 mg mL−1 cycloheximide, 1% [w/v] sodium deoxycholate [Sigma, D6750], and 10 units mL−1 DNase I [Epicenter, D9905K], as described previously [2]. The samples were then incubated on ice with moderate shaking and spun at 20,000 g for 10 min at 4 °C. The supernatant was then transferred to new tubes and aliquoted into 100 μl portions. The lysate was flash-frozen in liquid nitrogen and stored at −80 °C.

RNA purification and construction of libraries

We added 100 μl of 10% (w/v) SDS to 10 μl of aliquoted lysate and then extracted RNA with a length of more than 200 nucleotides using the Zymo RNA Clean & Concentrator kit (Zymo Research, R1017). The integrity of the obtained RNA was evaluated using a Bioanalyzer (Agilent). To remove rRNAs (ribosomal RNAs), we utilized the RiboZero Plant Leaf kit (Illumina, MRZPL1224). Starting with 100 ng of rRNA-depleted RNA, we fragmented the RNA to approximately 200 bp based on RNA integrity and created strand-specific libraries for each sample using the NEBNext Ultra Directional RNA Library Prep Kit (New England Biolabs, E7420S). These libraries were pooled at equal molarity and sequenced on one lane of Hi-Seq 4000 using SE-50 sequencing.

Ribosome footprinting and construction of the library

To optimize the ribosome profiling method for cotton, we made modifications to a previously described protocol. RNA concentration was quantified using the Qubit RNA HS assay (Invitrogen, Q32852), and 40 μl of each sample was treated with 100 units of nuclease (TruSeq Mammalian Ribo Profile Kit, Illumina, RPHMR12126) with moderate shaking for one hour at room temperature. The reaction was stopped by placing the lysate on ice, and 15 μl of SUPERase-IN (Invitrogen, AM2696) was added. Ribosomes were isolated using Illustra MicroSpin S-400 HR columns (GE Healthcare, 27514001), and RNA with a length of more than 17 nucleotides and less than 200 nucleotides was purified using Zymo Research, R1017, and Zymo Research, R1015, respectively. rRNAs were depleted using the RiboZero Plant Leaf kit (Illumina, MRZPL1224), and the remaining RNA was separated using 15% (w/v) Tris-borate-EDTA-urea PAGE (Invitrogen, EC68852BOX). Gel slices corresponding to RNA of 28–30 nucleotides were excised, and ribosome footprints were eluted overnight. Libraries were generated using the TruSeq Mammalian Ribo Profile Kit, and nine-cycle PCR amplification was performed for each library sample. The library samples were then pooled in equal amounts and sequenced on two lanes of Hi-Seq 4000 using SE-50 sequencing.

Data analysis

The genome sequences and annotation files for ZM24 cotton were obtained from CottonFGD (https://www.cottonfgd.org/) [39]. The adaptor sequences (AGATCGGAAGAGCACACGTCT) were removed from Ribo-seq data using FASTX_clipper v0.0.14 (https://hannonlab.cshl.edu/fastx_toolkit/) to obtain the clean reads. We removed the repeat sequences, small nucleolar RNA, small nuclear RNA, tRNA, and rRNA sequences from both RNA-seq and Ribo-seq datasets using Bowtie2 v2.3.4.1 [40]. ZM24_genome_assembly_V1.0 annotation was used to mine small nucleolar RNA, small nuclear RNA, tRNA, and rRNA sequences. Finally, the clean datasets were used for genome mapping.
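
The adaptor-clipping step can be illustrated with a minimal sketch. This is not FASTX_clipper itself; it is a simplified stand-in that assumes exact matching only, and the function name `clip_adapter` and the `min_overlap` parameter are hypothetical. Only the adaptor sequence is taken from the text.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCT"  # adaptor sequence quoted in the text


def clip_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Return the read with a 3' adaptor (or a trailing adaptor prefix) removed.

    Simplified illustration: exact matches only, no mismatches or indels.
    """
    # Case 1: the full adaptor occurs in the read; cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with the first k bases of the adaptor (partial adaptor).
    # Try the longest possible overlap first; ignore overlaps shorter than min_overlap.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    # Case 3: no adaptor detected; return the read unchanged.
    return read


if __name__ == "__main__":
    footprint = "ATGGCTTCTAAGGTTCTTGGTGCTGCT"  # hypothetical 27-nt footprint
    print(clip_adapter(footprint + ADAPTER))
```

A real clipper also tolerates sequencing errors in the adaptor, which this sketch deliberately leaves out.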

Detecting ORFs and sequencing data alignment

We first mapped the clean reads of RNA-seq and Ribo-seq datasets to the ZM24_genome_assembly_V1.0 sequences using Hisat2 [41]. For RNA-seq data, the mapped reads were used to calculate read counts and relative expression values using StringTie v2.1.5 [42]. We then combined the gtf files of ZM24_genome_annotation_V1.0 and the de novo predicted lncRNAs for genome annotation.

Next, we used RiboTaper_v1.3 [43] for detecting P-sites and ORFs. We retrieved the annotation files and offset parameters such as the position of inferred P-site from RiboTaper and created RiboTaper annotation files using the create_annotation_files.bash function. We determined the offset parameters by executing the create_metaplots.bash function. Finally, we used the Ribotaper.sh script with RiboTaper annotation files, offset parameters, and bam files of RNA-seq and Ribo-seq datasets to identify P-sites and ORFs. RiboTaper calculated three types of rules, and after evaluation, we selected the “best_periodicity” method for verifying ORFs. To identify the Kozak consensus sequences and translation initiation sites in different types of ORFs, we analyzed nine nucleotide sequence lengths for the Kozak consensus sequence. These sequences were extracted from four nucleotides upstream to four nucleotides downstream of the ORF start sites, which were annotated by RiboTaper_v1.3 [43].
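
The Kozak-window extraction described above can be sketched as follows. The sketch assumes that the nine-nucleotide window is centred on the first nucleotide of the start codon (four nucleotides on each side); the text does not state the exact anchoring, and the function names are hypothetical.

```python
from collections import Counter


def kozak_window(transcript: str, start: int, flank: int = 4):
    """Return the (2 * flank + 1)-nt window centred on `start`.

    `start` is the 0-based index of the first nucleotide of the start codon
    (an assumption of this sketch). Returns None if the window would run off
    either end of the transcript.
    """
    lo, hi = start - flank, start + flank + 1
    if lo < 0 or hi > len(transcript):
        return None
    return transcript[lo:hi]


def position_frequencies(windows):
    """Per-position base frequencies across equally sized windows."""
    n = len(windows)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*windows)
    ]


if __name__ == "__main__":
    tx = "GGCAAAATGGCTTGA"  # hypothetical transcript; ATG starts at index 6
    print(kozak_window(tx, tx.find("ATG")))
```

Stacking the windows of all ORFs of one type and passing them to `position_frequencies` gives the per-position base composition from which a consensus can be read.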

Identification of lncRNAs

The identification of lncRNAs was performed using a previously described six-step method [20]. Initially, adaptors and low-quality bases were eliminated from all RNA datasets, and clean reads were mapped to ZM24_genome_assembly_V1.0 using Hisat2 [41]. Subsequently, the de novo assembly of lncRNAs was conducted with StringTie v2.1.5 [42]. To extract all transcript sequences, we utilized the gffread tool [44]. We then used LGC, CPC, and CPAT to predict non-coding transcripts. The default parameters were used for LGC and CPC, while CPAT employed a list of reference non-coding sequences as a testing dataset [20], and a cutoff of 0.35 was set. Finally, we compared the list of new transcripts with a length greater than 200 bp to the results obtained from the three aforementioned tools. The ncORFs analysis only considered lncRNAs with FPKM values greater than 0 in at least one sample.
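
The final filtering logic can be sketched under stated assumptions. The sketch assumes that a transcript is kept only when all three tools call it non-coding, and that the CPAT cutoff of 0.35 is applied to a coding probability (below the cutoff means non-coding). The `Transcript` record and its field names are hypothetical and do not reflect the output format of LGC, CPC, or CPAT.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    tid: str
    length: int              # transcript length in nucleotides
    lgc_noncoding: bool      # LGC call (hypothetical field)
    cpc_noncoding: bool      # CPC call (hypothetical field)
    cpat_prob: float         # CPAT coding probability (hypothetical field)
    fpkm: list = field(default_factory=list)  # one FPKM value per sample


def is_candidate_lncrna(t: Transcript, min_len: int = 200, cpat_cutoff: float = 0.35) -> bool:
    """Apply the length, coding-potential, and expression filters in sequence."""
    return (
        t.length > min_len                 # longer than 200 nt
        and t.lgc_noncoding                # non-coding according to LGC
        and t.cpc_noncoding                # non-coding according to CPC
        and t.cpat_prob < cpat_cutoff      # assumed direction of the CPAT cutoff
        and any(v > 0 for v in t.fpkm)     # FPKM > 0 in at least one sample
    )


if __name__ == "__main__":
    example = Transcript("MSTRG.1.1", 850, True, True, 0.12, [0.0, 3.4, 1.1])
    print(is_candidate_lncrna(example))
```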

ORF quantification, DEGs, and GO analysis

To quantify gene expression, the sequence reads mapping to coding sequences (CDS), including newly identified translated lncRNAs, were counted using Ballgown [41]. The quantification of uORFs was performed following the previously described method [17]. To ensure the comparison and integration of RNA-seq and Ribo-seq data, we conducted the quantification of DEGs between ZM24 and pag1 cotton plants using DESeq2 v1.12.4 [45]. The read normalization and size estimation were performed simultaneously for RNA-seq and Ribo-seq data.

To assess the co-regulation among translated genes, primary ORFs, sORFs, lncRNA, and mRNA, Spearman correlations were computed. Hierarchical clustering was performed using the correlation matrix and heatmap.2 from the R library gplots v3.0.1 (https://cran.r-project.org/web/packages/gplots/index.html). Correlation coefficients were calculated using the cor() function in R to estimate Pearson and Spearman correlation coefficients for pairwise comparisons [46]. Additionally, functional annotation was assigned by conducting GO and KEGG pathway analysis through clusterProfiler 4.0 [47]. The Gene Ontology (GO) analysis utilized the GO annotation of ZM24, which was downloaded from CottonGen (https://www.cottongen.org/). The co-expression network was constructed using the data for lncRNA prediction. The network was built using the PCC and MR (Mutual Rank) algorithm [48], as previously described [49], with an MR value threshold of less than 100 for co-expression gene pairs. The MR values were classified into four levels: Top, L1, L2, and L3. The Top level included the top 3 co-expressed genes for each gene, while L1, L2, and L3 were defined as MR values less than 5, less than 30, and less than 100, respectively.
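
The mutual rank calculation can be sketched in a few lines. The sketch assumes the usual definition of MR as the geometric mean of the two directional PCC ranks, and it treats L1, L2, and L3 as nested thresholds, reporting the most stringent level a pair satisfies; the function names and the toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mutual_ranks(expr):
    """Map each unordered gene pair to MR = sqrt(rank_a(b) * rank_b(a))."""
    genes = list(expr)
    rank = {}
    for a in genes:
        # Rank all other genes by descending PCC with gene a (rank 1 = most correlated).
        partners = sorted(
            (g for g in genes if g != a),
            key=lambda g: pearson(expr[a], expr[g]),
            reverse=True,
        )
        rank[a] = {g: i + 1 for i, g in enumerate(partners)}
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


def mr_level(mr):
    """Most stringent level met by an MR value, or None if MR >= 100."""
    if mr < 5:
        return "L1"
    if mr < 30:
        return "L2"
    if mr < 100:
        return "L3"
    return None


if __name__ == "__main__":
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5], "g3": [4, 3, 2, 1]}
    for pair, mr in mutual_ranks(toy).items():
        print(pair, round(mr, 3), mr_level(mr))
```

Because MR depends on ranks in both directions, a pair scores well only when each gene is among the other's closest partners, which is why it is less sensitive than raw PCC to genes that correlate broadly with everything.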

Vector construction, cotton transformation, and phenotypic analysis

To construct overexpression (OE) and RNA interference (RNAi) lines, the genetic material of cotton cultivar ZM24 was used. The coding sequences for GhKCS6 were obtained from CottonGen (https://www.cottongen.org/) and gene-specific primers (Table S1) were used to amplify the gene from the ZM24 cDNA library. For generating GhKCS6 overexpression lines, the amplified fragment was inserted into the E6-driven pCAMBIA-2300 vector (E6-GhKCS6-OE). To generate GhKCS6-RNAi lines, the GhKCS6-RNAi sequences were amplified from the ZM24 cDNA library using gene-specific primers and inserted into a pBI121 vector. The transformed vectors, E6-GhKCS6-OE and GhKCS6-RNAi, were then transferred to Agrobacterium tumefaciens LBA4404. The cotton transformation was performed as previously described [50].

The transgenic plants were cultured under standard conditions in the field for the T3 generation, and their phenotypes were observed. For fiber length analysis, mature fibers were collected from the bolls on the mainstem. Fiber samples were collected from 10 plants (30 bolls each) of each E6-GhKCS6-OE, GhKCS6-RNAi, and wild-type (ZM24) lines. Three independent lines of E6-GhKCS6-OE and GhKCS6-RNAi plants were used for the phenotypic observation and subsequent analysis. The fiber was straightened with a comb, and its length was measured using a ruler.

We performed qRT-PCR analysis to confirm successful overexpression and silencing of the GhKCS6 gene in E6-GhKCS6-OE and GhKCS6-RNAi plants, respectively. Total RNA was extracted from 10 DPA fiber tissues using the RNAprep Pure Plant Kit (TIANGEN, Beijing, China) and reverse-transcribed using a PrimeScript RT reagent kit (Takara, Dalian, China). The LightCycler 480 system (Roche Diagnostics, Mannheim, Germany) was used for the qRT-PCR assay with specific primers for GhKCS6 (Table S1). The internal control in PCR experiments was GhHis3 (GenBank accession number AF024716). The 2^(−ΔCT) method was used to calculate the relative expression [51], and each qRT-PCR was performed in triplicate, with the mean and standard deviation reported as results.
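
As a worked illustration of the 2^(−ΔCT) calculation, the following sketch computes the relative expression of a target gene against a reference gene and summarizes triplicates as mean and standard deviation. The Ct values are invented for illustration and are not data from this study.

```python
from statistics import mean, stdev


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^(-dCT), where dCT = Ct(target) - Ct(reference)."""
    return 2 ** -(ct_target - ct_reference)


def summarize(ct_targets, ct_references):
    """Mean and sample standard deviation of 2^(-dCT) across replicates."""
    values = [relative_expression(t, r) for t, r in zip(ct_targets, ct_references)]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical Ct values for three technical replicates.
    target_ct = [24.1, 24.3, 23.9]      # e.g. GhKCS6
    reference_ct = [20.0, 20.1, 19.9]   # e.g. GhHis3
    print(summarize(target_ct, reference_ct))
```

A target that crosses the threshold two cycles later than the reference therefore has a relative expression of 2^(−2) = 0.25.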

Results

Establishment of experiment and overview of RNA-seq and Ribo-seq data

To obtain a comprehensive profile of the transcriptome and translatome in cotton, we utilized two different cotton genotypes, ZM24 (a G. hirsutum cultivar) and short fiber pag1 (a BR-deficient mutant in the ZM24 background that exhibits a shorter fiber phenotype) [31]. We conducted strand-specific RNA-seq and Ribo-seq analyses in parallel on various tissues, including root, stem, leaf, flower, ovule (0 DPA), and 5, 10, and 20 DPA (days post-anthesis) fiber tissues of ZM24, as well as 5, 10, and 20 DPA fiber tissues of pag1 plants. RNA-seq analysis provided information on transcript identity and abundance, while Ribo-seq analysis quantified and mapped the occupancy of ribosomes on specific transcripts [52]. This approach allowed for the large-scale, comprehensive profiling of the transcriptome and translatome in cotton. In this study, we followed the protocol previously described [16] with slight modifications. Initially, paired-end 150 bp (PE150) RNA-seq analysis was conducted, and a reference-guided de novo transcriptome assembly was generated to detect any transcripts that were not included in the ZM24_genome_assembly_V1.0 annotation. Secondly, to ensure a comparable resolution and capture ribosome footprint reads with a wide length distribution, a higher amount of RNase I was used during the Ribo-seq analysis. Consequently, an ORF finding tool, RiboTaper_v1.3 [43] was utilized to map transcript regions from both annotated and previously unannotated transcripts.

The objective of this study was to identify novel transcripts that play important regulatory roles in plant growth and cotton fiber development and to refine the annotation of the cotton genome. To achieve this, we generated a reference-guided transcriptome assembly using RNA-seq data from the root, stem, leaf, flower, ovule (0 DPA), and fiber tissues from three different developmental stages (5, 10, and 20 DPA) of both ZM24 and pag1 cotton. By analyzing the cotton translatome, we identified several novel transcripts and complex regulators of cotton fiber development, which helped to fine-tune the annotation of the cotton genome. Our RNA-seq and Ribo-seq analyses involved the statistical analysis of DEGs in fiber versus non-fiber tissues, as well as DEGs in 10 DPA and 20 DPA fiber tissues. We also examined the distribution of novel protein-coding genes, lncRNAs, and sORFs (uORFs and dORFs) on each chromosome of the At and Dt sub-genomes of cotton (Fig. 1 and Table 1). We used the short fiber mutant pag1 to identify coding and non-coding elements, including lncRNAs and sORFs. Our analysis revealed a total of 2,852 DEGs in fiber versus non-fiber tissues, 504 DEGs in 10 DPA fiber, and 145 DEGs in 20 DPA fiber that overlapped in RNA-seq DEGs and Ribo-seq DEGs. Additionally, we identified 71,526 protein-coding genes, 1,376 uORFs, 213 dORFs, and 552 lncRNAs with ncORFs distributed across all chromosomes of the At and Dt sub-genomes of cotton. We added all of the newly identified transcripts to the cotton genome, which helped to fine-tune the genome annotation.

Fig. 1.


Circos diagram representing the statistics of novel transcripts identified in the cotton genome during RNA-seq and Ribo-seq data analysis. Overlap of Ribo-seq DEGs and RNA-seq DEGs in (I) fiber versus non-fiber tissues, (II) 20 DPA fiber, and (III) 10 DPA fiber; (IV) dORF density; (V) uORF density; (VI) lncRNAs with ncORFs; and (VII) protein-coding genes.

Table 1.

Distribution of identified novel transcripts on the chromosomes of At and Dt sub-genome of cotton.

Chromosomes in At and Dt sub-genomes of cotton (each cell gives the At/Dt values).

| Features | A01/D01 | A02/D02 | A03/D03 | A04/D04 | A05/D05 | A06/D06 | A07/D07 | A08/D08 | A09/D09 | A10/D10 | A11/D11 | A12/D12 | A13/D13 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding genes | 2543/2449 | 2128/2617 | 2486/1918 | 1741/2159 | 4488/3919 | 2433/2453 | 2607/2500 | 2821/2828 | 2746/2627 | 2646/2824 | 3782/3695 | 2958/2921 | 2677/2560 |
| dORFs | 9/3 | 6/11 | 5/7 | 5/4 | 11/8 | 6/10 | 10/5 | 6/11 | 5/9 | 9/6 | 15/16 | 10/8 | 8/10 |
| ncORFs | 28/22 | 17/19 | 23/5 | 18/16 | 35/25 | 12/28 | 24/16 | 23/15 | 16/20 | 26/23 | 28/29 | 17/19 | 26/22 |
| uORFs | 39/45 | 40/59 | 38/47 | 30/36 | 103/79 | 50/36 | 52/40 | 51/54 | 60/46 | 42/48 | 81/84 | 59/63 | 52/42 |
| 10 DPA (Ribo-RNA seq DEGs) | 16/25 | 15/15 | 23/17 | 13/20 | 37/31 | 14/16 | 11/14 | 16/18 | 15/17 | 20/25 | 24/28 | 19/18 | 15/22 |
| 20 DPA (Ribo-RNA seq DEGs) | 5/10 | 5/7 | 4/3 | 4/6 | 11/9 | 2/4 | 1/2 | 7/8 | 8/8 | 1/3 | 6/5 | 4/9 | 7/16 |
| Fiber vs non-fiber (Ribo-RNA seq DEGs) | 117/114 | 89/148 | 110/83 | 53/84 | 191/161 | 64/72 | 87/94 | 119/117 | 82/89 | 83/114 | 166/143 | 120/124 | 106/122 |
| Expressed genes | 1866/1822 | 1559/2013 | 1881/1486 | 1200/1623 | 3587/3141 | 1821/1890 | 2009/2012 | 2213/2220 | 2082/2079 | 1929/2071 | 2833/2920 | 2294/2330 | 2037/1986 |
| Expressed ratios | 0.73/0.74 | 0.73/0.77 | 0.76/0.77 | 0.69/0.75 | 0.80/0.80 | 0.75/0.77 | 0.77/0.80 | 0.78/0.79 | 0.76/0.79 | 0.73/0.73 | 0.75/0.79 | 0.78/0.80 | 0.76/0.78 |

We investigated the sub-genome-biased expression of key regulators in cotton fiber development. To do so, we calculated the ratios and numbers of expressed coding genes (FPKM > 1 in at least one sample) on the At and Dt sub-genomes, which were approximately 70%. We then conducted a chi-squared test of ncORFs and DEGs between the At and Dt sub-genomes. Our results showed a significant difference in the DEGs of fiber-specific genes between the sub-genomes, indicating a sub-genome-biased phenomenon (Table S2). The numbers of DEGs in 10 DPA fibers and 20 DPA fibers were also markedly different between the At and Dt sub-genomes. For instance, there were 238 DEGs on the At sub-genome in the 10 DPA fiber, while there were only 65 DEGs on the Dt sub-genome. The chi-squared test confirmed that this difference was highly significant (Table S3). Regarding the appearance of sORFs, there was no significant difference between the sub-genomes, as the p-values of uORFs and dORFs were both greater than 0.05 (Table S4). These results showed that there is a sub-genome-biased expression of key regulators of cotton fiber development, while sORFs do not exhibit this phenomenon. These findings suggest that sub-genome-biased expression is a critical factor in the regulation of cotton fiber development.
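
A chi-squared comparison of two sub-genomes can be sketched as follows. The text does not state how the contingency tables were built, so this sketch assumes a 2 × 2 table (feature versus non-feature genes on At versus Dt) with one degree of freedom and no continuity correction; the counts in the usage example are hypothetical, and the actual tables are in Tables S2 to S4.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom and no continuity
    correction. Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: DEGs and non-DEGs on the At and Dt sub-genomes.
    stat, p = chi2_2x2(20, 80, 10, 90)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A p-value above 0.05, as reported for uORFs and dORFs, means the two sub-genomes cannot be distinguished by this test.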

Footprints and comparison of transcriptome and translatome profiling

To characterize the length of ribosome footprints, we mapped Ribo-seq reads to cytosolic mRNAs using Bowtie2 [53]. The dominant ribosome footprint size was 27 nucleotides in root, stem, flower, ovule (0 DPA), and 5, 10, and 20 DPA fiber tissues of ZM24 and in 5 and 10 DPA fiber tissues of pag1 plants; the exceptions were leaf tissues of ZM24 and 20 DPA fiber tissues of pag1 plants (Fig. 2 and Fig. S1). Unlike tomato and Arabidopsis, where the canonical ribosome footprint size is 28 nucleotides, the dominant footprint in cotton was one nucleotide shorter. Additionally, Ribo-seq reads in cotton mapped completely to annotated coding sequences, which was not the case in Arabidopsis [10], [16], [54]. Based on the location of Ribo-seq reads on the genome, we classified them into three categories: consensus CDS (CCDS) region, exon region (5′UTR and 3′UTR), and nonCDS region (lncRNAs). The dominant footprint size of 27 nucleotides and the complete mapping of Ribo-seq reads to annotated CCDSs supported the validity of our datasets relative to those from other plant species.
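
The footprint length distribution summarized here can be computed from mapped reads with a simple tally. The sketch below works on plain read sequences; the function names and the toy reads are hypothetical, and a real analysis would read lengths from the alignment files.

```python
from collections import Counter


def footprint_length_distribution(reads):
    """Fraction of reads at each length, ordered by length."""
    counts = Counter(len(read) for read in reads)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def dominant_length(reads):
    """Most frequent read length (ties resolved by first occurrence)."""
    return Counter(len(read) for read in reads).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy reads: five of 27 nt, three of 28 nt, two of 26 nt.
    toy_reads = ["A" * 27] * 5 + ["A" * 28] * 3 + ["A" * 26] * 2
    print(footprint_length_distribution(toy_reads))
    print(dominant_length(toy_reads))
```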

Fig. 2.


Distribution of ribosome footprints in read length. Bar plots of dominant ribosome footprint distribution in the first sample of root, stem, leaf, flower, and ovule (0 DPA) of ZM24 plant and 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 plants. The suffix (1) in the bar plots indicates the first sample of these tissues.

To assess the similarity between transcriptomes, we calculated the Spearman correlation coefficient for three samples of the ovule, 5, 10, and 20 DPA fiber tissues from ZM24 and pag1 plants, as well as two samples each of root, stem, leaf, and flower tissues from ZM24 cotton plants. The results revealed that stem tissues of ZM24 had the highest correlation coefficient (0.97), while the root and 5 and 10 DPA fiber tissues of ZM24 cotton had the lowest correlation coefficient (0.75) (Fig. S2A). Similarly, the correlation coefficient among translatome of all samples showed the highest correlation (0.91) between 10 DPA fiber of ZM24 and the lowest (0.33) between stem and 20 DPA fiber tissues of ZM24 cotton (Fig. S2B). Overall, both transcriptome and translatome datasets demonstrated a positive correlation across all tissues. We created a heatmap between the transcriptome and translatome datasets for all tissues and samples and calculated the consistency in repeats. The 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 cotton were clustered together, exhibiting greater consistency across repeats, and demonstrating the specificity of fiber stages (Fig. S2C).
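
A Spearman correlation between two samples can be sketched as the Pearson correlation of their ranks, with tied values given their average rank. This is a plain-Python illustration and not the R cor() call used in the study; the function names and the expression vectors are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    sample_a = [0.5, 12.0, 3.3, 48.1, 7.9]  # hypothetical FPKM values
    sample_b = [0.9, 10.2, 2.1, 60.4, 9.5]
    print(spearman(sample_a, sample_b))
```

Because it uses ranks, the coefficient is unaffected by the wide dynamic range of expression values.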

We generated a scatterplot to assess the correlation between the transcriptome and translatome datasets for each tissue and sample, including two samples each of root, stem, leaf, and flower, as well as three samples each of the ovule (0 DPA) of ZM24 plants and 5, 10, and 20 DPA fiber tissues of both ZM24 and pag1 plants. The scatterplot revealed a positive correlation between RNA-seq and Ribo-seq datasets for ZM24 cotton plants, with an R² value of 0.50 for root, 0.08 for stem, 0.24 for leaf, 0.51 for flower, 0.55 for ovule (0 DPA), 0.53 for 5 DPA fiber, 0.53 for 10 DPA fiber, and 0.47 for 20 DPA fiber tissues (Fig. 3). Similarly, tissues of short fiber pag1 cotton plants also showed a positive correlation between RNA-seq and Ribo-seq datasets, with R² values of 0.54 for 5 DPA fiber, 0.54 for 10 DPA fiber, and 0.45 for 20 DPA fiber tissues (Fig. 3). Our results indicated moderate correlations between transcriptome and translatome datasets for most observed tissues of ZM24 and pag1 plants, with stem and leaf showing weaker correlations. Overall, the positive correlation between the transcriptome and translatome datasets provides a more comprehensive understanding of gene expression regulation and helps to elucidate the molecular mechanisms underlying cotton growth and fiber development.
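
An R² value of this kind can be sketched as the squared Pearson correlation between per-gene RNA-seq and Ribo-seq abundances. The text does not state which transformation was applied before fitting, so the log2(x + 1) transform used here is an assumption, and the function names and input values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(rna, ribo):
    """Squared Pearson correlation of log2(x + 1)-transformed abundances."""
    log_rna = [math.log2(v + 1) for v in rna]     # assumed transformation
    log_ribo = [math.log2(v + 1) for v in ribo]   # assumed transformation
    return pearson(log_rna, log_ribo) ** 2


if __name__ == "__main__":
    rna_fpkm = [1, 3, 7, 15]    # hypothetical per-gene RNA-seq abundances
    ribo_fpkm = [1, 15, 3, 7]   # hypothetical per-gene Ribo-seq abundances
    print(r_squared(rna_fpkm, ribo_fpkm))
```

An R² near 0.5 means that about half of the variance in ribosome occupancy is explained by transcript abundance, with the remainder attributable to translational regulation and measurement noise.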

Fig. 3.

Correlation between RNA-seq and Ribo-seq data of fiber and non-fiber tissues of ZM24 and pag1 cotton tissues. Scatter plot of correlation between RNA-seq and Ribo-seq data of non-fiber (root, stem, leaf, and flower) and fiber tissues (ovule (0 DPA), 5, 10, and 20 DPA) of ZM24 and pag1 cotton.

P-site distribution to map cotton translatome

The distribution of ribosome footprints within ORFs displayed a clear three-nucleotide periodicity, consistent with ribosomes decoding mRNA one codon (three nucleotides) at a time. Furthermore, the distribution of ribosome footprints around the translation start and stop codons revealed the presence of codons at the P-site within the ribosome. We examined P-site distributions within the first 100 nucleotides of each gene in the CCDS, exonCCDS, and nonCCDS regions. The results showed that the P-site distribution within the first 100 nucleotides exhibited a more pronounced three-nucleotide periodicity in CCDS and exonCCDS regions than in nonCCDS regions across the observed tissues. For example, the root, stem, leaf, flower, and 5 DPA fiber tissues of ZM24 cotton exhibited a stronger three-nucleotide periodicity in CCDS and exonCCDS regions than in nonCCDS regions (Fig. 4A and Fig. S3). We assessed the three-nucleotide periodicity in fiber tissues, including ovules (0 DPA), 5 DPA, 10 DPA, and 20 DPA fibers of ZM24 and pag1 cotton, in the CCDS (Fig. S4 and S5), exonCCDS (Fig. S6 and S7), and nonCCDS regions (Fig. S8 and S9). All fiber tissues displayed a clear three-nucleotide periodicity in CCDS and exonCCDS regions, indicating efficient translation. However, the first 100 nucleotides of the nonCCDS region in all observed tissues of both ZM24 and pag1 cotton plants showed an unclear three-nucleotide periodicity, suggesting that these regions are unable to initiate translation and synthesize a protein. We estimated the strength of the three-nucleotide periodicity from the percentage of reads in the expected reading frame for CCDS, exonCCDS, and nonCCDS regions (Fig. 4B). Overall, the clear three-nucleotide periodicity of ribosome footprints in the first 100 nucleotides from the start of each gene in CCDS and exonCCDS regions indicated the high quality of our Ribo-seq dataset for ZM24 and pag1 cotton plants.
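
The percentage of footprints in each reading frame can be sketched as follows. The example assumes that P-site positions are already expressed as offsets from the first nucleotide of the start codon (offset 0), so that the reading frame is the offset modulo 3; the function name, the 100-nucleotide default window, and the toy offsets are illustrative only.

```python
from collections import Counter


def frame_percentages(psite_offsets, window=100):
    """Percentage of P-sites in frames 0, 1, and 2 within the first `window` nt.

    psite_offsets: P-site positions relative to the first nucleotide of the
    start codon (offset 0 is in frame 0). Offsets outside the window are ignored.
    """
    counts = Counter(pos % 3 for pos in psite_offsets if 0 <= pos < window)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [100.0 * counts[frame] / total for frame in range(3)]


# Hypothetical P-site offsets; 150 and -3 fall outside the 100-nt window.
psite_toy = [0, 3, 6, 9, 1, 2, 150, -3]
frames_toy = frame_percentages(psite_toy)
```

A strongly translated region concentrates P-sites in one frame, whereas a region without periodicity spreads them at roughly one third per frame.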

Fig. 4.

Distribution of P-sites in the first 100 nucleotides of each read (CDS, UTR, or lncRNA), and percentage of footprints in first 100 nucleotides of the root, stem, leaf, flower, and 5 DPA fiber tissues of ZM24 cotton. (A) Bar plot of P-site distribution showing three-nucleotide periodicity in the first sample of the root, stem, leaf, flower, and 5 DPA fiber tissues of ZM24 cotton in CCDS (left), exonCCDS (middle), and nonCCDS region (right). (B) Bar plot of the percentage of footprints that match these primary reading frames.

Translatome landscapes from multiple tissues of ZM24 and pag1 improve cotton genome annotation

To perform the reference-guided de novo transcriptome assembly, we initially merged the replicates of newly assembled transcriptomes and compared them with ZM24_genome_assembly_V1.0 using Bowtie2 v2.3.4.1 [53]. We used previously reported data (Table S5) for the prediction of lncRNAs [20]. The comparison results are presented in Fig. S10. The most abundant group of novel transcripts identified in our dataset were lncRNAs exhibiting tissue-specific characteristics. We subjected all novel transcripts identified in our study to ZM24_genome_assembly_V1.0 and Hisat2 to obtain translated ORFs for all tissues of both ZM24 and pag1 cotton. In plants, several studies have identified thousands of lncRNAs [55], [56]. Here, we identified a total of 43,901 unannotated transcripts, which may be lncRNAs and were categorized as intergenic lncRNAs. The lncRNAs in cotton are more abundant than those of Arabidopsis (more than 6,000) and maize (20,163) [55], [56]. Novel transcripts with low expression were filtered out using the lowest expression rule, leaving 10,326 transcripts. After filtering, 552 lncRNAs were detected with ORFs based on Ribo-seq data. The ORFs of these lncRNAs ranged from 4 to 562 amino acids (aa) in length, with an average of 53 aa. Fiber tissues of both ZM24 and pag1 cotton contained the most translated lncRNAs, followed by root and flower tissues (Table S6). Because lncRNAs with ORFs are ncORFs, we identified these ncORFs as novel protein-coding genes. All ncORFs identified as novel protein-coding genes were annotated (Table S7) and integrated into the cotton genome annotation. To examine whether these novel protein-coding genes are tissue-specific, we constructed a heat map indicating the distribution of ncORFs (lncRNAs) in different tissues. Results indicated that the newly identified protein-coding genes (lncRNAs with ncORFs) are tissue-specific and were distributed in almost all tissues of ZM24 and pag1 cotton (Fig. S11 and Table S8). The majority of lncRNAs identified in our study were expressed in 10 and 20 DPA fiber tissues of both ZM24 and pag1 cotton.

To identify potential ORFs in our dataset, we employed RiboTaper_v1.3 [43] on two samples of root, stem, leaf, and flower of the ZM24 plant as well as three samples of the ovule (0 DPA), 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 plants. RiboTaper_v1.3 evaluates P-sites in each potential ORF and assesses the statistical significance of three-nucleotide periodicity through signals [43]. First, we determined the number of ORFs, including sORFs (uORFs and dORFs), ncORFs, and ORFs_ccds, in all tissues of ZM24 and pag1 cotton plants (Table S9). We identified several novel ORFs in the root, stem, leaves, flower, and ovule (0 DPA) of ZM24 cotton and 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 cotton. Specifically, we observed a greater number of translated ORFs (ORFs_ccds) in almost all tissues, with the highest numbers in 10 DPA fiber of ZM24 and pag1 cotton (Fig. 5A and Table S9). Overall, more uORFs were detected in 10 DPA fiber of pag1 and root tissues of ZM24 (Table S10), and more dORFs (Table S11) and ncORFs were detected in 10 DPA fiber of both ZM24 and pag1 cotton (Table S9). However, a comparison of non-fiber tissues of ZM24 cotton showed that more uORFs (253) and ORFs_ccds (47,937) were present in the root than in other tissues. Moreover, a comparison of fiber tissues of ZM24 and pag1 cotton indicated that more uORFs, dORFs, ncORFs, and ORFs_ccds were present in 5, 10, and 20 DPA normal fiber of ZM24 than in 5, 10, and 20 DPA short fiber of pag1 cotton. The majority of protein-coding transcripts with significant transcript levels in our dataset demonstrate the efficacy and strength of ORF translation.

Fig. 5.

Numbers and length distributions of different types of ORFs (dORFs, ncORFs, ORFs_ccds, and uORFs) in the tissues of ZM24 and pag1 cotton plants. (A) Circular bar plot indicating the numbers of dORFs, ncORFs, ORFs_ccds, and uORFs in all observed tissues of ZM24 and pag1 plants. Here, the prefix Z indicates ZM24 cotton, and P indicates pag1 cotton. (B) A circular heatmap displaying p-sites per codon in novel identified ORFs (uORFs, dORFs, ncORFs, and ORFs_ccds) for each tissue including two samples of root, stem, leaf, flower, and ovule (0 DPA) of ZM24 plant and three samples of 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 plants. (C) Length distribution (nucleotides) of ORFs in the first sample of the ovule (0 DPA), 5, 10, and 20 DPA fiber of ZM24 and pag1 cotton. (D) Length distribution (nucleotides) of sORFs (short ORFs) in the first sample of the ovule (0 DPA), 5, 10, and 20 DPA fiber of ZM24 and pag1 cotton. The suffixes (1), (2), and (3) in the bar plots indicate the first, second, and third samples of these tissues respectively. The prefix Z indicates ZM24 cotton, and P indicates pag1 cotton.

We examined the P-site per codon in different types of novel identified ORFs (uORFs, dORFs, ncORFs, and ORFs_ccds) for each tissue of ZM24 and pag1 cotton plants (Table S12). The P-site, which is the second binding site for tRNA in the ribosome, helps translation by holding tRNA linked with the growing polypeptide chain [27]. We used this information to analyze the efficiency of translation in each type of ORF (Fig. 5B and Table S12). The ORFs were classified as uORFs (present in 5′ UTRs), dORFs (present in 3′ UTRs), ORFs_ccds (ORFs in coding regions), and ncORFs (ORFs in regions of lncRNAs). In the 20 DPA fibers of pag1 cotton, it was estimated that there were a greater number of average P-sites in both the uORFs (1.098) and dORFs (0.338). Similarly, a higher number of average P-sites in ncORFs (0.879) were estimated in the flowers of ZM24 cotton. In contrast, the maximum number of average P-sites in ORFs_ccds (0.208) were estimated in the 10 DPA fibers of the pag1 cotton plant. Among the non-fiber tissues of ZM24 cotton, it was observed that there were more P-sites in the uORF region of leaf tissues as compared to other tissues. However, when the 5, 10, and 20 DPA fiber tissues of ZM24 and pag1 cotton were compared, more P-sites were observed in the uORF region of the 20 DPA short fiber of pag1 than in the normal fiber of ZM24 cotton.
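
The ORF classes and the average P-sites per codon discussed above can be sketched as follows. This is a simplified illustration, not the RiboTaper procedure: it assumes half-open transcript coordinates, treats any ORF on a transcript without an annotated CDS as an ncORF, and defines P-sites per codon as the number of P-sites divided by the number of codons in the ORF. All names and toy values are hypothetical.

```python
from collections import defaultdict


def classify_orf(orf_start, orf_end, cds_start=None, cds_end=None):
    """Classify an ORF by its position relative to the annotated CDS.

    Coordinates are half-open transcript positions. A transcript without an
    annotated CDS (cds_start or cds_end is None) is treated as an lncRNA.
    ORFs that overlap the CDS are grouped with ORFs_ccds for simplicity.
    """
    if cds_start is None or cds_end is None:
        return "ncORF"
    if orf_end <= cds_start:
        return "uORF"  # entirely within the 5' UTR
    if orf_start >= cds_end:
        return "dORF"  # entirely within the 3' UTR
    return "ORFs_ccds"


def psites_per_codon(psite_count, orf_start, orf_end):
    """Number of P-sites divided by the number of codons in the ORF."""
    codons = (orf_end - orf_start) // 3
    if codons <= 0:
        raise ValueError("ORF must span at least one codon")
    return psite_count / codons


def mean_psites_by_class(orfs):
    """orfs: iterable of (orf_start, orf_end, cds_start, cds_end, psite_count)."""
    values = defaultdict(list)
    for orf_start, orf_end, cds_start, cds_end, psites in orfs:
        orf_class = classify_orf(orf_start, orf_end, cds_start, cds_end)
        values[orf_class].append(psites_per_codon(psites, orf_start, orf_end))
    return {orf_class: sum(v) / len(v) for orf_class, v in values.items()}


# Hypothetical ORFs on transcripts whose annotated CDS spans positions 100-400.
orfs_toy = [
    (10, 40, 100, 400, 11),    # uORF, 10 codons
    (50, 80, 100, 400, 9),     # uORF, 10 codons
    (100, 400, 100, 400, 60),  # ORFs_ccds, 100 codons
    (420, 450, 100, 400, 3),   # dORF, 10 codons
    (5, 65, None, None, 18),   # ncORF on an lncRNA, 20 codons
]
means_toy = mean_psites_by_class(orfs_toy)
```

Normalizing by codon number allows short ORFs such as uORFs and dORFs to be compared with much longer annotated coding sequences.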

We also examined unannotated ORFs of various lengths located in the 5′ and 3′ UTR of annotated transcripts from root, stem, leaf, and flower samples of the ZM24 plant, as well as three samples (0 DPA ovule, and 5, 10, and 20 DPA fiber tissues) of both ZM24 and pag1 plants. However, not all transcripts were annotated in the 5′ UTR because RiboTaper_v1.3 only identifies transcripts within a defined region [2]. As a result, the number of identified ORFs and sORFs (including uORFs and dORFs) varied among the tissues of ZM24 and pag1 cotton plants. Most of the newly assembled transcripts were ncORFs (non-coding ORFs). The annotated ORFs in our datasets were approximately 200–250 nucleotides in size, while the size of sORFs varied (Fig. S12A and B). All newly identified sORFs were annotated and integrated into the cotton genome annotation (Table S13). We then compared the length distribution of ORFs and sORFs. Both samples of root, stem, leaf, and flower of the ZM24 plant (Fig. S13A and B), as well as three samples of the ovule (0 DPA), 5, 10, and 20 DPA fiber tissues from normal fiber ZM24 and short fiber pag1 cotton plants (Fig. 5C and D, and Fig. S14 and S15), demonstrated that sORFs had an irregular length distribution pattern compared to that of ORFs in normal fiber, short fiber, and other tissues.

Previous studies have shown that the analysis and annotation of Ribo-seq data can improve and fine-tune genome annotation [2]. Ribo-seq is a comprehensive method that helps validate and improve genome annotation. In our study, the newly identified lncRNAs with ORFs (ncORFs) may potentially be coding genes, which could aid in improving and fine-tuning genome annotation. Similarly, in humans, several regions containing short ORFs that overlap with long ORFs with different reading frames were identified with unknown functional importance [57]. Overall, the integration of Ribo-seq datasets with RiboTaper analysis not only validated annotated genes but also discovered new ORFs that have now been integrated into the cotton genome annotation.

Translational efficiency of ORFs

The Kozak consensus sequence is a nucleic acid motif that serves as a protein translation initiation site in most mRNA transcripts. It ensures proper protein translation and facilitates ribosome assembly and translation initiation. To investigate the translation initiation site in various types of ORFs, including sORFs (uORFs and dORFs), ncORFs, and ORFs_ccds, we examined their Kozak consensus sequences (Fig. 6A). We observed the translation initiation site (ATG) in the Kozak consensus sequences of uORFs, dORFs, ncORFs, and ORFs_ccds, indicating that these identified ORFs have the potential to mediate ribosome assembly and translation initiation. Similar results were observed in previous studies conducted on other species, such as Arabidopsis [58] and tomatoes [2]. To further investigate the TE of the newly identified translated ORFs (protein-coding genes) and lncRNAs, we conducted a detailed analysis (Fig. S16A and B). Our results demonstrated that both translated ORFs and lncRNAs possess features of translation and exhibit TE. These findings suggest that the newly identified translated ORFs and lncRNAs act as genuine protein-coding genes in the cotton genome. Moreover, we compared the TE of protein-coding genes and lncRNAs in two samples of root, stem, leaf, and flower from the ZM24 plant, as well as three samples of the ovule (0 DPA), 5, 10, and 20 DPA fiber tissues from ZM24 and pag1 plants (Fig. 6B and Fig. S17). Our results revealed that genes exhibited higher TE than lncRNAs in all samples of ZM24 and pag1 cotton plants, except for the stem of the ZM24 cotton plant, where the TE of genes and lncRNAs was similar and could not be distinguished. Our findings further confirmed that lncRNAs possess the characteristics of translation, although their translational efficiency is lower than that of genes. We also investigated whether lncRNAs, particularly ncORFs, could be translated. Our analysis revealed that MSTRG.48719.1 is a novel coding gene with ncORFs. We observed that this gene exhibited high expression levels in the fiber tissue of ZM24 at 10 DPA. However, in the fiber tissue of pag1 cotton, its expression levels were very low or even undetectable in the RNA-seq and Ribo-seq analyses (Fig. 6C and D). Although the absence of an open reading frame generally limits the translation efficiency and potential of lncRNAs as protein-coding genes, our results suggest that some lncRNAs can still be translated and may function as protein-coding genes.
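
The comparison of TE between protein-coding genes and lncRNAs can be illustrated with a minimal sketch. It assumes the commonly used definition of TE as the ratio of Ribo-seq abundance to RNA-seq abundance for the same transcript; the minimum RNA-seq abundance used to avoid unstable ratios, the function names, and the toy values are assumptions made for this example.

```python
from statistics import median


def translation_efficiency(ribo, rna, min_rna=1.0):
    """TE as Ribo-seq abundance divided by RNA-seq abundance.

    Returns None when the RNA-seq abundance is below `min_rna`, because the
    ratio is unreliable for very weakly expressed transcripts.
    """
    if rna < min_rna:
        return None
    return ribo / rna


def median_te(records, min_rna=1.0):
    """records: iterable of (ribo_abundance, rna_abundance) pairs."""
    values = []
    for ribo, rna in records:
        te = translation_efficiency(ribo, rna, min_rna)
        if te is not None:
            values.append(te)
    return median(values) if values else None


# Hypothetical abundances; the third gene is dropped by the min_rna filter.
gene_toy = [(20.0, 10.0), (30.0, 10.0), (5.0, 0.1)]
lncrna_toy = [(2.0, 10.0), (4.0, 10.0)]
gene_te = median_te(gene_toy)
lncrna_te = median_te(lncrna_toy)
```

Comparing medians rather than means limits the influence of a few transcripts with extreme ratios.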

Fig. 6.

ORFs pattern, TE of genes, and lncRNAs. (A) The pattern of Kozak consensus sequences of uORFs, dORFs, ncORFs, and ORFs_ccds. (B) Comparison of TE between protein-coding genes and lncRNAs in first samples of root, stem, leaf, flower, and ovule (0 DPA) of ZM24 cotton and 5, 10, and 20 DPA fiber of normal fiber ZM24 and short fiber pag1 cotton. (C) The expression level of MSTRG.48719.1 at 10 DPA fiber of pag1 and ZM24 cotton during RNA-seq analysis. (D) The expression level and P-sites of lncRNA (MSTRG.48719.1) at 10 DPA fiber of pag1 and ZM24 cotton during Ribo-seq analysis. The prefix Z indicates ZM24 cotton, and P indicates pag1 cotton. The suffix (1) in the bar plots indicates the first sample of these tissues.

Transcriptome and translatome of DEGs

To compare the normal and short fiber of ZM24 and pag1 cotton plants, we investigated the transcriptional and translational control of various genes in both non-fiber tissues (root, stem, leaf, flower) and fiber tissues (ovule (0 DPA), 5, 10, and 20 DPA), particularly during the elongation stage (10 and 20 DPA fiber). We selected the 10 and 20 DPA fiber tissues for analysis as they are fast-growing tissues that primarily determine fiber length. All DEGs identified in the fiber vs non-fiber, 10 DPA fiber, and 20 DPA fiber comparisons were annotated and are listed in Tables S14, S15, and S17, respectively. Initially, we compared the RNA-seq and Ribo-seq datasets of fiber and non-fiber tissues of ZM24. The RNA-seq analysis identified 11,752 DEGs, including 9,531 up-regulated genes and 2,221 down-regulated genes (Fig. S18A), while the Ribo-seq analysis detected 5,659 DEGs, including 4,321 up-regulated genes and 1,338 down-regulated genes (Fig. S18B). The RNA-seq and Ribo-seq datasets shared a total of 2,409 up-regulated genes and 526 down-regulated genes (Fig. S18C). Notably, fewer genes were up-regulated while a greater number of genes were down-regulated in fiber tissues compared to non-fiber tissues. Furthermore, a comparison of synergistic changes for all genes revealed a linear relationship (PCC = 0.63) between RNA-seq and Ribo-seq, indicating higher synchrony between the two analyses for both fiber and non-fiber tissues of ZM24 cotton (Fig. S18D). The correlation analysis of the 2,935 DEGs shared between the RNA-seq and Ribo-seq datasets indicated high consistency between the two analyses, with an R2 value of 0.66 (Fig. S18E). The Gene Ontology (GO) enrichment analysis for both fiber and non-fiber tissues of ZM24 cotton revealed a total of 697 functional gene families, including genes associated with photosynthesis and photosystem development (Fig. S19A). Additionally, KEGG enrichment analysis identified 255 functional gene families, including genes related to photosystem, photosynthesis-related pathways, MYC, and WRKY families in both RNA-seq and Ribo-seq datasets (Fig. S19B and Table S14).
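
The overlap of DEGs between the two datasets and the correlation of fold changes can be sketched as follows. The cutoffs (an absolute log2 fold change of at least 1 and an adjusted P value below 0.05), the function names, and the toy gene table are assumptions for illustration; they are not the thresholds or data reported in the study.

```python
import math


def call_deg(log2fc, padj, fc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if padj >= padj_cutoff or abs(log2fc) < fc_cutoff:
        return "ns"
    return "up" if log2fc > 0 else "down"


def shared_degs(rna, ribo):
    """Genes called in the same direction in both datasets.

    rna, ribo: dicts mapping gene ID -> (log2 fold change, adjusted P value).
    """
    shared = {"up": set(), "down": set()}
    for gene in rna.keys() & ribo.keys():
        rna_call = call_deg(*rna[gene])
        ribo_call = call_deg(*ribo[gene])
        if rna_call == ribo_call and rna_call != "ns":
            shared[rna_call].add(gene)
    return shared


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fold_change_pcc(rna, ribo):
    """PCC of log2 fold changes over genes present in both datasets."""
    genes = sorted(rna.keys() & ribo.keys())
    return pearson([rna[g][0] for g in genes], [ribo[g][0] for g in genes])


# Hypothetical results for four genes (log2 fold change, adjusted P value).
rna_toy = {"g1": (2.0, 0.001), "g2": (-1.5, 0.01), "g3": (3.0, 0.2), "g4": (1.2, 0.03)}
ribo_toy = {"g1": (1.0, 0.01), "g2": (-3.0, 0.001), "g3": (2.0, 0.01), "g4": (-1.1, 0.02)}
shared_toy = shared_degs(rna_toy, ribo_toy)
pcc_toy = fold_change_pcc(rna_toy, ribo_toy)
```

Genes such as the hypothetical g4, which change in opposite directions at the two levels, are excluded from the shared set and lower the genome-wide PCC.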

Subsequently, we conducted a comparison of DEGs between ZM24 and pag1 plants at 10 DPA fiber tissues through RNA-seq and Ribo-seq analysis. The RNA-seq analysis revealed 3,015 DEGs in the 10 DPA tissue of normal fiber ZM24 and short fiber pag1 cotton plants, wherein 1,248 genes were up-regulated and 1,767 were down-regulated (Fig. S20A). Likewise, Ribo-seq analysis detected 2,070 DEGs, with 1,016 genes being up-regulated and 1,054 genes being down-regulated in the 10 DPA tissue of normal fiber ZM24 and short fiber pag1 cotton plants (Fig. S20B). Moreover, both the RNA-seq and Ribo-seq datasets exhibited 208 up-regulated genes and 313 down-regulated genes (Fig. S20C). A comprehensive comparison of the overall fold changes for all genes showed that the synergistic changes (with the PCC = 0.3) in the RNA-seq and Ribo-seq datasets did not consistently reflect synchronized transcription and translation in the 10 DPA tissue of normal fiber ZM24 and short fiber pag1 cotton plants (Fig. S20D). In contrast, the correlation analysis of a total of 521 DEGs from the RNA-seq and Ribo-seq datasets revealed a high level of consistency between the two analyses, as indicated by an R2 value of 0.67 (Fig. S20E). Subsequently, we performed GO enrichment analysis for the 10 DPA fiber of ZM24 and pag1 plants, which resulted in the identification of 89 functional gene families, including cytochrome P450, receptor-like kinase 1, and MADS gene families (Fig. S21A). Furthermore, KEGG enrichment analysis revealed 17 functional gene families, which included cytochrome P450 and 3-ketoacyl-CoA synthase gene families (Fig. S21B, and Table S15). The results of both GO and KEGG enrichment analyses for the 10 DPA tissues of normal fiber ZM24 and short fiber pag1 cotton plants indicated the enrichment of genes associated with cuticle development, wax biosynthesis, and fatty acid elongation. 
To explore BR-related genes in cotton fibers, we categorized DEGs in the 10 DPA fiber tissue into three groups based on their expression levels (DEGs at the transcription level, DEGs at the translation level, and DEGs at both levels) and manually annotated each category (Table S16). Our analysis identified several BR-related genes, including GhPAG1, GhROT3-1, and GhPRE1, as well as target genes of GhBES1.4 (a master transcription factor in BR signaling), among the DEGs at the transcription level, the translation level, and both levels, suggesting a potential role of BR signaling in cotton fiber development [37].

Next, we compared the 20 DPA tissues of normal fiber ZM24 and short fiber pag1 cotton plants using RNA-seq analysis, which revealed 3,832 DEGs (2,053 up-regulated genes and 1,779 down-regulated genes) (Fig. S22A). Additionally, Ribo-seq analysis detected 466 DEGs (213 up-regulated genes and 253 down-regulated genes) (Fig. S22B). While RNA-seq and Ribo-seq showed 75 up-regulated and 79 down-regulated genes (Fig. S22C), a comparison of fold changes for all genes indicated inconsistency (with the PCC = 0.25) between the RNA-seq and Ribo-seq, indicating that RNA-seq and Ribo-seq are not synchronized (Fig. S22D). However, the correlation between RNA-seq and Ribo-seq for a total of 124 DEGs revealed high consistency between the RNA-seq and Ribo-seq, with an R2 value of 0.71 (Fig. S22E). GO enrichment analysis of the 20 DPA fiber tissues of ZM24 and pag1 plants identified a total of 41 functional gene families, including starch hydrolase, NAC, and cytochrome P450 gene families (Fig. S23A). Meanwhile, KEGG enrichment analysis identified 10 functional gene families, including cytochrome P450 and 3-ketoacyl-CoA synthase gene families (Fig. S23B and Table S17). Both GO and KEGG enrichment analyses of the 20 DPA fiber tissues of ZM24 and pag1 plants indicated the enrichment of genes associated with fatty acid elongation. These findings highlighted the differences between the 10 DPA and 20 DPA tissues of both normal fiber ZM24 and short fiber pag1 cotton plants.

sORF plays regulatory roles in fiber development

Recent studies have investigated the biological significance of sORFs (translated small functional peptides) that play regulatory roles during mRNA translation. In yeast and Drosophila, sORFs are involved in the development process by activating transcription factors [59], [60]. In plants, sORFs play significant roles in growth and development, as seen in the process of morphogenesis in Arabidopsis, where several translated sORFs were involved [61]. In our study, we identified several translated sORFs that may play a significant role in fiber development. During RNA-seq and Ribo-seq analysis of fiber tissues of both normal fiber ZM24 and short fiber pag1 cotton plants, we found two DEGs associated with sORFs (uORFs or dORFs). First, we identified the Ghicr24_A05G103300.1 gene, which encodes a 3-ketoacyl-CoA synthase 6 (GhKCS6), associated with sORF that showed peaks of high expression and P-sites at 5 and 10 DPA fiber tissues compared to 20 DPA fiber tissues of normal fiber ZM24 cotton plant (Fig. 7A). Differential expression analysis of the Ghicr24_A05G103300.1 gene in cotton fiber tissues revealed distinct expression patterns between short fiber pag1 and normal fiber ZM24 cotton plants. Moreover, we observed higher expression levels of the Ghicr24_A05G103300.1 gene in both normal fiber ZM24 and short fiber pag1 cotton plants at 5 and 10 DPA tissues (Table S18). These findings suggest that sORFs may play a significant role in the development of cotton fibers. Next, we performed a co-expression network analysis of the Ghicr24_A05G103300.1 gene, and the results revealed that several genes, including Ghicr24_A11G180100.1, Ghicr24_A09G073700.1, Ghicr24_D01G184200.1, Ghicr24_D08G239300.1, Ghicr24_A08G146100.1, and Ghicr24_D08G152400.1, were co-expressed with Ghicr24_A05G103300.1 (Fig. 7B, and Table S19). 
These co-expressed genes were differentially expressed at 10 DPA fiber tissues of both short fiber pag1 and normal fiber ZM24 cotton plants, and their expression was down-regulated in short fiber pag1 as compared to normal fiber ZM24 cotton. Functional enrichment analysis indicated the involvement of these genes in lipid metabolism and the biosynthesis of fatty acids (Fig. 7B). To investigate the role of Ghicr24_A05G103300.1 (GhKCS6) in cotton fiber elongation, GhKCS6 overexpression lines driven by the fiber-specific promoter E6 (E6-GhKCS6-OE) and GhKCS6-RNAi lines were developed (Fig. 7C). Analysis of GhKCS6 expression patterns in E6-GhKCS6-OE and GhKCS6-RNAi plants revealed noteworthy differences. Specifically, three independent E6-GhKCS6-OE lines exhibited significantly higher GhKCS6 expression in their fiber tissues. Conversely, fiber tissues of all three GhKCS6-RNAi lines showed a marked reduction in GhKCS6 expression. Taken together, the increased expression of GhKCS6 in E6-GhKCS6-OE lines and the reduced expression of GhKCS6 in GhKCS6-RNAi lines provide strong evidence for successful overexpression and silencing of the GhKCS6 gene, respectively. The phenotypic analysis revealed that the fiber length of GhKCS6-OE lines was significantly greater than that of the control plant ZM24. In contrast, the GhKCS6-RNAi lines exhibited a significant decrease in fiber length compared to the control, suggesting that the sORF-associated GhKCS6 gene plays a significant role in regulating fiber length in cotton (Fig. 7C).

Fig. 7.

The expression pattern, co-expression network, overexpression, and knockdown of sORF-associated GhKCS6 (Ghicr24_A05G103300.1). (A) Expression pattern of Ghicr24_A05G103300.1 in 5, 10, and 20 DPA fiber tissues of normal fiber ZM24 and short fiber pag1 cotton in RNA-seq and Ribo-seq data. (B) Co-expression network analysis of Ghicr24_A05G103300.1. Red lines in this co-expression network indicate a positive relationship, grey lines indicate a negative relationship (left), and boxplots show FPKMs of the highlighted genes at 10 DPA between pag1 and ZM24 cotton (right). (C) The mature fiber phenotype and associated statistics of three independent overexpression lines (E6-GhKCS6-OE) and three independent knockdown lines (GhKCS6-RNAi) of GhKCS6 were compared to the ZM24 fiber (left). These lines are denoted as L1 (Line 1), L2 (Line 2), and L3 (Line 3). Bar = 1 cm. The expression patterns of GhKCS6 in the fiber tissues of three independent overexpression lines (E6-GhKCS6-OE) and three independent knockdown lines (GhKCS6-RNAi) of GhKCS6 were compared to those in the fiber tissues of the ZM24 cotton (right). Statistical significance was analyzed by one-way ANOVA (***P < 0.01). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

We also identified Ghicr24_D12G286900.1 (homeobox-leucine zipper protein HDG11), previously known as GhHOX3 and reported to play a significant role in cotton fiber elongation [62], as a gene associated with an sORF. Based on Ribo-seq analysis, we observed that Ghicr24_D12G286900.1 exhibited peaks of P-sites at 20 DPA tissues of short fiber pag1 cotton, as well as at 5 and 10 DPA tissues of normal fiber ZM24 cotton, compared to other tissues (Fig. S24A). Furthermore, our analysis revealed that Ghicr24_D12G286900.1 was expressed at higher levels in the ovule (0 DPA), as well as at 5, 10, and 20 DPA tissues of normal fiber ZM24 cotton, as compared to the corresponding tissues of short fiber pag1 cotton (Fig. S24B and Table S20). RNA-seq analysis revealed that Ghicr24_D12G286900.1 exhibited high expression levels in ovules (0 DPA), in 5, 10, and 20 DPA tissues of normal fiber ZM24 and short fiber pag1 cotton, and in non-fiber tissues such as roots, stems, leaves, and flowers of ZM24 cotton (Fig. S24C and Table S20). These findings collectively suggest that sORFs play a crucial role in regulating cotton fiber development, particularly during the elongation stage.

We also identified 23 DEGs associated with sORFs (uORFs and dORFs) during the comparison of fiber and non-fiber tissues in ZM24 cotton (Table S21). Of these 23 DEGs, 14 were associated with uORFs, while nine were associated with dORFs. Through integrated high-throughput functional analysis, including GO and KEGG enrichment analysis, we found that most of these DEGs associated with sORFs were specific to fiber tissues. These results further support our findings that sORFs may play a crucial regulatory role in cotton fiber development, and shed light on the potential differential regulation mechanism of fiber development in cotton.

Discussion

Ribosome profiling is a key method for studying translational regulation and its role in determining gene expression, with particular relevance to specific biological pathways. Ribosome profiling allows for a comprehensive analysis of the translational landscape at the genome-wide level [10]. In eukaryotes, ribosome profiling enables the detection of translation initiation sites by identifying ribosome footprints that are enriched at or near the start codons of ORFs [63]. Additionally, ribosome profiling provides insights into the active translation of novel transcripts and multiple translation initiation sites within genes [64]. As one of the most widely cultivated natural fiber crops in the world [65], cotton is an ideal model for studying cell elongation, given that cotton fiber is among the longest plant cells [66]. Despite recent advancements in cotton genomics, our understanding of translational regulation in cotton remains incomplete. However, with the availability of complete genome sequencing, it is now possible to combine transcriptome assembly with ribosome footprints to study the cotton translatome. In this study, we utilized RNA-seq and Ribo-seq techniques to investigate the transcriptional and translational landscape in normal fiber ZM24 and short fiber pag1 cotton plants. Unlike prior investigations that focused on identifying uORFs solely in coding regions of the CCDS, our study explored novel ORFs, including protein-coding genes, sORFs (uORFs and dORFs), and ncORFs/lncRNAs, across CCDS, exonCCDS, and nonCCDS regions. These novel transcripts enabled us to fine-tune the annotation of the cotton genome and elucidate the regulatory role of these novel ORFs in plant growth and fiber development.

Transcriptome and translatome landscape

Translational research in plants has predominantly focused on Arabidopsis, maize, and tomato. Nonetheless, in our investigation, we discovered that despite diverging millions of years ago, many features of the translatome were comparable between Arabidopsis, tomato, and cotton. For instance, ribosome footprints and their sizes, as well as the P-site within ribosome footprints, were enriched and consistent among these species. Additionally, novel translational events involving sORFs (uORFs and dORFs), long non-coding RNAs, and translational regulatory mechanisms were conserved across species [2], [11], [16], [26]. The emergence of deep sequencing of ribosome fragments has facilitated the examination of active translation, while next-generation RNA sequencing (RNA-seq) has enabled us to explore the cotton transcriptome and investigate translational events. For the first time, we generated reference-guided de novo transcriptome assembly for normal fiber ZM24 and short fiber pag1 cotton plants, which enabled us to detect several transcripts that were absent from the ZM24_genome_assembly_V1.0 annotation.

It is noteworthy that cotton translatome profiling revealed a predominant ribosome footprint size of 27 nucleotides, consistent with other plant species such as Arabidopsis and tomatoes [2], [16]. In this study, we classified Ribo-seq reads into three categories, including the coding region (CCDS), the exon coding region (exonCCDS), and the lncRNAs, for all observed tissues of normal and short fiber cotton plants. Similar to previous studies, we observed a strong three-nucleotide periodicity in ribosome footprints distribution within ORFs in the first 100 nucleotides, and the distribution of P-sites was evident in CCDS and exonCCDS regions. However, P-sites distribution was unclear in the nonCCDS regions of all tissues in normal and short-fiber cotton plants. Overall, the Ribo-seq datasets displayed considerable consistency across all tissues, except for the stem tissue. When comparing transcriptome and translatome profiling through a scatterplot of correlation for all observed tissues, a positive correlation was observed. The correlation among tissues of normal and short-fiber plants during RNA-seq and Ribo-seq, as well as between RNA-seq and Ribo-seq for all tissues, validated our approach. Furthermore, the positive correlation between RNA-seq and Ribo-seq datasets among all tissues of normal and short-fiber cotton plants was indicative of a robust association.

Fine-tuning of cotton genome annotation

lncRNAs perform a range of regulatory functions, including chromatin modifications, transcriptional regulation, and post-transcriptional modification [21], [67]. Although they generally lack protein-coding capacity, they are involved in various regulatory pathways [67]. The expression of lncRNAs is often low and specific to particular tissues, suggesting their involvement in tissue development [68]. Previous studies have identified a significant number of lncRNAs in Arabidopsis, maize, and cotton [20], [55], [56]. During the comprehensive functional genome annotation, a total of 552 lncRNAs with ORFs (ncORFs) were identified, with an average length of 53 amino acids. Most of the translated lncRNAs were found in fiber tissues. Furthermore, the novel lncRNAs identified showed enrichment in the 10 and 20 DPA fiber tissues of both normal fiber ZM24 and short fiber pag1 cotton plants, suggesting a complex regulatory role of lncRNAs in cotton fiber development. For example, the highly expressed lncRNA MSTRG.48719.1 was detected in the 10 DPA tissue of normal fiber ZM24, and its expression was higher than in the short fiber pag1 cotton plant in both RNA-seq and Ribo-seq analyses. These results suggest that lncRNAs have the potential to act as protein-coding genes and may be involved in the regulation of cotton fiber elongation.

Furthermore, the P-site signal per codon in various types of novel ORFs, including sORFs (uORFs, dORFs), ncORFs, and ORFs_ccds, supports the novelty of our data and the translation potential of these ORFs, as they were detected as active mRNAs. Among the translated ORFs, a substantial number of sORFs were identified. The size of ORFs ranged from 100 to 250 nucleotides, while the size of sORFs was more variable, as confirmed by comparing the length distributions of ORFs and sORFs. Previous ribosome profiling studies have identified several novel sORFs in tomato [2]. sORFs have been reported to play various roles in plants, including in vegetative and reproductive growth, RNA biogenesis, and stress tolerance [69], [70], [71], [72]. In addition, the largest numbers of sORFs (uORFs and dORFs), ncORFs, and ORFs_ccds were observed in the 10 DPA fiber tissues of normal fiber ZM24 cotton, compared with the short fiber pag1 cotton plant. This suggests a fiber elongation-specific regulatory mechanism for these ORFs. Moreover, several sORFs were also detected in other fiber and non-fiber tissues of both ZM24 and pag1 cotton plants. A considerable number of novel protein-coding transcripts (ORFs_ccds) with reasonable transcript levels were detected in our datasets compared to other types of ORFs, indicating the robustness and efficiency of ORF translation.
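
To make the uORF and dORF categories concrete, the following minimal sketch labels a small ORF by its position relative to the main coding sequence on the same transcript. It assumes 0-based, half-open transcript coordinates and treats any ORF that overlaps the main CDS as a separate class; both conventions, the function name `classify_sorf`, and the example coordinates are assumptions made for illustration and do not reproduce the criteria used in our annotation.

```python
def classify_sorf(orf_start, orf_end, cds_start, cds_end):
    """Label a small ORF relative to the main CDS of the same transcript.

    Coordinates are 0-based, half-open transcript positions (assumption).
    Returns "uORF" if the ORF lies entirely in the 5' leader, "dORF" if it
    lies entirely in the 3' trailer, and "overlapping" otherwise.
    """
    if orf_start >= orf_end or cds_start >= cds_end:
        raise ValueError("start must be smaller than end")
    if orf_end <= cds_start:
        return "uORF"
    if orf_start >= cds_end:
        return "dORF"
    return "overlapping"


# Hypothetical transcript with its main CDS at positions 120-900.
print(classify_sorf(10, 70, 120, 900))     # ORF inside the 5' leader
print(classify_sorf(950, 1010, 120, 900))  # ORF inside the 3' trailer
print(classify_sorf(100, 160, 120, 900))   # ORF overlapping the CDS start
```

Whether ORFs that overlap the main CDS are counted as uORFs differs between studies, so the third class is kept explicit here rather than merged into either category.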

An independent analysis of the translation efficiency (TE) of novel ORFs (protein-coding genes) and lncRNAs indicates that both types of RNA can undergo translation. When comparing the TE of protein-coding genes and lncRNAs in all tissues, protein-coding genes exhibited higher TE than lncRNAs, although lncRNAs also displayed detectable TE. The pattern of the Kozak consensus sequence revealed the translation initiation site (ATG) in various types of ORFs, including sORFs (uORFs and dORFs), lncRNAs/ncORFs, and ORFs_ccds, indicating the potential of the novel ORFs to facilitate ribosome assembly and translation initiation. These findings are consistent with previous studies on other plant species, such as Arabidopsis [58] and tomato [2]. Overall, the novel ORFs identified in various tissues, including sORFs (uORFs, dORFs), lncRNAs/ncORFs, and ORFs_ccds, have the potential to be translated into proteins or peptides involved in the complex regulatory mechanisms of cotton fiber development. All of the identified novel ORFs have been incorporated into the cotton genome annotation.
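
As a minimal illustration of the TE comparison above, TE is commonly computed as the ratio of normalized Ribo-seq abundance to normalized RNA-seq abundance for the same feature. The sketch below assumes this common definition and adds a pseudocount to avoid division by zero; the function names, the pseudocount, and the abundance values are hypothetical and are not taken from our datasets.

```python
import math


def translation_efficiency(ribo_abundance, rna_abundance, pseudocount=1.0):
    """Ratio of Ribo-seq to RNA-seq abundance for one feature.

    Both inputs are assumed to be normalized to the same scale (for example
    TPM). The pseudocount is an illustrative choice that keeps the ratio
    defined when the RNA-seq abundance is zero.
    """
    if ribo_abundance < 0 or rna_abundance < 0 or pseudocount <= 0:
        raise ValueError("abundances must be >= 0 and pseudocount > 0")
    return (ribo_abundance + pseudocount) / (rna_abundance + pseudocount)


def log2_te(ribo_abundance, rna_abundance, pseudocount=1.0):
    """log2-transformed TE, convenient for comparing feature classes."""
    return math.log2(
        translation_efficiency(ribo_abundance, rna_abundance, pseudocount)
    )


# Made-up (Ribo-seq, RNA-seq) abundances for two hypothetical features.
toy = {"coding_gene": (31.0, 15.0), "lncRNA": (3.0, 7.0)}
te = {name: translation_efficiency(ribo, rna) for name, (ribo, rna) in toy.items()}
print(te)
```

In this toy case the coding gene has a TE above 1 and the lncRNA a TE below 1 but above zero, which mirrors the qualitative pattern described in the text without implying any actual values.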

Transcriptome and translatome of DEGs

Comparison of the transcriptome and translatome data identified several DEGs, with RNA-seq detecting more DEGs than Ribo-seq. Specifically, when comparing fiber and non-fiber tissues of normal fiber ZM24 cotton plants, RNA-seq detected a greater number of DEGs than Ribo-seq, and the two datasets showed a linear relationship and high synchrony. Functional enrichment analysis indicated that these DEGs were enriched in the photosystem and photosynthesis pathways. Similarly, the comparison of 10 and 20 DPA fiber tissues revealed a higher number of DEGs in RNA-seq than in Ribo-seq, again with a linear relationship and high synchrony. Functional enrichment analysis of these DEGs showed enrichment of genes involved in fatty acid elongation and brassinolide biosynthesis, pointing to pathways that underlie cotton fiber elongation. A similar functional enrichment analysis of DEGs has previously been conducted in tomato [2]. GO and KEGG enrichment analyses and the comparison between 10 and 20 DPA fiber tissues of normal fiber ZM24 and short fiber pag1 cotton plants are consistent with the view that RNA-seq provides transcript identity and abundance, whereas Ribo-seq quantifies and maps ribosome occupancy on transcripts [52]. Moreover, the greater number of DEGs detected by RNA-seq than by Ribo-seq supports the validity of both methods and highlights clear differences among the 10 and 20 DPA fiber and non-fiber tissues of normal fiber ZM24 and short fiber pag1 cotton plants. Our analysis of DEGs in the 10 DPA fiber tissue revealed several brassinosteroid (BR)-related genes, suggesting that BR signaling plays a crucial role in cotton fiber development at the transcriptional and translational levels. Further, the ratios and numbers of expressed coding genes on the At and Dt sub-genomes suggested that sub-genome-biased expression is a critical factor in the regulation of cotton fiber development.
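
The linear relationship and synchrony between RNA-seq and Ribo-seq discussed above can be sketched with two simple summaries of per-gene log2 fold changes: the Pearson correlation coefficient and the fraction of genes that change in the same direction in both datasets. Both measures are assumptions chosen for illustration, the function names are hypothetical, and the fold-change values are made up; this is not the statistical procedure used in our analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def same_direction_fraction(rna_lfc, ribo_lfc):
    """Fraction of genes whose log2 fold changes share the same sign."""
    if len(rna_lfc) != len(ribo_lfc) or not rna_lfc:
        raise ValueError("need two non-empty sequences of equal length")
    same = sum(1 for a, b in zip(rna_lfc, ribo_lfc) if a * b > 0)
    return same / len(rna_lfc)


# Made-up log2 fold changes for four hypothetical genes.
rna_lfc = [2.0, 1.0, -1.0, -2.0]
ribo_lfc = [1.0, 0.5, -0.5, -1.0]
r = pearson_r(rna_lfc, ribo_lfc)
fraction = same_direction_fraction(rna_lfc, ribo_lfc)
print(r, fraction)
```

A high correlation with most genes changing in the same direction would correspond to synchronous regulation at both levels, while genes that deviate from the trend are candidates for translation-specific control.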

Previous studies have shown that sORFs can impact downstream ORF translation [18], [73] and play a role in regulating mRNA translation [74], [75], [76]. In rice, the repression of a uORF has been linked to immediate plant resistance to pathogen attack without any negative effects on plant growth [75]. Identification of ORFs can provide insights into the proteins involved in various physiological and biological pathways. However, uORFs in cotton are likely under-annotated, and improved 5′ UTR annotation of cotton genes is still needed to identify missing uORFs. Therefore, the extensive study of uORFs could enhance our understanding of translational control in cotton. In our study, we identified two genes associated with sORFs. One of these genes, Ghicr24_A05G103300.1 (GhKCS6), was highly expressed in 5 and 10 DPA tissues of normal fiber ZM24 cotton. Genes co-expressed with GhKCS6 were found to be involved in lipid metabolism and fatty acid biosynthesis. Additionally, overexpression and knockdown of GhKCS6 in cotton resulted in a significant increase and decrease in fiber length, respectively, compared to the control plant ZM24. The analysis of GhKCS6 expression patterns in E6-GhKCS6-OE and GhKCS6-RNAi plants demonstrated successful overexpression and silencing of the GhKCS6 gene, respectively. Another gene, Ghicr24_D12G286900.1 (GhHOX3), was highly expressed in 20 DPA tissues of short fiber pag1 cotton and in 5 and 10 DPA tissues of normal fiber ZM24 cotton. These findings support the hypothesis that the sORF-associated genes GhKCS6 and GhHOX3 play significant roles in the regulation of fiber length in cotton plants, in agreement with a previous study [62].
Additionally, our identification of 23 DEGs associated with sORFs (including uORFs and dORFs) in the comparison of fiber versus non-fiber tissues of ZM24 cotton sheds light on their potential involvement in the mechanisms of cotton fiber development. These findings collectively suggest that sORFs may be differentially regulated during cotton fiber development.

Conclusions

In summary, our study utilized an integrated approach that combined transcriptome and translatome profiling to construct a reference-guided de novo transcriptome assembly and identify novel ORFs, including lncRNAs, in normal fiber ZM24 and short fiber pag1 cotton plants. The translatome profiling revealed unique and conserved translational features that fine-tuned the cotton genome annotation and demonstrated the complex regulatory role of lncRNAs and sORFs during cotton fiber elongation. Overall, our findings provide valuable insights into cotton genomics and the extensive mechanisms involved in the translational control of cotton fiber development. Furthermore, our approach offers a practical method for studying the translatome in other plant species and organisms.

Availability of data and materials

All the data generated during this study and the identified novel transcripts, including ORFs, sORFs (uORFs and dORFs), and lncRNAs/ncORFs, are provided in the supplementary data. These data are available in the public database GRAND (Gossypium Resource and Network Database) at https://grand.cricaas.com.cn/page/download/download.

Ethics approval and consent to participate

Not applicable.

CRediT authorship contribution statement

Ghulam Qanmber: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing. Qi You: Methodology, Data curation, Investigation. Zhaoen Yang: Conceptualization, Methodology, Investigation, Writing – review & editing. Liqiang Fan: Data curation, Formal analysis. Zhibin Zhang: Data curation, Formal analysis. Baibai Gao: Software, Visualization. Fuguang Li: Conceptualization, Writing – review & editing, Funding acquisition, Supervision. Zuoren Yang: Software, Writing – review & editing, Funding acquisition, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2022YFF1001400), the National Natural Science Foundation of China (No. 32000458), the Tian-Shan Talent Program (No. 2022TSYCCX0087), and the Key Research and Development Program of Xinjiang (No. 2022B02052).

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jare.2023.05.004.

Contributor Information

Fuguang Li, Email: aylifug@caas.cn.

Zuoren Yang, Email: yangzuoren@caas.cn.

Appendix A. Supplementary material

The following are the Supplementary data to this article:

Supplementary data 1
mmc1.zip (8.3MB, zip)

References

  • 1. Mergner J., Frejno M., List M., Papacek M., Chen X., Chaudhary A., et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature. 2020;579(7799):409–414. doi: 10.1038/s41586-020-2094-2.
  • 2. Wu H.-Y.-L., Song G., Walley J.W., Hsu P.Y. The tomato translational landscape revealed by transcriptome assembly and ribosome profiling. Plant Physiol. 2019;181(1):367–380. doi: 10.1104/pp.19.00541.
  • 3. Bava F.-A., Eliscovich C., Ferreira P.G., Minana B., Ben-Dov C., Guigo R., et al. CPEB1 coordinates alternative 3′-UTR formation with translational regulation. Nature. 2013;495(7439):121–125. doi: 10.1038/nature11901.
  • 4. Holt C.E., Schuman E.M. The central dogma decentralized: new perspectives on RNA function and local translation in neurons. Neuron. 2013;80(3):648–657. doi: 10.1016/j.neuron.2013.10.036.
  • 5. Cenik C., Cenik E.S., Byeon G.W., Grubert F., Candille S.I., Spacek D., et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 2015;25(11):1610–1621. doi: 10.1101/gr.193342.115.
  • 6. Deschênes-Simard X., Lessard F., Gaumont-Leclerc M.-F., Bardeesy N., Ferbeyre G. Cellular senescence and protein degradation: breaking down cancer. Cell Cycle. 2014;13(12):1840–1858. doi: 10.4161/cc.29335.
  • 7. Vogel C., Marcotte E.M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet. 2012;13(4):227–232. doi: 10.1038/nrg3185.
  • 8. Khan Z., Ford M.J., Cusanovich D.A., Mitrano A., Pritchard J.K., Gilad Y. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science. 2013;342(6162):1100–1104. doi: 10.1126/science.1242379.
  • 9. Jiang L.-G., Li B., Liu S.-X., Wang H.-W., Li C.-P., Song S.-H., et al. Characterization of proteome variation during modern maize breeding. Mol Cell Proteomics. 2019;18(2):263–276. doi: 10.1074/mcp.RA118.001021.
  • 10. Ingolia N.T., Ghaemmaghami S., Newman J.R., Weissman J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324(5924):218–223. doi: 10.1126/science.1168978.
  • 11. Zhu W., Chen S., Zhang T., Qian J., Luo Z., Zhao H., et al. Dynamic patterns of the translatome in a hybrid triplet show translational fractionation of the maize subgenomes. Crop J. 2021.
  • 12. Andreev D.E., O'Connor P.B., Loughran G., Dmitriev S.E., Baranov P.V., Shatsky I.N. Insights into the mechanisms of eukaryotic translation gained with ribosome profiling. Nucleic Acids Res. 2017;45(2):513–526. doi: 10.1093/nar/gkw1190.
  • 13. Calviello L., Ohler U. Beyond read-counts: Ribo-seq data analysis to understand the functions of the transcriptome. Trends Genet. 2017;33(10):728–744. doi: 10.1016/j.tig.2017.08.003.
  • 14. Morris D.R., Geballe A.P. Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol. 2000;20(23):8635–8642. doi: 10.1128/mcb.20.23.8635-8642.2000.
  • 15. Andrews S.J., Rothnagel J.A. Emerging evidence for functional peptides encoded by short open reading frames. Nat Rev Genet. 2014;15(3):193–204. doi: 10.1038/nrg3520.
  • 16. Hsu P.Y., Calviello L., Wu H.-Y.-L., Li F.-W., Rothfels C.J., Ohler U., et al. Super-resolution ribosome profiling reveals unannotated translation events in Arabidopsis. PNAS. 2016;113(45):E7126–E7135. doi: 10.1073/pnas.1614788113.
  • 17. van Heesch S., Witte F., Schneider-Lunitz V., Schulz J.F., Adami E., Faber A.B., et al. The translational landscape of the human heart. Cell. 2019;178(1):242–260.e229.
  • 18. Chew G.-L., Pauli A., Schier A.F. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nat Commun. 2016;7(1):1–10. doi: 10.1038/ncomms11663.
  • 19. Ruiz-Orera J., Albà M.M. Translation of small open reading frames: roles in regulation and evolutionary innovation. Trends Genet. 2019;35(3):186–198. doi: 10.1016/j.tig.2018.12.003.
  • 20. Wang M., Yuan D., Tu L., Gao W., He Y., Hu H., et al. Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.). New Phytol. 2015;207(4):1181–1197. doi: 10.1111/nph.13429.
  • 21. Cech T.R., Steitz J.A. The noncoding RNA revolution—trashing old rules to forge new ones. Cell. 2014;157(1):77–94. doi: 10.1016/j.cell.2014.03.008.
  • 22. Ruiz-Orera J., Messeguer X., Subirana J.A., Alba M.M. Long non-coding RNAs as a source of new peptides. eLife. 2014;3:e03523.
  • 23. Chotewutmontri P., Barkan A. Dynamics of chloroplast translation during chloroplast differentiation in maize. PLoS Genet. 2016;12(7):e1006106. doi: 10.1371/journal.pgen.1006106.
  • 24. Bazin J., Baerenfaller K., Gosai S.J., Gregory B.D., Crespi M., Bailey-Serres J. Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation. PNAS. 2017;114(46):E10018–E10027. doi: 10.1073/pnas.1708433114.
  • 25. Shamimuzzaman M., Vodkin L. Ribosome profiling reveals changes in translational status of soybean transcripts during immature cotyledon development. PLoS One. 2018;13(3):e0194596. doi: 10.1371/journal.pone.0194596.
  • 26. Chotewutmontri P., Barkan A. Ribosome profiling elucidates differential gene expression in bundle sheath and mesophyll cells in maize. bioRxiv. 2021:2020.2012. 2015.422948.
  • 27. Ahmed N., Sormanni P., Ciryam P., Vendruscolo M., Dobson C.M., O’Brien E.P. Identifying A- and P-site locations on ribosome-protected mRNA fragments using Integer Programming. Sci Rep. 2019;9(1):1–14. doi: 10.1038/s41598-019-42348-x.
  • 28. Yang Z., Wang J., Huang Y., Wang S., Wei L., Liu D., et al. CottonMD: a multi-omics database for cotton biological study. Nucleic Acids Res. 2022. doi: 10.1093/nar/gkac863.
  • 29. Dai F., Chen J., Zhang Z., Liu F., Li J., Zhao T., et al. COTTONOMICS: a comprehensive cotton multi-omics database. Database. 2022;2022. doi: 10.1093/database/baac080.
  • 30. Yang Z., Gao C., Zhang Y., Yan Q., Hu W., Yang L., et al. Recent progression and future perspectives in Cotton genomic breeding. J Integr Plant Biol. 2022. doi: 10.1111/jipb.13388.
  • 31. Yang Z., Zhang C., Yang X., Liu K., Wu Z., Zhang X., et al. PAG1, a cotton brassinosteroid catabolism gene, modulates fiber elongation. New Phytol. 2014;203(2):437–448. doi: 10.1111/nph.12824.
  • 32. Ahmar S., Gruszka D. In-silico study of brassinosteroid signaling genes in rice provides insight into mechanisms which regulate their expression. Front Genet. 2022;13. doi: 10.3389/fgene.2022.953458.
  • 33. Ahmar S., Zolkiewicz K., Gruszka D. Analyses of genes encoding the Glycogen Synthase Kinases in rice and Arabidopsis reveal mechanisms which regulate their expression during development and responses to abiotic stresses. Plant Sci. 2023;111724. doi: 10.1016/j.plantsci.2023.111724.
  • 34. Zhou Y., Zhang Z.T., Li M., Wei X.Z., Li X.J., Li B.Y., et al. Cotton (Gossypium hirsutum) 14-3-3 proteins participate in regulation of fibre initiation and elongation by modulating brassinosteroid signalling. Plant Biotechnol J. 2015;13(2):269–280. doi: 10.1111/pbi.12275.
  • 35. Zhao B., Cao J.F., Hu G.J., Chen Z.W., Wang L.Y., Shangguan X.X., et al. Core cis-element variation confers subgenome-biased expression of a transcription factor that functions in cotton fiber elongation. New Phytol. 2018;218(3):1061–1075. doi: 10.1111/nph.15063.
  • 36. Yang Z., Liu Z., Ge X., Lu L., Qin W., Qanmber G., et al. Brassinosteroids regulate cotton fiber elongation by modulating very-long-chain fatty acid biosynthesis. Plant Cell. 2023:koad060. doi: 10.1093/plcell/koad060.
  • 37. Liu L., Chen G., Li S., Gu Y., Lu L., Qanmber G., et al. A brassinosteroid transcriptional regulatory network participates in regulating fiber elongation in cotton. Plant Physiol. 2023;191(3):1985–2000. doi: 10.1093/plphys/kiac590.
  • 38. Zhu L., Wang H., Zhu J., Wang X., Jiang B., Hou L., et al. A conserved brassinosteroid-mediated BES1-CERP-EXPA3 signaling cascade controls plant cell elongation. Cell Rep. 2023;42(4). doi: 10.1016/j.celrep.2023.112301.
  • 39. Yang Z., Ge X., Yang Z., Qin W., Sun G., Wang Z., et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat Commun. 2019;10(1):1–13. doi: 10.1038/s41467-019-10820-x.
  • 40. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
  • 41. Pertea M., Kim D., Pertea G.M., Leek J.T., Salzberg S.L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650–1667. doi: 10.1038/nprot.2016.095.
  • 42. Pertea M., Pertea G.M., Antonescu C.M., Chang T.-C., Mendell J.T., Salzberg S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–295. doi: 10.1038/nbt.3122.
  • 43. Calviello L., Mukherjee N., Wyler E., Zauber H., Hirsekorn A., Selbach M., et al. Detecting actively translated open reading frames in ribosome profiling data. Nat Methods. 2016;13(2):165–170. doi: 10.1038/nmeth.3688.
  • 44. Pertea G., Pertea M. GFF utilities: GffRead and GffCompare. F1000Research. 2020;9:304. doi: 10.12688/f1000research.23297.1.
  • 45. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):1–21. doi: 10.1186/s13059-014-0550-8.
  • 46. Wei T., Simko V., Levy M., Xie Y., Jin Y., Zemla J., et al. Visualization of a correlation matrix. 2013;230(11) R package version 073.
  • 47. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021;2(3). doi: 10.1016/j.xinn.2021.100141.
  • 48. Aoki Y., Okamura Y., Tadaka S., Kinoshita K., Obayashi T. ATTED-II in 2016: a plant coexpression database towards lineage-specific coexpression. Plant Cell Physiol. 2016;57(1):e5–e. doi: 10.1093/pcp/pcv165.
  • 49. You Q., Xu W., Zhang K., Zhang L., Yi X., Yao D., et al. ccNET: Database of co-expression networks with functional modules for diploid and polyploid Gossypium. Nucleic Acids Res. 2017;45(D1):D1090–D1099.
  • 50. Ge X., Xu J., Yang Z., Yang X., Wang Y., Chen Y., et al. Efficient genotype-independent cotton genetic transformation and genome editing. J Integr Plant Biol. 2022. doi: 10.1111/jipb.13427.
  • 51. Livak K.J., Schmittgen T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001;25(4):402–408. doi: 10.1006/meth.2001.1262.
  • 52. Brar G.A., Weissman J.S. Ribosome profiling reveals the what, when, where and how of protein synthesis. Nat Rev Mol Cell Biol. 2015;16(11):651–664. doi: 10.1038/nrm4069.
  • 53. Langdon W.B. Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks. BioData Mining. 2015;8(1):1–7. doi: 10.1186/s13040-014-0034-0.
  • 54. Bazzini A.A., Johnstone T.G., Christiano R., Mackowiak S.D., Obermayer B., Fleming E.S., et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014;33(9):981–993. doi: 10.1002/embj.201488411.
  • 55. Li L., Eichten S.R., Shimizu R., Petsch K., Yeh C.-T., Wu W., et al. Genome-wide discovery and characterization of maize long non-coding RNAs. Genome Biol. 2014;15(2):1–15. doi: 10.1186/gb-2014-15-2-r40.
  • 56. Liu J., Jung C., Xu J., Wang H., Deng S., Bernad L., et al. Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell. 2012;24(11):4333–4345. doi: 10.1105/tpc.112.102855.
  • 57. Michel A.M., Choudhury K.R., Firth A.E., Ingolia N.T., Atkins J.F., Baranov P.V. Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Res. 2012;22(11):2219–2229. doi: 10.1101/gr.133249.111.
  • 58. Liu M.-J., Wu S.-H., Wu J.-F., Lin W.-D., Wu Y.-C., Tsai T.-Y., et al. Translational landscape of photomorphogenic Arabidopsis. Plant Cell. 2013;25(10):3699–3710. doi: 10.1105/tpc.113.114769.
  • 59. Kastenmayer J.P., Ni L., Chu A., Kitchen L.E., Au W.-C., Yang H., et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 2006;16(3):365–373. doi: 10.1101/gr.4355406.
  • 60. Kondo T., Plaza S., Zanet J., Benrabah E., Valenti P., Hashimoto Y., et al. Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis. Science. 2010;329(5989):336–339. doi: 10.1126/science.1188158.
  • 61. Hanada K., Higuchi-Takeuchi M., Okamoto M., Yoshizumi T., Shimizu M., Nakaminami K., et al. Small open reading frames associated with morphogenesis are hidden in plant genomes. PNAS. 2013;110(6):2395–2400. doi: 10.1073/pnas.1213958110.
  • 62. Shan C.-M., Shangguan X.-X., Zhao B., Zhang X.-F., Chao L.-m., Yang C.-Q., et al. Control of cotton fibre elongation by a homeodomain transcription factor GhHOX3. Nat Commun. 2014;5(1):1–9. doi: 10.1038/ncomms6519.
  • 63. Meydan S., Marks J., Klepacki D., Sharma V., Baranov P.V., Firth A.E., et al. Retapamulin-assisted ribosome profiling reveals the alternative bacterial proteome. Mol Cell. 2019;74(3):481–493.e486. doi: 10.1016/j.molcel.2019.02.017.
  • 64. Lee S., Liu B., Lee S., Huang S.-X., Shen B., Qian S.-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci U S A. 2012;109(37):E2424–E2432.
  • 65. Qanmber G., Lu L., Liu Z., Yu D., Zhou K., Huo P., et al. Genome-wide identification of GhAAI genes reveals that GhAAI66 triggers a phase transition to induce early flowering. J Exp Bot. 2019;70(18):4721–4736. doi: 10.1093/jxb/erz239.
  • 66. Yang Z., Qanmber G., Wang Z., Yang Z., Li F. Gossypium genomics: trends, scope, and utilization for cotton improvement. Trends Plant Sci. 2020;25(5):488–500. doi: 10.1016/j.tplants.2019.12.011.
  • 67. Rinn J.L., Chang H.Y. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145–166. doi: 10.1146/annurev-biochem-051410-092902.
  • 68. Cabili M.N., Trapnell C., Goff L., Koziol M., Tazon-Vega B., Regev A., et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25(18):1915–1927. doi: 10.1101/gad.17446611.
  • 69. Blanvillain R., Young B., Ym C., Hecht V., Varoquaux F., Delorme V., et al. The Arabidopsis peptide kiss of death is an inducer of programmed cell death. EMBO J. 2011;30(6):1173–1183. doi: 10.1038/emboj.2011.14.
  • 70. Ikeuchi M., Yamaguchi T., Kazama T., Ito T., Horiguchi G., Tsukaya H. ROTUNDIFOLIA4 regulates cell proliferation along the body axis in Arabidopsis shoot. Plant Cell Physiol. 2011;52(1):59–69. doi: 10.1093/pcp/pcq138.
  • 71. Valdivia E.R., Chevalier D., Sampedro J., Taylor I., Niederhuth C.E., Walker J.C. DVL genes play a role in the coordination of socket cell recruitment and differentiation. J Exp Bot. 2012;63(3):1405–1412. doi: 10.1093/jxb/err378.
  • 72. De Coninck B., Carron D., Tavormina P., Willem L., Craik D.J., Vos C., et al. Mining the genome of Arabidopsis thaliana as a basis for the identification of novel bioactive peptides involved in oxidative stress tolerance. J Exp Bot. 2013;64(17):5297–5307. doi: 10.1093/jxb/ert295.
  • 73. Johnstone T.G., Bazzini A.A., Giraldez A.J. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016;35(7):706–723. doi: 10.15252/embj.201592759.
  • 74. Sagor G., Berberich T., Tanaka S., Nishiyama M., Kanayama Y., Kojima S., et al. A novel strategy to produce sweeter tomato fruits with high sugar contents by fruit-specific expression of a single bZIP transcription factor gene. Plant Biotechnol J. 2016;14(4):1116–1126. doi: 10.1111/pbi.12480.
  • 75. Xu G., Yuan M., Ai C., Liu L., Zhuang E., Karapetyan S., et al. uORF-mediated translation allows engineered plant disease resistance without fitness costs. Nature. 2017;545(7655):491–494. doi: 10.1038/nature22372.
  • 76. Zhang H., Si X., Ji X., Fan R., Liu J., Chen K., et al. Genome editing of upstream open reading frames enables translational control in plants. Nat Biotechnol. 2018;36(9):894–898. doi: 10.1038/nbt.4202.

Articles from Journal of Advanced Research are provided here courtesy of Elsevier
