Plant Physiology. 2020 Nov 17;185(1):179–195. doi: 10.1093/plphys/kiaa003

Full-length annotation with multistrategy RNA-seq uncovers transcriptional regulation of lncRNAs in cotton

Xiaomin Zheng 1,#, Yanjun Chen 1,#, Yifan Zhou 1, Keke Shi 1, Xiao Hu 1, Danyang Li 1, Hanzhe Ye 1, Yu Zhou 1,2, Kun Wang 1,#,3
PMCID: PMC8133545  PMID: 33631798

Abstract

Long noncoding RNAs (lncRNAs) are crucial factors during plant development and environmental responses. To build an accurate atlas of lncRNAs in the diploid cotton Gossypium arboreum, we combined Isoform-sequencing, strand-specific RNA-seq (ssRNA-seq), and cap analysis gene expression (CAGE-seq) with PolyA-seq and compiled a pipeline named plant full-length lncRNA to integrate multi-strategy RNA-seq data. In total, 9,240 lncRNAs from 21 tissue samples were identified. 4,405 and 4,805 lncRNA transcripts were supported by CAGE-seq and PolyA-seq, respectively, among which 6.7% and 7.2% had multiple transcription start sites (TSSs) and transcription termination sites (TTSs). We revealed that alternative usage of TSS and TTS of lncRNAs occurs pervasively during plant growth. Besides, we uncovered that many lncRNAs act in cis to regulate adjacent protein-coding genes (PCGs). It was especially interesting to observe 64 cases wherein the lncRNAs were involved in the TSS alternative usage of PCGs. We identified lncRNAs that are coexpressed with ovule- and fiber development–associated PCGs, or linked to GWAS single-nucleotide polymorphisms. We mapped the genome-wide binding sites of two lncRNAs with chromatin isolation by RNA purification sequencing. We also validated the transcriptional regulatory role of lnc-Ga13g0352 via virus-induced gene suppression assay, indicating that this lncRNA might act as a dual-functional regulator that either activates or inhibits the transcription of target genes.


An accurate annotation reveals the genomic features and transcriptional regulation functions of long noncoding RNAs in cotton.

Introduction

Long noncoding RNAs (lncRNAs) play essential roles in gene transcriptional regulation. In plants, AUXIN-REGULATED PROMOTER LOOP RNA (APOLO) acts as a scaffold to control chromatin looping and DNA methylation (Ariel et al., 2014); the antisense lncRNA Cold-Induced Long Antisense Intragenic RNA (COOLAIR) mediates chromatin switching at FLOWERING LOCUS C (FLC) during vernalization (Csorba et al., 2014); INDUCED BY PHOSPHATE STARVATION1/2 (IPS1/2) sponges miR399 to regulate its target, PHOSPHATE2 (PHO2), and maintain correct Pi homeostasis (Franco-Zorrilla et al., 2007); Long-day-specific Male-fertility-Associated RNA (LDMAR) is required for normal pollen development by causing promoter methylation under long-day conditions (Ding et al., 2012); alternative splicing competitor long noncoding RNA (ASCO-lncRNA) can hijack nuclear alternative splicing (AS) regulators to modulate AS patterns during development (Bardou et al., 2014); and overexpression of LRK Antisense Intergenic RNA (LAIR) might regulate the expression of neighboring gene clusters to increase grain yield in Oryza sativa (Wang et al., 2018b).

However, the gene structures, biogenesis, expression, and molecular functions of plant lncRNAs remain poorly understood (St. Laurent et al., 2015). Currently, saturated annotation resources for lncRNAs are limited to a few plants, such as Arabidopsis thaliana (Liu et al., 2012; Blein et al., 2020), Zea mays (Li et al., 2014b), O. sativa (Zhang et al., 2014; Zheng et al., 2019), and cotton (Gossypium hirsutum; Hu et al., 2018; Zhao et al., 2018a). Although more than 12,000 lncRNAs have been deposited in the GreeNC database (Paytuví Gallart et al., 2016), most of them were identified using only Illumina next-generation sequencing (NGS) RNA-seq, in which annotations are derived from short-read assembly. Such annotations are often incomplete and inaccurate at lncRNA boundaries, namely junction sites and 5ʹ-/3ʹ-ends (Boley et al., 2014; Uszczynska-Ratajczak et al., 2018).

With improvements in high-throughput sequencing technology and associated algorithms, new methods for lncRNA identification have been developed. Cap analysis gene expression (CAGE-seq) and PolyA-seq capture short reads that originate from the 5ʹ cap and 3ʹ polyA tail, offering precise information on the transcription start sites (TSSs) and transcription termination sites (TTSs) of RNA transcripts. PacBio long-read sequencing directly collects full-length transcripts and thus yields splicing, TSS, and TTS information simultaneously. These methods have already been combined with RNA-seq for RNA annotation in humans and other animals (Boley et al., 2014; Hon et al., 2017), which has greatly improved the annotation quality of lncRNAs in these species.

Cotton is an important cash crop worldwide. In addition to producing animal feed and cooking oil, cotton seeds are known to be the source of pure cellulose fiber. The fiber cell development of cotton is an excellent model for studying cell elongation and cell wall biosynthesis (Wang et al., 2016). To date, by using NGS RNA-seq, lncRNA identification has been performed in several cotton species (Wang et al., 2015a; Hu et al., 2018; Zhang et al., 2018; Zhao et al., 2018a).

In previous studies, we sequenced and assembled the genome and performed transcriptomic annotation for the cultivated diploid cotton G. arboreum (Li et al., 2014a; Du et al., 2018; Wang et al., 2019). To obtain systematic and accurate information on lncRNAs in G. arboreum, we combined multiple RNA-seq methods, including Isoform-sequencing (Iso-seq), strand-specific RNA sequencing (ssRNA-seq), CAGE-seq, and PolyA-seq. We compiled a pipeline, named plant full-length lncRNA (PULL), to integrate these data and identified 9,240 lncRNAs across 21 tissue samples. This study provides a fundamental resource that enriches the knowledge base on lncRNAs and that the cotton community can use in future functional studies.

Results

Cotton lncRNA transcriptome with accurate TSSs and TTSs

To obtain a comprehensive annotation of lncRNAs in the G. arboreum genome (Du et al., 2018), we analyzed four types of sequencing data, Iso-seq, ssRNA-seq (ribosome depletion-based sequence library), CAGE-seq, and PolyA-seq, across 16 tissues and 5 stress-treated samples (Figure 1A). We compiled the PULL pipeline to systematically integrate the multi-strategy RNA-seq data and to identify accurate gene structures, including TSSs, TTSs, and junction sites, for lncRNAs (Figure 1B; Supplemental Figure S1). The transcriptome assembled from ssRNA-seq and Iso-seq was corrected with CAGE-seq and PolyA-seq to obtain accurate TSSs and TTSs. The whole set of isoforms was then filtered to obtain the lncRNAs by removing protein-coding genes (PCGs), known ncRNAs in Rfam (tRNAs, rRNAs, snoRNAs, and snRNAs; Kalvari et al., 2018), candidate lncRNAs with high coding potential scores, and possible false-positive long noncoding natural antisense transcripts (lnc-NATs; Supplemental Figure S1; details in Methods).

Figure 1.

Full-length annotation of lncRNAs with multi-strategy RNA-seq data in cotton (G. arboreum). A, The four RNA-seq technologies applied in this study. B, Overview of the plant full-length lncRNA (PULL) pipeline. The details of procedures 1–3 are shown in Supplemental Figure S1. C, Evaluation of CPC and CNCI scores for lncRNAs and PCGs. A low score indicates weak protein-coding potential. D, Numbers of lncRNAs with 5ʹ- and/or 3ʹ-end signals. E, Schematic illustration of the four types of lncRNAs (left) and their proportions (right). F, The proportion of transcripts with a 5ʹ cap and 3ʹ polyA tail in lncRNAs and PCGs.

We ultimately obtained 9,240 isoforms from 8,113 lncRNAs. The canonical features of lncRNAs, a Coding Potential Calculator (CPC) score <0.5 and a Coding-Non-Coding Index (CNCI) score <0, are evident (Figure 1C). TSSs of 4,405 lncRNA transcripts were supported by CAGE-seq and TTSs of 4,805 lncRNA transcripts were supported by PolyA-seq; 2,858 lncRNA transcripts had both 5ʹ- and 3ʹ-end signals (Figure 1D; Supplemental Table S1). According to their genomic locations, the lncRNAs could be divided into four groups: 3,345 long intergenic noncoding RNAs (lincRNAs) located in intergenic regions (>2 kb from PCGs); 3,787 lnc-NATs that were partially or completely complementary to PCGs; 1,860 long genic noncoding RNAs (lnc-Genics; <2 kb from PCGs); and 248 long intron noncoding RNAs (lnc-Intronics) located in the introns of PCGs (Figure 1E; Supplemental Table S1). Over 70% of lnc-Genics and lincRNAs have CAGE-seq signals, PolyA-seq signals, or both, whereas the proportions for lnc-NATs and lnc-Intronics are slightly lower (>60%; Figure 1F).
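The four location-based groups can be illustrated with a minimal Python sketch. It is not the PULL implementation: the `Interval` type, the function names, and the order in which the rules are checked (intronic first, then antisense overlap, then the 2-kb proximity rule) are assumptions made for illustration, and a transcript lying exactly 2 kb from a PCG is treated here as intergenic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"

def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def gap(a, b):
    """Distance in bp between two intervals on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))

def classify_lncrna(lnc, pcgs, introns, flank=2000):
    """Assign one of the four groups; rule order is an assumption of this sketch."""
    # Fully contained in an intron of a PCG.
    if any(i.chrom == lnc.chrom and i.start <= lnc.start and lnc.end <= i.end
           for i in introns):
        return "lnc-Intronic"
    nearby = [g for g in pcgs if g.chrom == lnc.chrom]
    # Overlapping a PCG on the opposite strand.
    if any(overlaps(lnc, g) and g.strand != lnc.strand for g in nearby):
        return "lnc-NAT"
    # Within 2 kb of a PCG.
    if any(gap(lnc, g) < flank for g in nearby):
        return "lnc-Genic"
    # More than 2 kb away from every PCG.
    return "lincRNA"

pcg = Interval("chr1", 10_000, 15_000, "+")
intron = Interval("chr1", 11_000, 12_000, "+")
for candidate in (Interval("chr1", 11_100, 11_900, "-"),
                  Interval("chr1", 14_000, 16_000, "-"),
                  Interval("chr1", 16_000, 17_000, "+"),
                  Interval("chr1", 20_000, 21_000, "+")):
    print(candidate.start, classify_lncrna(candidate, [pcg], [intron]))
```

In a real annotation the comparison would be made against exon and intron coordinates from the reference GFF, and strand-aware rules for partially overlapping transcripts would need to follow the Methods.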

To assess the accuracy of the 5ʹ- and 3ʹ-end signals that PULL derived from CAGE-seq and PolyA-seq, we randomly selected 20 lncRNAs for each end type and validated them by 5ʹ and 3ʹ rapid amplification of cDNA ends (RACE). Sanger sequencing showed that the 5ʹ- and 3ʹ-ends predicted by PULL were highly accurate, with a positive ratio of 95% (Supplemental Figures S2 and S3). Numerous TSSs and TTSs predicted by StringTie deviated substantially from those predicted by the PULL pipeline (Supplemental Figures S2 and S3), because StringTie could use only the ssRNA-seq data to assemble transcripts.

The above data indicate that at least half of the lncRNAs in G. arboreum resemble PCGs in the transcriptional processing of their 5ʹ- or 3ʹ-ends. In summary, we constructed a landscape of lncRNAs in G. arboreum and identified their accurate transcribed regions in the genome.

Structural and genomic features of lncRNAs in G. arboreum

To investigate the features of lncRNAs in G. arboreum, we systematically compared the exons, transcripts, isoforms, GC contents, and expression levels of lncRNAs and PCGs. Cotton lncRNAs contain fewer exons than PCGs, and about 55% of lncRNAs contain only one exon (Supplemental Figure S4A). Compared with PCGs, lncRNAs have relatively longer exons, shorter transcripts, fewer isoforms, and lower expression (Supplemental Figure S4C, S4E, S4G, and S4J; Supplemental Tables S2 and S3). In addition, the gene bodies of lncRNAs have lower GC contents than those of PCGs (Supplemental Figure S4I). We then compared these features among the four types of lncRNAs. The lnc-NATs have shorter and more numerous exons, longer transcripts, lower expression levels, and higher GC content than the other types of lncRNAs (Supplemental Figure S4). These features of lncRNAs in G. arboreum parallel those reported in other plant species (Liu et al., 2012; Li et al., 2014b; Zhang et al., 2014; Wang et al., 2015a). This again shows that lncRNAs in plants share similar features in gene structure and expression, although lncRNAs are only modestly conserved in sequence across plant species (Deng et al., 2018).

A previous study showed that transposable elements (TEs) are highly enriched in the G. arboreum genome, accounting for up to 68% of it (Wang et al., 2016). Herein, we found that 48% of lncRNAs contain TE-derived sequences (overlap with TEs by at least 10 nt). LncRNAs in which TE sequences make up more than half of the transcript account for almost a quarter of all lncRNAs (2,125 out of 9,240, 23%; Supplemental Figure S5A), which resembles the situation in other plant genomes with high TE content, such as G. hirsutum, G. raimondii, Z. mays, and O. sativa, and contrasts with Arabidopsis, which has a compact genome with low TE content (Supplemental Figure S5A). In the G. arboreum genome, lincRNAs have the highest proportion overlapping with TEs (62%), followed by lnc-Intronics, lnc-Genics, and lnc-NATs (Supplemental Figure S5B). In addition, by comparing TE composition in lncRNAs and PCGs, we found that the gene body regions of PCGs, including 5ʹ UTRs, 3ʹ UTRs, exonic CDSs, and introns, contain more DNA transposons (Class II TEs, such as Helitron and TIR) than the whole genome. By contrast, lincRNAs have the lowest composition of DNA transposons but the highest composition of Gypsy, a type of retrotransposon (Class I TEs; Supplemental Figure S5C).
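The two TE criteria mentioned above (an overlap of at least 10 nt, and TE sequence covering more than half of the lncRNA) can be sketched as follows. This is an illustrative simplification, not the authors' code: the function names and category labels are hypothetical, and the lncRNA is treated as a single contiguous span rather than as a set of exons.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals; end is exclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def te_overlap_bp(lnc_start, lnc_end, te_intervals):
    """Number of lncRNA bases covered by at least one TE annotation."""
    covered = 0
    for start, end in merge_intervals(te_intervals):
        covered += max(0, min(end, lnc_end) - max(start, lnc_start))
    return covered

def te_category(lnc_start, lnc_end, te_intervals, min_bp=10, rich_fraction=0.5):
    """Label an lncRNA by TE content; the labels are hypothetical names."""
    covered = te_overlap_bp(lnc_start, lnc_end, te_intervals)
    if covered < min_bp:
        return "no TE"
    fraction = covered / (lnc_end - lnc_start)
    return "TE-rich" if fraction > rich_fraction else "TE-containing"

# Two overlapping TEs cover 600 of 1,000 bases.
print(te_category(0, 1000, [(100, 400), (300, 700)]))
# A TE that overlaps the last 10 bases only.
print(te_category(0, 1000, [(990, 1200)]))
```

Merging the TE annotations first avoids counting the same base twice when nested or overlapping elements are annotated at one locus.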

We further investigated the sequences flanking the transcribed regions of lincRNAs. Compared with those of PCGs, the flanking regions of lincRNAs also have a higher proportion of Gypsy sequence (Supplemental Figure S5D). This closer physical proximity suggests that Gypsy elements may be more likely to affect or regulate lincRNA transcription. In line with this expectation, the expression correlation between a lincRNA and its nearest Gypsy was higher than that between a PCG and its nearest Gypsy (Supplemental Figure S5E). These results indicate that TEs might contribute to the sequence composition of lncRNAs and suggest that TEs may be involved in regulating lncRNA expression.

Cis regulation of lncRNAs on expression of adjacent PCGs

Previous studies reported that lncRNAs tend to exhibit spatiotemporal expression specificity across tissues in plants (Golicz et al., 2018). To investigate the specificity of lncRNA expression in G. arboreum, we first evaluated the reliability of the ssRNA-seq data. In the tissues tested, most of the RT-qPCR results for the 30 lncRNAs were highly consistent with the ssRNA-seq data (Supplemental Figure S6 and Table S4). We then characterized the expression of the lncRNAs based on the ssRNA-seq data. Figure 2A shows that the four types of lncRNAs have obvious tissue specificity, with the degree of specificity ranking from high to low as follows: lnc-NAT > lincRNA/lnc-Intronic/lnc-Genic > PCG.

Figure 2.

LncRNA regulates expression of adjacent PCG in cis. A, Violin plots showing tissue-specificity scores (the maximum value “1” indicates a perfect tissue-specific pattern) of lncRNAs and PCGs. The scores were calculated by Jensen–Shannon divergence with ssRNA-seq data across 16 tissues. One-way ANOVA analysis and Tukey honest significant difference post-hoc test were used. B, Expression correlation between lncRNAs and their closest PCGs. The significance levels are indicated by asterisks (Mann–Whitney test, ***P value < 0.001). C, The number of coexpressions of lncRNA-PCG pairs. The percentages of pairs with correlation coefficients > 0.5 (red) are shown. D, Heat maps showing the concordant expression of 2,010 lnc-NATs (top) and their complementary PCGs (bottom) across 21 samples. The corresponding number IDs of tissues are shown in Supplemental Table S5. The rows in top panel are arranged based on k-means clustering of lnc-NATs, whereas the rows in bottom panel for complementary PCGs are in the same order. Their expression values (FPKM) are scaled by z-score in row direction. E, An example of an lnc-Genic coexpressing with its closest PCG. S and A represent the sense and antisense strands, respectively. RT-semi-qPCR validation assays are shown below. Red stars indicate the four tissues with RNA-seq data shown above. F, An lncRNA-related TSS switching of a PCG under UV stress. TSS-1 and TSS-2 represent the two switching TSSs. S and A represent the sense and antisense strands, respectively. G, RT-qPCR validation of upregulated expression of lncRNA and PCG under UV treatment. The significance levels are indicated by asterisks (two-tailed t test, error bars represent SD of mean (n = 3), ***P value < 0.001).
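The tissue-specificity score in panel A is described only as being based on Jensen–Shannon divergence with a maximum of 1. A minimal sketch of one commonly used formulation is given below; the exact formula is an assumption here (1 minus the square root of the divergence between a gene's normalized expression profile and an idealized single-tissue profile, maximized over tissues), and the function names are hypothetical.

```python
import math

def _entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_specificity(expression):
    """Tissue-specificity score in [0, 1] for one gene across tissues.

    Assumed formulation: max over tissues t of 1 - sqrt(JSD(p, e_t)), where p is
    the expression profile scaled to sum to 1 and e_t is 1 in tissue t, else 0.
    """
    total = sum(expression)
    if total <= 0:
        return 0.0
    p = [x / total for x in expression]
    n = len(p)
    best = 0.0
    for t in range(n):
        ideal = [1.0 if i == t else 0.0 for i in range(n)]
        mid = [(a + b) / 2 for a, b in zip(p, ideal)]
        jsd = _entropy(mid) - (_entropy(p) + _entropy(ideal)) / 2
        best = max(best, 1 - math.sqrt(max(jsd, 0.0)))
    return best

specific = js_specificity([0, 0, 5, 0])   # expressed in one tissue only
uniform = js_specificity([1, 1, 1, 1])    # expressed equally everywhere
print(specific, uniform)
```

With base-2 logarithms the divergence lies between 0 and 1, so a transcript detected in a single tissue scores 1, consistent with the caption, while broadly expressed transcripts score much lower.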

Emerging evidence supports that some lncRNA loci act locally (in cis) to regulate the expression of nearby PCGs (Engreitz et al., 2016). To investigate cis regulation by lncRNAs in G. arboreum, we examined the expression correlation between lncRNAs and their nearby PCGs across tissues. The serial and divergent lnc-Genic, lnc-NAT, and lnc-Intronic types exhibited significant expression correlation with their surrounding PCGs (Figure 2B). The exception was the convergent type of lnc-Genic, which is oriented tail to tail with its neighboring PCG.

Notably, expression correlation between lnc-NATs and complementary PCGs was the highest. Over half (53%) had Cor ≥ 0.5 (Figure 2C), suggesting that lnc-NATs have a strong tendency to express synchronously with their complementary PCGs. The concordant expression patterns of 2,010 lnc-NATs and their complementary PCGs (Cor ≥ 0.5) across 21 tissues are shown in Figure 2D and Supplemental Table S5. Supplemental Figure S7 shows a representative example of lnc-NAT in which ssRNA-seq, CAGE-seq, PolyA-seq, and RT-PCR consistently supported the coexpression relationships across tissues.
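As a rough illustration of how the proportion of coexpressed pairs (Cor ≥ 0.5) could be computed, the sketch below uses the Pearson coefficient. The type of correlation coefficient is not stated in this section, so Pearson is an assumption, and the function names and toy expression vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def fraction_coexpressed(pairs, threshold=0.5):
    """Share of lncRNA-PCG pairs whose correlation reaches the threshold.

    `pairs` holds (lncRNA_expression, PCG_expression) tuples measured across
    the same samples; pairs with an undefined correlation are skipped.
    """
    correlations = [pearson(lnc, pcg) for lnc, pcg in pairs]
    valid = [c for c in correlations if not math.isnan(c)]
    if not valid:
        return 0.0
    return sum(c >= threshold for c in valid) / len(valid)

toy_pairs = [
    ([1, 2, 3, 4], [2, 4, 6, 8]),   # concordant
    ([1, 2, 3, 4], [4, 3, 2, 1]),   # discordant
    ([1, 2, 3, 4], [1, 1, 1, 1]),   # constant PCG, correlation undefined
]
print(fraction_coexpressed(toy_pairs))
```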

In addition, several serial and divergent lnc-Genics identified in this study also showed high expression correlation with adjacent PCGs (36% and 33% with Cor ≥ 0.50; Figure 2C). Considering their genomic location, these two types of lncRNAs might resemble the promoter-associated lncRNAs, such as upstream antisense RNAs and promoter upstream transcripts (PROMPTs), and the enhancer lncRNAs reported in mammals (Wu et al., 2017). Figure 2E shows a divergent lnc-Genic, lnc-Ga08g0365, whose expression is highly synchronized with that of the adjacent coding gene across tissues.

Intriguingly, we found a form of transcriptional regulation that might be mediated by lncRNAs (Figure 2F). Under UV treatment, a PPR protein-encoding gene, Ga01g02119, switched from the distal TSS (TSS-1) to the proximal TSS (TSS-2), accompanied by transcriptional initiation of an lncRNA, lnc-Ga01g0469, on the antisense strand of the TSS switch region. RT-qPCR analysis supported the UV-induced expression of lnc-Ga01g0469 and PPR (Ga01g02119; Figure 2G). In total, 64 such examples were found in the genome (Supplemental Table S6). It will be of great interest to test whether lncRNAs can cause TSS switches of PCGs.

These results indicate that lncRNAs might have diverse correlations with adjacent PCGs, including the possibility of affecting the expression intensity and tissue specificity, as well as the promoter selection of PCGs.

TSS and TTS switches of lncRNAs

RNA transcription in eukaryotes relies on three evolutionarily conserved RNA polymerases, Pol I, Pol II, and Pol III. Studies in animals have shown that most lncRNAs are transcribed by polymerase II (Pol II) and carry a 5ʹ-cap and 3ʹ-polyA tail (Wu et al., 2017). However, evidence for this in plants is still limited. The accurate definition of 5ʹ- and 3ʹ-ends for lncRNA transcripts in cotton allows us to address this issue. First, we performed motif discovery for lncRNA TSSs and TTSs using Homer software (Heinz et al., 2010). As shown in Supplemental Figure S8, the TATA and Y (pyrimidine) patch elements of promoters (Yamamoto et al., 2009) and the PAS and U-rich elements of terminators (Wang et al., 2018a), which are characteristic of canonical Pol II-mediated transcription, were identified around the TSSs/TTSs of lncRNAs. However, the canonical Pol I and Pol III elements were not detected (Supplemental Figure S9; Heix and Grummt, 1995; Schramm and Hernandez, 2002).

Next, we surveyed the nucleotide composition within ±50 nt of the TSSs and TTSs of lncRNAs and found that it shows a bias similar to that of PCGs (Figure 3A and C; Supplemental Figure S10), resembling the TSSs of PCGs in Arabidopsis (Tokizawa et al., 2017) and the TTSs of PCGs in O. sativa (Fu et al., 2016). We also analyzed Pol II chromatin immunoprecipitation sequencing (ChIP-seq) data from leaves according to previously published methods (Zhao et al., 2018a). Around 3,390 high-confidence binding peaks were identified in the G. arboreum genome, and similar ratios (13% vs 10%) mapped to the expressed lncRNAs and PCGs in leaves (FPKM ≥ 1). These results also support the view that most lncRNAs in cotton are transcribed by Pol II.
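A position-wise nucleotide composition around a set of TSSs or TTSs can be computed with a few lines of Python. The sketch below assumes that the windows (for example, 101-nt sequences spanning ±50 nt) have already been extracted in transcript orientation; the function name and the toy sequences are hypothetical.

```python
from collections import Counter

def composition_profile(windows):
    """Per-position A/C/G/T frequencies for equal-length sequence windows.

    Returns one dict per position; bases other than A, C, G, and T (such as N)
    are ignored when computing frequencies.
    """
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for pos in range(width):
        counts = Counter(w[pos].upper() for w in windows)
        total = sum(counts[base] for base in "ACGT")
        profile.append({base: (counts[base] / total if total else 0.0)
                        for base in "ACGT"})
    return profile

# Toy 3-nt windows standing in for real ±50-nt windows.
toy_windows = ["ACG", "ATG", "AAG", "ACT"]
for position, freqs in enumerate(composition_profile(toy_windows)):
    print(position, freqs)
```

Plotting the four frequencies against position would give curves comparable in form to those in Figure 3A and C.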

Figure 3.

Alternative usage of TSSs and TTSs of lncRNAs. A, Nucleotide composition at ±50 nt around TSSs in lncRNAs. B, Numbers of multi-TSSs in lncRNAs and PCGs; Pie charts indicate the percentage of TSS switching across tissues for lncRNAs and PCGs. C, Nucleotide composition at ±50 nt around TTSs in lncRNAs. D, Numbers of multi-TTSs in lncRNAs and PCGs; pie charts indicate the percentage of TTS switching across tissues for lncRNAs and PCGs. E, A DNA motif was identified in promoter regions of lncRNAs with a 5ʹ cap or 3ʹ polyA. Other lncRNA represents lncRNAs without a 5ʹ cap and without a 3ʹ polyA. F, An example of an lncRNA with alternative TTS usage across tissues. PolyA-seq and ssRNA-seq signals (top left), RNA secondary structure (top right), and validation with 3ʹ-RACE assays (bottom) are shown. TTS-1 and TTS-2 represent the two switching TTSs. The blue triangle indicates the position of RT primer for the 3ʹ RACE. Red stars indicate the four tissues with RNA-seq data shown above. G, The lncRNAs with alternative TSS and TTS usage across tissues. The CAGE-seq, PolyA-seq, and ssRNA-seq signals (top right); RNA secondary structure (left); and validation with 5ʹ- and 3ʹ-RACE assays (bottom right) are shown. TSS-1 and TSS-2, and TTS-1 and TTS-2 represent the two switching TSSs and TTSs, respectively. The red and blue triangles indicate the positions of RT primers for the 5ʹ- and 3ʹ-RACE, respectively. Red stars indicate the five tissues with RNA-seq data shown above.

The above results support the view that most lncRNAs in cotton are transcribed by Pol II and processed with 5ʹ m7G caps and 3ʹ polyA tails. Notably, an uncharacterized 4-nt motif, GTAG, was significantly enriched around 75 nt upstream of the TSSs (Figure 3E) in some lncRNAs with 5ʹ caps or 3ʹ polyA tails. The function of this motif in lncRNA transcription deserves in-depth study. However, for lncRNAs without 5ʹ- and 3ʹ-end signals, the features of Pol II, Pol I, and Pol III transcription were not detected (Supplemental Figure S9), indicating that transcription of those lncRNAs (∼25%) in cotton might depend on other RNA polymerases.

To date, genome-wide investigations of TSS/TTS selection for lncRNAs are rare. Herein, we performed a pilot investigation in G. arboreum. Using the CAGEr program and requiring TPM >0.5 in every tissue, we determined 3,857 TSSs and 4,279 TTSs corresponding to 3,709 and 4,067 lncRNA loci, respectively. There were on average 1.04 TSSs and 1.05 TTSs per lncRNA, which was lower than for PCGs. Of these lncRNAs, 6.7% and 7.2% were multi-promoter and multi-terminator RNAs (≥2 TSSs/TTSs; Supplemental Figure S10C and D), respectively. Next, we compared TSS/TTS usage dynamics across 21 tissues and identified 40 and 77 switching events for TSSs and TTSs of lncRNAs, respectively, between any two tissue samples. These accounted for 16.2% and 26.7% of multi-promoter and multi-terminator lncRNAs, respectively (Figure 3B and D).
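The per-locus averages follow directly from the reported counts, as the short Python check below shows. The second part of the sketch illustrates one simple way to flag a switching event, namely a change in the dominant (highest-signal) site between tissues. That definition is an assumption for illustration only, since the criteria actually used are described in the Methods rather than in this section, and all function names and toy values are hypothetical.

```python
def mean_sites_per_locus(n_sites, n_loci):
    """Average number of TSSs (or TTSs) per lncRNA locus."""
    return n_sites / n_loci

def dominant_site(usage):
    """Site with the highest signal (e.g. TPM) in one tissue."""
    return max(usage, key=usage.get)

def has_switch(usage_by_tissue):
    """True if the dominant site differs between at least two tissues.

    Simplified, assumed definition of a switching event; tissues without any
    signal are ignored.
    """
    dominants = {
        dominant_site(usage)
        for usage in usage_by_tissue.values()
        if usage and max(usage.values()) > 0
    }
    return len(dominants) > 1

# Counts reported in the text.
print(round(mean_sites_per_locus(3857, 3709), 2))  # TSSs per lncRNA locus
print(round(mean_sites_per_locus(4279, 4067), 2))  # TTSs per lncRNA locus

# Hypothetical TTS usage (TPM) for one lncRNA in two tissues.
toy_usage = {
    "ovule": {"TTS-1": 8.0, "TTS-2": 1.5},
    "seed": {"TTS-1": 0.9, "TTS-2": 6.2},
}
print(has_switch(toy_usage))
```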

Although alternative use of TSSs/TTSs was lower in number and frequency in lncRNAs than in PCGs (Figure 3B and D), we suggest that it still deserves attention. For lnc-Ga12g0506, differential usage of TTSs across tissues would cause substantial changes in 3ʹ-end length and structure (Figure 3F). The lnc-Ga04g0334 has two TSSs and two TTSs during ovule development. As development proceeds from young ovules to mature seeds, both the TSS and the TTS of this lncRNA switch (Figure 3G). In seeds, the longer transcript has a completely different 5ʹ terminal secondary structure. Other examples of lncRNAs with TSS/TTS switches are shown in Supplemental Figure S10E and F. Because the functions of lncRNAs include regulating chromatin topology and scaffolding proteins or RNAs (Ransohoff et al., 2017), changes in length produced by TSS/TTS switches will inevitably alter the structure of the lncRNA, thereby possibly changing its function.

Genetic variations and GWAS sites associated with lncRNAs of G. arboreum

As PCGs are subject to strong selection pressure to conserve protein sequences during evolution, their coding sequences maintain a lower mutation frequency. To investigate the genetic selection pressure on lncRNAs, we analyzed the distribution of natural mutation frequency across lncRNA loci using 17,883,108 high-quality single-nucleotide polymorphisms (SNPs) derived from natural mutations in 230 G. arboreum lines (Du et al., 2018). As expected, the mutation frequency of lncRNAs was higher than that of PCGs overall. However, similar to PCGs, the mutation frequency in lncRNA gene bodies (2.85 SNP/kb) was significantly lower than that in their flanking sequences, and the mutation frequency upstream of the TSS was significantly lower than that downstream of the TTS (3.19 vs 3.91 SNP/kb). Exons and introns, and their partial flanking sequences, also showed trends similar to those of PCGs in the mutation frequency curves (Figure 4A). Intriguingly, the mutation frequency in lncRNA gene bodies is the same as that in the TSS upstream region of PCGs (Figure 4A). To rule out the possible influence of sequence overlap between PCGs and lncRNAs, we repeated the analysis using only the lincRNAs, and the same trend was observed (Supplemental Figure S11).
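Mutation frequency expressed as SNP/kb is simply the number of SNPs in a region divided by the region length in kilobases. A minimal sketch, assuming sorted SNP coordinates on a single chromosome and half-open intervals (the function name and toy positions are hypothetical):

```python
from bisect import bisect_left

def snp_density_per_kb(sorted_snp_positions, start, end):
    """SNPs per kb within the half-open interval [start, end).

    `sorted_snp_positions` must be sorted in ascending order so that the SNPs
    inside the interval can be counted with two binary searches.
    """
    if end <= start:
        raise ValueError("interval must have positive length")
    n_snps = (bisect_left(sorted_snp_positions, end)
              - bisect_left(sorted_snp_positions, start))
    return n_snps / ((end - start) / 1000)

toy_snps = [100, 500, 1500, 2500, 2999, 3000]
print(snp_density_per_kb(toy_snps, 0, 2000))     # gene body: 3 SNPs in 2 kb
print(snp_density_per_kb(toy_snps, 2000, 3000))  # flank: 2 SNPs in 1 kb
```

Applying the same function to gene bodies and to fixed-size upstream and downstream windows, then averaging over loci, would give values comparable in kind to the 2.85, 3.19, and 3.91 SNP/kb quoted above.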

Figure 4.

Genetic variations and linkage to GWAS of lncRNAs in ovule development. A, SNP frequency distribution on the gene body (left), exons (middle), and introns (right) of lncRNAs (red lines) and PCGs (blue lines). B, The lncRNAs linked to GWAS traits. The heatmap (left) shows tissue expression of the lncRNAs. C, The linkage disequilibrium (LD) plot of lnc-Ga08g0021 locus. A magnified section of the LD plot and the sequence of two haplotypes are shown at the bottom. One SNP in the promoter and three SNPs in the gene body of lnc-Ga08g0021 are highlighted. The black triangle indicates the target site of gra-miR8739. D, A bubble chart indicating the seed phenotype of the two haplotypes. The P value is marked above (Chi-squared test). E, The RNA secondary structures of two haplotypes. The minimum free energy is given below.

According to previous studies (Mercer et al., 2009), lncRNAs can recruit chromatin-remodeling complexes or mediate DNA and histone modification to act on PCG promoters or enhancers. LncRNAs and their target DNA elements in promoters combine, directly or indirectly, to form transcriptional regulatory machinery in the regulatory regions of PCGs. Therefore, sequence mutations in either the lncRNAs or the promoters of PCGs would affect downstream gene expression, suggesting that they are under the same genetic selection pressure and experience comparable mutation frequencies.

In addition, GWAS sites associated with cotton agronomic traits have been identified previously (Fang et al., 2017; Wang et al., 2017; Du et al., 2018; Hou et al., 2018; Li et al., 2018; Ma et al., 2018). To discover possible links between lncRNAs and cotton agronomic traits, we mapped the 932 GWAS sites to lncRNA loci and found GWAS sites overlapping with the gene body regions of 13 lncRNAs, including 2 lincRNAs, 9 lnc-NATs, and 2 lnc-Genics. Tissue expression analysis showed that three of these lncRNAs, one lnc-Genic and two lincRNAs, exhibited obvious tissue-specific expression during ovule development (0–10 DPA). The three GWAS sites were associated with seed fuzz (lnc-Ga08g0021), seed oil accumulation (lnc-Ga08g0277), and fiber quality (lnc-Ga10g0003), respectively (Figure 4B). We then further analyzed the GWAS region controlling seed fuzz based on a previous study (Du et al., 2018). As shown in Figure 4C, a distinct linkage disequilibrium (LD) block encompassing lnc-Ga08g0021 and a PCG overlaps with the peak GWAS signal. Based on the four SNPs in the block across 230 G. arboreum lines, lnc-Ga08g0021 could be divided into two haplotypes (Figure 4C), which were closely related to the fuzz and fuzzless phenotypes, respectively (Figure 4D). RNA structure analysis showed that the three SNPs in the transcript of the lncRNA could lead to great structural variation (Figure 4E). Intriguingly, we found that gra-miR8739 (a miRNA in G. raimondii) could target this lncRNA (Supplemental Table S1). Thus, it could be a candidate trans-acting siRNA-producing locus (TAS) with the potential to produce ta-siRNAs or phased siRNAs. We therefore infer that the RNA structural variation may affect ta-siRNA processing and ultimately cause the different seed phenotypes.

These results reveal that lncRNAs might be involved in the developmental regulation of ovules and fiber, either alone or with their neighboring PCG.

Regulatory functions of lncRNAs in G. arboreum

To reveal the putative functions of lncRNAs, we applied weighted correlation network analysis (WGCNA) to analyze the coexpression between lncRNAs and PCGs using transcriptomic data from 21 samples. A total of 7,684 lncRNAs and 25,253 PCGs were included in the network construction after filtering out genes with low expression (FPKM < 10 for PCGs, FPKM < 1.5 for lncRNAs). Finally, 43 coexpression modules were identified (Figure 5A; Supplemental Table S7). Modules 16 and 13 contain the homologs of GhMYB25-like and GhEX1 in G. hirsutum, which were previously identified as key regulators of fiber initiation and elongation (Shan et al., 2014; Wang et al., 2016). Two lincRNAs in these two modules were selected to test their expression by RT-qPCR. The results show that both the lncRNAs and the PCGs exhibit ovule-specific expression (Figure 5B).
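The low-expression filter applied before network construction can be sketched as below. The text gives the FPKM cutoffs but not how they are applied across the 21 samples, so this sketch assumes that a gene is kept when its maximum FPKM in any sample reaches the cutoff for its class. That rule, the function names, and the toy identifiers are assumptions for illustration.

```python
# FPKM cutoffs stated in the text; genes below them are removed.
LOW_EXPRESSION_CUTOFF = {"PCG": 10.0, "lncRNA": 1.5}

def keep_for_network(fpkm_by_sample, gene_type):
    """Keep a gene if its highest FPKM across samples reaches its class cutoff.

    Using the maximum across samples is an assumption of this sketch.
    """
    return max(fpkm_by_sample) >= LOW_EXPRESSION_CUTOFF[gene_type]

def filter_genes(expression, gene_types):
    """Return the subset of `expression` (gene -> FPKM list) passing the filter."""
    return {
        gene: values
        for gene, values in expression.items()
        if keep_for_network(values, gene_types[gene])
    }

# Hypothetical identifiers and values.
toy_expression = {
    "pcg_high": [0.5, 12.0, 3.0],
    "pcg_low": [0.5, 4.0, 3.0],
    "lnc_high": [0.1, 2.2, 0.0],
    "lnc_low": [0.1, 0.9, 0.0],
}
toy_types = {"pcg_high": "PCG", "pcg_low": "PCG",
             "lnc_high": "lncRNA", "lnc_low": "lncRNA"}
print(sorted(filter_genes(toy_expression, toy_types)))
```

The retained matrix would then be passed to the network-construction step; the lower cutoff for lncRNAs reflects their generally lower expression noted earlier in the Results.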

Figure 5.

ChIRP-seq analysis reveals genome-wide binding sites for lnc-Ga13g0352 and lnc-Ga08g0313. A, The WGCNA association analysis for the module and tissue. The modules containing known PCGs involved in cotton fiber initiation and elongation are indicated. The rows and columns correspond to modules and tissues. Each cell is color-coded by correlation according to the color legend. The color scale means the correlation coefficient (−1 to 1) between the module and tissue sample. B, The coexpression of lncRNA and PCG in ssRNA-seq (lines) and corresponding validation by RT-qPCR (bars), two-tailed t test, error bars represent SD of mean (n = 3). C, Experimental flow chart of ChIRP. The biotinylated probes (green) for lncRNA were used. The enriched RNAs were quantified by RT-qPCR, and the enriched chromatin DNAs were used to perform DNA sequencing. D-E, Schematic genomic structure of two lincRNAs and the enrichment of lncRNA transcript in ChIRP. The positions of the biotinylated ChIRP probes are shown with green lines. VIGS target region of lnc-Ga13g0352 is highlighted by a purple box. P1 and P2, which are marked by black lines, represent two pairs of RT-qPCR primers. The lacZ probes and RNase-treated samples were used as non-targeting control and negative control, respectively. The GAPDH transcript was used as a negative internal gene control. The significance levels are indicated by asterisks (two-tailed t test, three biological replicates, error bars represent SD, ***P value <0.001). F-G, Percentage of the lncRNA binding sites localization in the genome for lnc-Ga13g0352 (F) and lnc-Ga08g0313 (G). H-I, The putative three binding motifs of the lnc-Ga13g0352 (H) and lnc-Ga08g0313 (I).

Based on our analysis, the possibility that the two lincRNAs are pri-miRNAs, ceRNAs, or precursors of phased siRNAs was ruled out (Supplemental Table S1). Thus, we examined their putative interaction with chromatin by performing chromatin isolation by RNA purification sequencing (ChIRP-seq; Figure 5C; Percharde et al., 2018; Zhao et al., 2018b). RNA enrichment, validated by RT-qPCR, showed that the biotinylated probes for lnc-Ga13g0352 and lnc-Ga08g0313 efficiently captured the corresponding lncRNAs (Figure 5D and E). For the two lncRNAs, 969 and 440 high-confidence binding peaks, respectively, were found in two biological replicates (Supplemental Figure S12). Genome-wide distribution analysis showed that 33% and 27% of their binding peaks are in gene body regions, corresponding to 317 and 114 PCGs, respectively (Figure 5F and G; Supplemental Table S8). Motif analysis of these binding sites identified three core motifs for each of these lncRNAs, suggesting that the two lncRNAs might bind their targets at specific DNA motifs (Figure 5H and I).

Highly enriched binding peaks of lnc-Ga13g0352 and lnc-Ga08g0313 were chosen for ChIRP-qPCR validation. The peak regions and other regions of the target genes served as the detection sites and negative controls for qPCR, respectively (Figure 6B and C; Supplemental Figures S13B and S14B). In these examples, lncRNA-bound chromatin fragments (probes) were significantly enriched relative to total chromatin input (no probes), supporting the reliability of our ChIRP-seq data. GO enrichment analysis (Figure 6A; Supplemental Figure S14 and Table S8) of the candidate PCGs regulated by the lncRNAs showed that the target PCGs of lnc-Ga13g0352 (ChIRP-seq peaks <3 kb around the gene body) were markedly enriched for transcription regulator activity (GO: 0140110, P-value = 1.63 × 10⁻⁴) and DNA-binding transcription factor activity (GO: 0003700, P-value = 1.51 × 10⁻⁴), indicating that this lncRNA might regulate genes with transcriptional functions.

Figure 6.


Lnc-Ga13g0352 potentially regulates transcription of ovule development–associated genes via activation and inhibition mechanisms. A, Top 10 GO terms in ontology enrichment for lncRNA-binding genes. B-C, The ChIRP-seq peaks and ChIRP-qPCR for Ga12g02719 (ARIA) and Ga05g03705 (MT). The peaks from one biological replicate are shown. The lacZ and lnc-Ga08g0313 probes were used as non-targeting controls, and RNase-treated samples were used as a negative control for ChIRP. The significance levels of qPCR data are indicated by asterisks (two-tailed t test, three biological replicates, error bars represent SD, ***P value < 0.001). D-G, Relative transcription levels of lnc-Ga13g0352 (D, E), ARIA (F), and MT (G) in the leaves of six independent VIGS seedlings (TRV2: lnc-Ga13g0352) and one vector control (TRV2: 00) plant. The P1 and P2 primers in Figure 5D were used for RT-qPCR analysis of lnc-Ga13g0352. The significance levels are indicated by asterisks (two-tailed t test, error bars represent SD, *P value < 0.05; **P value < 0.01; ***P value < 0.001; n.s. represents not significant). H, GAPDH was used as a negative control for VIGS. The significance levels are indicated by asterisks (two-tailed t test, error bars represent SD, *P value < 0.05; **P value < 0.01; ***P value < 0.001; n.s. represents not significant). I, The genome-wide regulatory model of lnc-Ga13g0352.

Next, we asked how lnc-Ga13g0352 affects its targets. We analyzed the expression patterns of its 317 target PCGs across 21 tissues (Supplemental Figure S13A). K-means clustering (Euclidean distance metric) grouped them into three clusters, two of which differed significantly in their expression in ovules: 115 target PCGs were highly expressed in ovules (EO cluster), whereas 121 target PCGs were completely inhibited in ovules (IO cluster). Hence, we hypothesized that this lncRNA might regulate ovule development–associated genes in two opposite ways: activation and inhibition.

Currently, there is no feasible approach for stable transformation in G. arboreum, so we used virus-induced gene suppression (VIGS) experiments to test the effects of silencing lnc-Ga13g0352 (Figure 5D; Supplemental Figure S13C and D). The ectopic silencing assay in leaves of G. arboreum seedlings showed that the six independent VIGS lines differed in the degree of lnc-Ga13g0352 silencing and could be divided into a severe group (<5% of control; lines 1, 2, and 3) and a mild group (∼75% of control; lines 4, 5, and 6; Figure 6D and E). We then investigated the expression of two target PCGs, Ga12g02719 and Ga05g03705, which belong to the EO and IO clusters of lnc-Ga13g0352 targets, respectively. The RT-qPCR results showed that these two genes were inhibited and activated, respectively (Figure 6F–H). Moreover, the expression of these two target PCGs exhibited dose-dependent effects across the control (line 7), mild (lines 4–6), and severe (lines 1–3) groups: the inhibition of Ga12g02719 and the activation of Ga05g03705 were stronger in the severe group than in the mild group. Together, these results support genome-wide contrasting regulatory roles of lnc-Ga13g0352 for its target PCGs (Figure 6I).

Discussion

Previous lncRNA identification studies in cotton relied solely on RNA-seq (Wang et al., 2015b; Zhao et al., 2018a). This approach produces incomplete lncRNA annotations because of transcript assembly errors, especially at the 5ʹ- and 3ʹ-ends where read coverage is low, and false-positive lncRNA annotations caused by non-strand-specific reads or background in ssRNA-seq. In this study, we compiled the PULL pipeline to integrate multi-strategy RNA-seq data and annotate lncRNAs accurately. Compared with the previous lncRNA identification in leaves and 0-DPA ovules of G. arboreum (Zhao et al., 2018a), our dataset is more comprehensive and accurate, and it updates a considerable number of lncRNAs with accurate 5′- and/or 3′-ends.

Based on these accurate termini annotations for lncRNAs supported by CAGE-seq and PolyA-seq, we found that most lncRNAs (∼75%) in G. arboreum have 5′ caps and polyA tails. Moreover, the canonical Pol II transcription elements were also identified in their promoters and terminators. These results provide evidence to support that most lncRNAs (∼75%) are transcribed by RNA Pol II in G. arboreum, which is in line with a previous study (Zhao et al., 2018a). The remaining lncRNAs (∼25%) in G. arboreum must also possess promoters, but these are neither Pol-II nor Pol-I/III affiliated. Therefore, these lncRNAs might be transcribed by plant-specific RNA polymerases Pol IV or Pol V, which are required for the siRNA-directed DNA methylation pathway (Matzke and Mosher, 2014).

The compositional contribution of TEs to lncRNAs in plants has been reported in previous studies (Golicz et al., 2018; Zhao et al., 2018a). In this study, we also came to a similar conclusion that retrotransposons, especially Gypsy, contribute the most to the composition and transcriptional regulation of lncRNAs in cotton (G. arboreum). Considering the phenomenon that different plants might experience biased TE explosion events, such as the mPING (Type II TE) in O. sativa (Jiang et al., 2003), and that different TEs would have different insertion preferences (Pereira, 2004), it seems plausible that TE-derived lncRNAs would exhibit tremendous differences in sequence and genomic location in different plant species. This agrees with the fact that lncRNAs exhibit low conservation across species. Previous studies provided sporadic evidence supporting the important biological functions of alternative TSSs/TTSs of lncRNAs, such as (1) alternative polyadenylation of an antisense transcript in the important Arabidopsis floral repressor gene FLC, which controls FLC expression and thus flowering time (Csorba et al., 2014); and (2) an lncRNA named SVALKA in Arabidopsis that modulates response to cold via the acclimation gene C-repeat/dehydration-responsive element-binding factor 1 (CBF1) through lncRNA read-through transcription (Kindgren et al., 2018). Therefore, it is foreseeable that the systematic investigation of TSS/TTS switching of lncRNAs will uncover more cases that are involved in the control of development or stress response. The present study has pioneered a genome-wide identification and dynamic monitoring of TSSs and TTSs for lncRNAs in crop species. The resulting findings from this study deserve further attention.

Our results also showed that some lncRNAs might be associated with the stress-triggered TSS switching of PCGs (Figure 2F and G) and show high tissue specificity and expression concordant with nearby PCGs (Figure 2A and E). Hence, the heterogeneity of lncRNA transcription and its possible regulatory effect on the transcription of PCGs might contribute substantially to the rewiring of gene regulatory networks in plants. It will be of great value to address the biogenesis and mode of action of these transcription events.

Our ChIRP-seq analysis mapped the genome-wide binding sites of lnc-Ga13g0352, and the associated VIGS assays indicated that this lncRNA might exert two distinct effects on the transcription of its target PCGs: activation and inhibition. Therefore, in addition to uncovering the potential regulatory function of lncRNAs in cotton development, our study also sheds light on the complex mode of action of lncRNAs. How an lncRNA can modulate gene expression networks in two opposite ways on a genome-wide scale warrants further study.

Conclusion

In summary, we produced an accurate and comprehensive annotation atlas for lncRNAs in cotton G. arboreum. This was a pilot attempt for lncRNA identification based on multi-strategy RNA-seq data in plants. Several transcriptional regulation functions of lncRNAs in G. arboreum were uncovered. These findings provide valuable research resources for the plant community and broaden our understanding of biogenesis and regulatory functions of plant lncRNAs.

Materials and methods

Plant growth and treatments

Cotton G. arboreum cv Shixiya1 (SXY1) was cultivated in an automated greenhouse where the temperature was set to 30°C/25°C for 16-h day/8-h night cycle and the humidity was set to 65%. The matured tissue samples included anther, stigma, petal, bract, sepal, and whole flower at 0 days post-anthesis (DPA), and phloem, leaf, seedling root, seedling stem, cotyledon (2 weeks), seed, and ovules at four developmental stages (0, 5, 10, and 20 DPA). The seedlings (2 weeks) were used for stress treatment according to a previous report (Makarevitch et al., 2015). Stress conditions included heat (50°C for 5 h), cold (5°C for 15 h), salt (500-mM NaCl), and UV (1.24 μmol m−2 s−1 UV-B for 2 h). Untreated seedlings were used as control.

RNA-seq and PCR assays

Cotton RNA extraction and library construction for ssRNA-seq, CAGE-seq, and PolyA-seq were performed according to our previous publication (Wang et al., 2019). PacBio sequencing data of mixed samples from 16 tissues were from our previous study (Wang et al., 2019). For RT-qPCR, 1 µg of total RNA was used to synthesize cDNA with oligo dT or gene-specific primers using a reverse transcription kit (Vazyme, China). Relative gene expression was calculated using the 2^−ΔCT method. The internal control Ga01g02318 from our previous study (Wang et al., 2019) was used for RT-semi-qPCR and RT-qPCR. For 5ʹ and 3ʹ RACE, the nested PCR products were analyzed with a Qsep100 DNA Analyzer (BiOptic, Taiwan, China) and sent to TSINGK Company for Sanger sequencing. All primers used in this study are listed in Supplemental Table S9.
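
As a minimal sketch of the 2^−ΔCT calculation named above, the relative expression of a target gene can be computed from its threshold cycle (CT) and that of the internal control. The function name and the CT values below are illustrative assumptions, not data from this study.

```python
from statistics import mean

def relative_expression(ct_target, ct_reference):
    """Return 2^-dCT, where dCT = CT(target) - CT(internal control)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)

# Hypothetical CT values for three replicates (not measured data).
target_cts = [25.0, 25.2, 24.8]
reference_cts = [22.0, 22.1, 21.9]

replicate_values = [
    relative_expression(t, r) for t, r in zip(target_cts, reference_cts)
]
mean_expression = mean(replicate_values)
```

A target that crosses the threshold three cycles later than the internal control therefore has a relative expression of 2^−3 = 0.125.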

Sequencing data processing

For ssRNA-seq data, raw reads were first processed by adapter trimming and removing low-quality or inaccurate reads. Then, clean reads were mapped to the reference genome of G. arboreum (Du et al., 2018) with STAR (Dobin et al., 2013) in EndToEnd and 2-pass mapping modes. For CAGE-seq and PolyA-seq data, raw reads were treated similarly to get clean reads. Then, clean reads were mapped by STAR with EndToEnd mode. The removal of PCR duplicates was achieved by FastUniq (Xu et al., 2012). The isoforms assembled by fixed Iso-seq data were adopted from our previous study (Wang et al., 2019). Supplemental Table S10 lists all sources and information of NGS datasets used in this study. Supplemental Table S11 lists all software and associated information regarding the version and options/parameters.

PULL pipeline

Alignment results from ssRNA-seq and Iso-seq were merged by StringTie (Pertea et al., 2015) with parameter -g 100. The program “clusterBed” in Bedtools (Quinlan and Hall, 2010) was used to get a whole non-redundant transcriptome collection. TSSs/TTSs from CAGE-seq/PolyA-seq were obtained with the R package CAGEr (Haberle et al., 2015). To match a transcript with a TSS supported by CAGE-seq, we considered two factors: (1) “dis,” the distance between a TSS from CAGE-seq and the 5ʹ-end of the assembled transcript (Hon et al., 2017); and (2) “cor,” the correlation coefficient between the transcripts per million (TPM) of a TSS from CAGE-seq and the fragments per kb of exonic sequence per million mapped reads (FPKM) of the transcript from ssRNA-seq across samples (Kawaji et al., 2014). For each assembled transcript, we retained all CAGE peaks between 500 bp upstream of the 5ʹ-end and the first exon. The CAGE peak with the highest “cor” was selected as the TSS of the transcript. If multiple CAGE peaks had the same “cor,” the CAGE peak with the smallest “dis” was selected as the TSS of the transcript. Transcripts were matched with TTSs from PolyA-seq in a similar way.
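
The TSS-matching rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PULL implementation: it handles plus-strand transcripts only, uses the Pearson coefficient for “cor,” treats the end of the first exon as the downstream limit of the window, and rounds “cor” to six decimals so that ties are detected robustly. The function names and the peak data layout are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_tss(five_prime_end, first_exon_end, transcript_fpkm, cage_peaks,
               upstream=500):
    """Pick the CAGE peak for one plus-strand transcript.

    cage_peaks: list of dicts with 'pos' (coordinate) and 'tpm' (per-sample TPM),
    given in the same sample order as transcript_fpkm.
    """
    candidates = []
    for peak in cage_peaks:
        # Keep peaks from 500 bp upstream of the 5' end through the first exon.
        if five_prime_end - upstream <= peak["pos"] <= first_exon_end:
            cor = round(pearson(peak["tpm"], transcript_fpkm), 6)
            dis = abs(peak["pos"] - five_prime_end)
            candidates.append((cor, dis, peak))
    if not candidates:
        return None
    # Highest "cor" wins; ties are broken by the smallest "dis".
    _, _, best = min(candidates, key=lambda c: (-c[0], c[1]))
    return best
```

A minus-strand transcript would need the window mirrored around its 5ʹ-end; that case is omitted here for brevity.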

To obtain the set of lncRNA transcripts, the following assembled transcripts were removed directly: (1) known PCGs annotated in the genome and known ncRNAs in Rfam (tRNAs, rRNAs, snoRNAs, and snRNAs); (2) transcripts with lengths <200 bp or ORF lengths >100 amino acids (aa); (3) transcripts encoding known proteins in Swiss-Prot or Pfam; and (4) transcripts with a Coding-Non-Coding Index (CNCI) score >0 (Sun et al., 2013) or a Coding Potential Calculator 2 (CPC2) score >0.5 (Kang et al., 2017). In addition, for candidate long noncoding natural antisense transcripts (lnc-NATs) without TSS or TTS signals from CAGE-seq or PolyA-seq, we used the “intersectBed” program to filter out contamination from sense transcripts in ssRNA-seq: lnc-NATs were removed if the complementary sequence occupied more than 50% of the length of both the lnc-NAT and the complementary PCG. Implementation details are available in the PULL pipeline repository (https://github.com/cotton-lab/PULL). To annotate candidate pri-miRNAs, precursors of phased siRNAs, and competing endogenous RNAs (ceRNAs) among the lncRNAs, all miRNAs of the Gossypium genus (miRbase version 22.1) were used for prediction with RNAfold (v 2.4.14; Lorenz et al., 2011), psRNATarget (Dai et al., 2018), and PerRBase (Yuan et al., 2017). The 8, 213, and 65 candidate pri-miRNAs, precursors of phasiRNAs, and ceRNAs, respectively, are annotated in Supplemental Table S1.
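
The removal criteria above amount to a chain of simple filters. The sketch below encodes them for illustration only; the `Transcript` record, its field names, and both function names are hypothetical, and the scores are assumed to have been computed beforehand by the cited tools (CNCI, CPC2, and database searches).

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    length_bp: int
    longest_orf_aa: int
    cnci_score: float
    cpc2_score: float
    is_known_pcg_or_ncrna: bool = False   # annotated PCG or Rfam ncRNA
    has_known_protein_hit: bool = False   # hit in Swiss-Prot or Pfam

def is_lncrna_candidate(t: Transcript) -> bool:
    """Apply removal criteria (1)-(4) in order; True if the transcript survives."""
    if t.is_known_pcg_or_ncrna:                        # criterion (1)
        return False
    if t.length_bp < 200 or t.longest_orf_aa > 100:    # criterion (2)
        return False
    if t.has_known_protein_hit:                        # criterion (3)
        return False
    if t.cnci_score > 0 or t.cpc2_score > 0.5:         # criterion (4)
        return False
    return True

def nat_is_sense_contamination(overlap_bp: int, nat_len: int, pcg_len: int) -> bool:
    """lnc-NAT rule: overlap exceeds 50% of both the lnc-NAT and the PCG."""
    return overlap_bp > 0.5 * nat_len and overlap_bp > 0.5 * pcg_len
```

The lnc-NAT rule applies only to candidates lacking CAGE-seq or PolyA-seq support, so a caller would check terminal signals before invoking `nat_is_sense_contamination`.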

Transposable element analysis

The genome and lncRNA annotation information for Z. mays (B73), Arabidopsis thaliana (Col-0), O. sativa (Japonica), and for G. hirsutum and G. raimondii were downloaded from CANTATA (v2.0; http://cantata.amu.edu.pl/) and Phytozome (v9.0; http://www.phytozome.net), respectively. Transposable elements (TEs) were annotated using RepeatMasker (v4.0.9) and RepeatModeler (v2.0.1; http://repeatmasker.org). The genome and TE annotation for G. arboreum was from a previous study (Du et al., 2018). The FPKM values of TEs were calculated using StringTie. Because many TEs are repetitive, multi-mapping reads were dropped, and only uniquely mapped reads were used to calculate FPKM. The lncRNAs or PCGs containing TE sequences longer than 10 bp were considered TE-related lncRNAs or PCGs; these were identified with the “intersectBed” program in Bedtools. The Pearson correlation coefficient was calculated between the expression levels of each TE and its TE-related lncRNA or PCG across samples. When comparing the correlation between TE-lncRNA or TE-PCG pairs, we chose only the pairs without overlap to avoid the impact of overlap on FPKM values.
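
The >10-bp criterion for TE-related genes reduces to interval arithmetic. The following sketch illustrates the idea rather than reproducing the “intersectBed” call: it assumes BED-style half-open coordinates, ignores strand, and uses hypothetical function names and coordinates.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Overlap length of two half-open intervals [start, end), as in BED."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))

def is_te_related(gene, tes, min_overlap=10):
    """gene and each TE are (chrom, start, end) tuples.

    True if any TE on the same chromosome overlaps the gene by more than
    min_overlap bp.
    """
    chrom, start, end = gene
    return any(
        te_chrom == chrom and overlap_bp(start, end, te_start, te_end) > min_overlap
        for te_chrom, te_start, te_end in tes
    )
```

For example, a gene at (Chr01, 100, 500) and a TE at (Chr01, 480, 600) share 20 bp, so the gene would count as TE-related.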

Tissue-specific expression and coexpression analysis

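The Jensen–Shannon (JS) divergence method (Cabili et al., 2011) was used to score tissue-specific expression. Here, $e$ denotes the expression profile of a transcript across the $n$ tissues, and $e^{t}$ denotes a reference profile in which expression occurs only in tissue $t$. The distance between two expression profiles $e^{1}$ and $e^{2}$ was calculated as follows, and the tissue specificity score is defined as $1 - \mathrm{JS}_{dist}(e^{1}, e^{2})$.

$$\mathrm{JS}_{dist}(e^{1}, e^{2}) = \sqrt{\mathrm{JS}(e^{1}, e^{2})}$$

$$\mathrm{JS}_{sp}(e \mid t) = 1 - \mathrm{JS}_{dist}(e, e^{t})$$

$$\mathrm{JS}_{sp}(e) = \operatorname*{argmax}_{t}\, \mathrm{JS}_{sp}(e \mid t), \quad t = 1, \ldots, n$$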
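
A minimal sketch of this score, following the definition in Cabili et al. (2011), is given below. It assumes that the expression vector is first normalized to sum to 1, that logarithms are base 2 (so the divergence lies between 0 and 1), and that the JS distance is the square root of the JS divergence. The function names are hypothetical.

```python
from math import log2, sqrt

def _entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2

def js_specificity(expression):
    """Return (score, tissue_index) for one transcript across n tissues."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("transcript is not expressed in any tissue")
    e = [x / total for x in expression]
    n = len(e)
    best_score, best_tissue = -1.0, None
    for t in range(n):
        e_t = [0.0] * n
        e_t[t] = 1.0  # reference profile expressed only in tissue t
        dist = sqrt(max(js_divergence(e, e_t), 0.0))  # guard tiny negatives
        score = 1.0 - dist
        if score > best_score:
            best_score, best_tissue = score, t
    return best_score, best_tissue
```

A transcript expressed in a single tissue scores 1, whereas one expressed evenly across all tissues receives a low score.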

Coexpression analysis was performed with the R package WGCNA (Zhang and Horvath, 2005). The lncRNAs with FPKM ≥ 1.5 and PCGs with FPKM ≥ 10 in at least one tissue were retained.

TSS and TTS analysis

Based on the analysis of CAGE-seq and PolyA-seq, dominant TSSs/TTSs with TPM ≥ 0.5 were used for further study. The TSS/TTS frequency was defined as the average number of TSSs/TTSs per locus. TSS/TTS switching between any two tissues was considered TSS/TTS selection. The TSS/TTS motifs of lncRNAs were examined and identified with Homer and MEME, respectively. RNAfold was used to predict the secondary structures of lncRNAs (Lorenz et al., 2016).
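
The TSS/TTS frequency can be illustrated with a few lines of code. This sketch assumes that the average is taken over loci with at least one retained site (the text does not state how loci without a retained site are handled); the function name and the input layout are hypothetical.

```python
def site_frequency(locus_to_tpms, min_tpm=0.5):
    """Average number of TSSs (or TTSs) per locus after the TPM cutoff.

    locus_to_tpms maps a locus ID to the TPM values of its candidate sites.
    Loci with no site at or above min_tpm are excluded (assumption).
    """
    counts = []
    for tpms in locus_to_tpms.values():
        retained = sum(1 for value in tpms if value >= min_tpm)
        if retained > 0:
            counts.append(retained)
    return sum(counts) / len(counts) if counts else 0.0
```

With two retained sites at one locus and one at another, the frequency is 1.5.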

SNP and GWAS analysis

SNP data were collected from a previous study (Du et al., 2018). GWAS sites were collected from studies in G. arboreum (Du et al., 2018) and G. hirsutum (Wang et al., 2017; Li et al., 2018; Ma et al., 2018). GWAS sites in the At-subgenome of G. hirsutum were transformed into G. arboreum. Annovar (Wang et al., 2010) was used to detect the overlap between lncRNAs and SNP/GWAS sites. The LD values between SNPs were calculated by PLINK (1.9b; Purcell et al., 2007). The haplotype of each G. arboreum line was identified by Beagle (5.1.24; Browning and Browning, 2007). When investigating the seed phenotypes, we removed 17 lines with heterozygous genotype or without phenotypic data.

ChIRP-seq

ChIRP-seq was performed according to previous reports (Percharde et al., 2018; Zhao et al., 2018b) with some modifications. Briefly, ovules (0 DPA) were crosslinked for 30 min in 0.4 M sucrose, 10 mM Tris–HCl pH 8.0, 5 mM β-ME, 1% (v/v) formaldehyde, protease inhibitor cocktail (Roche, Germany), and RNase inhibitor, and then quenched with 0.125 M glycine for 5 min. The chromatin was sonicated into fragments of less than 300 bp. RNase-treated chromatin was used as a negative control. To capture the lncRNAs, 1 mL of fragmented chromatin was incubated with 100 pmol of biotinylated lncRNA or LacZ probes in 2 mL of solution (750 mM NaCl, 1% (w/v) SDS, 50 mM Tris–HCl pH 7, 1 mM EDTA, 15% (v/v) formamide, protease inhibitor cocktail, RNase inhibitor) at 37°C overnight. Streptavidin beads were added and incubated at room temperature for 30 min, and then washed five times at 37°C with 2× SSC, 0.5% (w/v) SDS, and 1 mM PMSF. The precipitate was divided into two parts: 10% and 90% were used to quantify RNA and DNA enrichment, respectively. RNA/DNA enrichment was expressed as the percentage of retrieved RNA/DNA relative to input, measured by RT-qPCR/qPCR. The purified DNA fragments were used to build the sequencing library.

The clean data were mapped by STAR in EndToEnd mode. After PCR duplicates were removed with Picard, the remaining alignments were passed to MACS2 (Zhang et al., 2008) to call peaks. High-credibility peaks were filtered against the corresponding input with a P-value cutoff of 1e-5. The repeatability of ChIRP-seq was measured by calculating the Pearson’s correlation coefficient of the enrichment of common peaks identified in the two replicates. Using the ChIPseeker package (Yu et al., 2015) in R, we annotated every peak, defining a promoter as 3 kb around the TSS. GO enrichment analysis for PCGs with high-credibility peaks was performed with ChIP-Enrich (Welch et al., 2014) with a P-value cutoff of 0.05. Motif analysis was performed on the sequences within ±50 bp of the summits of all peaks. Only motifs with high significance are shown.
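
Two coordinate rules in this analysis, the ±50-bp motif window around each summit and the 3-kb proximity used to link peaks to genes, can be sketched as below. This only illustrates the arithmetic and is not a substitute for ChIPseeker or the motif tools; it assumes 0-based half-open gene coordinates, and the function names are hypothetical.

```python
def summit_window(summit, flank=50):
    """Half-open window [start, end) covering summit +/- flank bp."""
    return max(summit - flank, 0), summit + flank + 1

def distance_to_gene(summit, gene_start, gene_end):
    """0 if the summit lies in the gene body [gene_start, gene_end), else the gap in bp."""
    if gene_start <= summit < gene_end:
        return 0
    if summit < gene_start:
        return gene_start - summit
    return summit - gene_end + 1

def is_candidate_target(summit, gene_start, gene_end, max_dist=3000):
    """True if the peak summit lies within max_dist bp of the gene body."""
    return distance_to_gene(summit, gene_start, gene_end) < max_dist
```

A summit at position 1,000 yields the motif window [950, 1051), which spans 101 bp.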

VIGS assay

The VIGS assay was performed as previously reported (Gao et al., 2013). The 699- to 1,295-bp region of lnc-Ga13g0352, which shares less sequence similarity with the rest of the G. arboreum transcriptome than other regions do, was amplified and ligated into pTRV2 vectors, and then introduced into Agrobacterium tumefaciens strain GV3101. The transformed Agrobacterium strains were resuspended in infiltration buffer (10 mM MgCl2, 10 mM MES, and 200 µM acetosyringone). Transformants carrying pTRV2:lnc-Ga13g0352 or pTRV2:00 (empty vector) were mixed with pTRV1 at a 1:1 ratio and infiltrated into the cotyledons of 2-week-old seedlings. The injected seedlings were grown in a growth chamber at 25°C for another 2 weeks. To assess the effect of VIGS, true leaves were harvested for total RNA isolation. Oligo dT was used as the reverse transcription primer for cDNA synthesis for RT-qPCR.

Accession numbers

Sequence data can be found in NCBI Sequence Read Archive under the accession numbers PRJNA542206, PRJNA507565, and PRJNA373801.

Supplemental data

The following materials are available in the online version of this article.

Supplemental Figure S1. The detailed information of three steps of PULL pipeline.

Supplemental Figure S2. The Sanger sequencing validation of 5ʹ-end signals of lncRNAs predicted by PULL.

Supplemental Figure S3. The Sanger sequencing validation of 3ʹ-end signals of lncRNAs predicted by PULL.

Supplemental Figure S4. The genomic features of lncRNAs and PCGs in G. arboreum.

Supplemental Figure S5. TEs might contribute to the sequence composition and expressional regulation of lncRNA.

Supplemental Figure S6. Comparison of RNA-seq quantification (left) and RT-qPCR (right) of 30 lncRNAs in eight tissues.

Supplemental Figure S7. The coexpression of an lnc-NAT with its cognate antisense PCG.

Supplemental Figure S8. The canonical cis-elements of Pol II transcription found in lncRNAs and PCGs.

Supplemental Figure S9. The distribution of known motifs of Pol I, -II, and -III in different types of lncRNAs.

Supplemental Figure S10. The switches of TSS and TTS and their effect on the secondary structure of lncRNAs.

Supplemental Figure S11. SNP frequency distribution on the gene body of lincRNAs (red line) and PCGs (blue line).

Supplemental Figure S12. The evaluation on the biological repeatability of ChIRP-seq for lnc-Ga13g0352 and lnc-Ga08g0313.

Supplemental Figure S13. The validation of lnc-Ga13g0352-targeted PCGs and phenotypic observation in VIGS.

Supplemental Figure S14. The analysis and validation of lnc-Ga08g0313-targeted PCGs.

Supplemental Table S1. Genomic coordinates of lncRNAs in BED12 format and category of lncRNAs based on terminal signal, genome location, and miRNA in G. arboreum.

Supplemental Table S2. The expression (FPKM) of lncRNAs across 21 tissues.

Supplemental Table S3. The expression (FPKM) of PCGs across 21 tissues.

Supplemental Table S4. The FPKM from RNA-seq analysis and relative expression from RT-qPCR analysis of 30 lncRNAs in tested eight tissues.

Supplemental Table S5. The corresponding number IDs of tissues.

Supplemental Table S6. The genomic locations of lncRNAs that were related to TSS switches of PCGs.

Supplemental Table S7. WGCNA analysis of the expression of lncRNAs and PCGs.

Supplemental Table S8. The ChIRP-seq peaks and associated gene information.

Supplemental Table S9. All primers used in this study.

Supplemental Table S10. Sources and information of NGS datasets used in this study.

Supplemental Table S11. Information of software and programs used in this study.


Acknowledgments

We thank Prof. Xueying Guan from Zhejiang University (Hangzhou, China) for providing the VIGS vectors.

Funding

This work was supported by the following grants: the National Program on Research and Development of Transgenic Plants (2016ZX08009003-004) and the National Natural Science Foundation of China (31770310 and 31711530706) to K.W.; “One Thousand” Youth Talent Program to Y.Z.; and Innovation Team Program from Wuhan University to Y.Z. and K.W. (2042017kf0233).

Conflict of interest statement. The authors affirm that they have no other conflicts of interest.

K.W. conceived the experiments; X.M.Z. performed RACE, VIGS, and ChIRP assays; Y.J.C., Y.F.Z., H.Z.Y., and Y.Z. analyzed the sequencing data and performed bioinformatics analysis; D.Y.L., K.K.S., and X.H. conducted RT-qPCR; K.W., Y.J.C., and X.M.Z. wrote the article. All authors reviewed and approved this work.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://academic.oup.com/plphys) is: Kun Wang (wangk05@whu.edu.cn).

References

  1. Ariel F, Jegu T, Latrasse D, Romero-Barrios N, Christ A, Benhamed M, Crespi M (2014) Noncoding transcription by alternative RNA polymerases dynamically regulates an auxin-driven chromatin loop. Mol Cell 55: 383–396
  2. Bardou F, Ariel F, Simpson CG, Romero-Barrios N, Laporte P, Balzergue S, Brown JWS, Crespi M (2014) Long noncoding RNA modulates alternative splicing regulators in arabidopsis. Dev Cell 30: 166–176
  3. Blein T, Balzergue C, Roulé T, Gabriel M, Scalisi L, François T, Sorin C, Christ A, Godon C, Delannoy E, et al. (2020) Landscape of the noncoding transcriptome response of two arabidopsis ecotypes to phosphate starvation. Plant Physiol 183: 1058–1072
  4. Boley N, Stoiber MH, Booth BW, Wan KH, Hoskins RA, Bickel PJ, Celniker SE, Brown JB (2014) Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat Biotechnol 32: 341–346
  5. Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084–1097
  6. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25: 1915–1927
  7. Csorba T, Questa JI, Sun Q, Dean C (2014) Antisense COOLAIR mediates the coordinated switching of chromatin states at FLC during vernalization. Proc Natl Acad Sci USA 111: 16160–16165
  8. Dai X, Zhuang Z, Zhao PX (2018) PsRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Res 46: W49–W54
  9. Deng P, Liu S, Nie X, Weining S, Wu L (2018) Conservation analysis of long non-coding RNAs in plants. Sci China Life Sci 61: 190–198
  10. Ding J, Lu Q, Ouyang Y, Mao H, Zhang P, Yao J, Xu C, Li X, Xiao J, Zhang Q (2012) A long noncoding RNA regulates photoperiod-sensitive male sterility, an essential component of hybrid rice. Proc Natl Acad Sci USA 109: 2654–2659
  11. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21
  12. Du X, Huang G, He S, Yang Z, Sun G, Ma X, Li N, Zhang X, Sun J, Liu M, et al. (2018) Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits. Nat Genet 50: 796–802
  13. Engreitz JM, Haines JE, Perez EM, Munson G, Chen J, Kane M, Mcdonel PE, Guttman M, Lander ES (2016) Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539: 452–455
  14. Fang L, Wang Q, Hu Y, Jia Y, Chen J, Liu B, Zhang Z, Guan X, Chen S, Zhou B, et al. (2017) Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits. Nat Genet 49: 1089–1098
  15. Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI, Rubio-Somoza I, Leyva A, Weigel D, García JA, Paz-Ares J (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet 39: 1033–1037
  16. Fu H, Yang D, Su W, Ma L, Shen Y, Ji G, Ye X, Wu X, Li QQ (2016) Genome-wide dynamics of alternative polyadenylation in rice. Genome Res 26: 1753–1760
  17. Gao X, Li F, Li M, Kianinejad AS, Dever JK, Wheeler TA, Li Z, He P, Shan L (2013) Cotton GhBAK1 mediates verticillium wilt resistance and cell death. J Integr Plant Biol 55: 586–596
  18. Golicz AA, Singh MB, Bhalla PL (2018) The long intergenic noncoding RNA (LincRNA) landscape of the soybean genome. Plant Physiol 176: 2133–2147
  19. Haberle V, Forrest ARR, Hayashizaki Y, Carninci P, Lenhard B (2015) CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Res 43: e51.
  20. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38: 576–589
  21. Heix J, Grummt I (1995) Species specificity of transcription by RNA polymerase I. Curr Opin Genet Dev 5: 652–656
  22. Hon C-C, Ramilowski JA, Harshbarger J, Bertin N, Rackham OJL, Gough J, Denisenko E, Schmeier S, Poulsen TM, Severin J, et al. (2017) An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543: 199–204
  23. Hou S, Zhu G, Li Y, Li W, Fu J, Niu E, Li L, Zhang D, Guo W (2018) Genome-wide association studies reveal genetic variation and candidate genes of drought stress related traits in cotton (Gossypium hirsutum L.). Front Plant Sci 9: 1276.
  24. Hu H, Wang M, Ding Y, Zhu S, Zhao G, Tu L, Zhang X (2018) Transcriptomic repertoires depict the initiation of lint and fuzz fibres in cotton (Gossypium hirsutum L.). Plant Biotechnol J 16: 1002–1012
  25. Jiang N, Bao Z, Zhang X, Hirochika H, Eddy SR, McCouch SR, Wessler SR (2003) An active DNA transposon family in rice. Nature 421: 163–167
  26. Kalvari I, Argasinska J, Quinones-Olvera N, Nawrocki EP, Rivas E, Eddy SR, Bateman A, Finn RD, Petrov AI (2018) Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res 46: D335–D342
  27. Kang Y-J, Yang D-C, Kong L, Hou M, Meng Y-Q, Wei L, Gao G (2017) CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res 45: W12–W16
  28. Kawaji H, Lizio M, Itoh M, Kanamori-Katayama M, Kaiho A, Nishiyori-Sueki H, Shin JW, Kojima-Ishiyama M, Kawano M, Murata M, et al. (2014) Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing. Genome Res 24: 708–717
  29. Kindgren P, Ard R, Ivanov M, Marquardt S (2018) Transcriptional read-through of the long non-coding RNA SVALKA governs plant cold acclimation. Nat Commun 9: 4561.
  30. Li C, Fu Y, Sun R, Wang Y, Wang Q (2018) Single-locus and multi-locus genome-wide association studies in the genetic dissection of fiber quality traits in upland cotton (Gossypium hirsutum L.). Front Plant Sci 9: 1083.
  31. Li F, Fan G, Wang K, Sun F, Yuan Y, Song G, Li Q, Ma Z, Lu C, Zou C, et al. (2014a) Genome sequence of the cultivated cotton Gossypium arboreum. Nat Genet 46: 567–572
  32. Li L, Eichten SR, Shimizu R, Petsch K, Yeh C-T, Wu W, Chettoor AM, Givan SA, Cole RA, Fowler JE, et al. (2014b) Genome-wide discovery and characterization of maize long non-coding RNAs. Genome Biol 15: R40.
  33. Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Chua N-H (2012) Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24: 4333–4345
  34. Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA Package 2.0. Algorithms Mol Biol 6: 26.
  35. Lorenz R, Hofacker IL, Stadler PF (2016) RNA folding with hard and soft constraints. Algorithms Mol Biol 11: 8.
  36. Ma Z, He S, Wang X, Sun J, Zhang Y, Zhang G, Wu L, Li Z, Liu Z, Sun G, et al. (2018) Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat Genet 50: 803–813
  37. Makarevitch I, Waters AJ, West PT, Stitzer M, Hirsch CN, Ross-Ibarra J, Springer NM (2015) Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet 11: e1004915.
  38. Matzke MA, Mosher RA (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nat Rev Genet 15: 394–408
  39. Mercer TR, Dinger ME, Mattick JS (2009) Long non-coding RNAs: insights into functions. Nat Rev Genet 10: 155–159
  40. Paytuví Gallart A, Hermoso Pulido A, Anzar Martínez de Lagrán I, Sanseverino W, Aiese Cigliano R (2016) GREENC: a Wiki-based database of plant lncRNAs. Nucleic Acids Res 44: D1161–D1166
  41. Percharde M, Lin CJ, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, Biechele S, Huang B, Shen X, Ramalho-Santos M (2018) A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174: 391–405.e19
  42. Pereira V (2004) Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol 5: 1–10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33: 290–295
  44. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575
  45. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842
  46. Ransohoff JD, Wei Y, Khavari PA (2017) The functions and unique features of long intergenic non-coding RNA. Nat Rev Mol Cell Biol 19: 143–157
  47. Schramm L, Hernandez N (2002) Recruitment of RNA polymerase III to its target promoters. Genes Dev 16: 2593–2620
  48. Shan C-M, Shangguan X-X, Zhao B, Zhang X-F, Chao L, Yang C-Q, Wang L-J, Zhu H-Y, Zeng Y-D, Guo W-Z, et al. (2014) Control of cotton fibre elongation by a homeodomain transcription factor GhHOX3. Nat Commun 5: 5519
  49. St. Laurent G, Wahlestedt C, Kapranov P (2015) The Landscape of long noncoding RNA classification. Trends Genet 31: 249–251
  50. Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, Liu Y, Chen R, Zhao Y (2013) Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res 41: e166
  51. Tokizawa M, Kusunoki K, Koyama H, Kurotani A, Sakurai T, Suzuki Y, Sakamoto T, Kurata T, Yamamoto YY (2017) Identification of Arabidopsis genic and non-genic promoters by paired-end sequencing of TSS tags. Plant J 90: 587–605
  52. Uszczynska-Ratajczak B, Lagarde J, Frankish A, Guigó R, Johnson R (2018) Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet 19: 535–548
  53. Wang K, Huang G, Zhu Y (2016) Transposable elements play an important role during cotton genome evolution and fiber cell development. Sci China Life Sci 59: 112–121
  54. Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38: e164
  55. Wang K, Wang D, Zheng X, Qin A, Zhou J, Guo B, Chen Y, Wen X, Ye W, Zhou Y, et al. (2019) Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton. Nat Commun 10: 4714
  56. Wang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, Ye Z, Shen C, Li J, Zhang L, et al. (2017) Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet 49: 579–587
  57. Wang M, Yuan D, Tu L, Gao W, He Y, Hu H, Wang P, Liu N, Lindsey K, Zhang X (2015) Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.). New Phytol 207: 1181–1197
  58. Wang R, Zheng D, Yehia G, Tian B (2018a) A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res 28: 1427–1441
  59. Wang Y, Luo X, Sun F, Hu J, Zha X, Su W, Yang J (2018b) Overexpressing lncRNA LAIR increases grain yield and regulates neighbouring gene cluster expression in rice. Nat Commun 9: 3516
  60. Welch RP, Lee C, Imbriano PM, Patil S, Weymouth TE, Smith RA, Scott LJ, Sartor MA (2014) ChIP-Enrich: gene set enrichment testing for ChIP-seq data. Nucleic Acids Res 42: e105
  61. Wu H, Yang L, Chen L-L (2017) The diversity of long noncoding RNAs and their generation. Trends Genet 33: 540–552
  62. Xu H, Luo X, Qian J, Pang X, Song J, Qian G, Chen J, Chen S (2012) FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7: e52249
  63. Yamamoto YY, Yoshitsugu T, Sakurai T, Seki M, Shinozaki K, Obokata J (2009) Heterogeneity of Arabidopsis core promoters revealed by high-density TSS analysis. Plant J 60: 350–362
  64. Yu G, Wang L-G, He Q-Y (2015) ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31: 2382–2383
  65. Yuan C, Meng X, Li X, Illing N, Ingle RA, Wang J, Chen M (2017) PceRBase: a database of plant competing endogenous RNA. Nucleic Acids Res 45: D1009–D1014
  66. Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. doi: 10.2202/1544-6115.1128
  67. Zhang L, Wang M, Li N, Wang H, Qiu P, Pei L, Xu Z, Wang T, Gao E, Liu J, et al. (2018) Long noncoding RNAs involve in resistance to Verticillium dahliae, a fungal disease in cotton. Plant Biotechnol J 16: 1172–1185
  68. Zhang Y-C, Liao J-Y, Li Z-Y, Yu Y, Zhang J-P, Li Q-F, Qu L-H, Shu W-S, Chen Y-Q (2014) Genome-wide screening and functional analysis identify a large number of long noncoding RNAs involved in the sexual reproduction of rice. Genome Biol 15: 512
  69. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9: R137
  70. Zhao T, Tao X, Feng S, Wang L, Hong H, Ma W, Shang G, Guo S, He Y, Zhou B, et al. (2018a) LncRNAs in polyploid cotton interspecific hybrids are derived from transposon neofunctionalization. Genome Biol 19: 195
  71. Zhao X, Li J, Lian B, Gu H, Li Y, Qi Y (2018b) Global identification of Arabidopsis lncRNAs reveals the regulation of MAF4 by a natural antisense RNA. Nat Commun 9: 5056
  72. Zheng XM, Chen J, Pang HB, Liu S, Gao Q, Wang JR, Qiao WH, Wang H, Liu J, Olsen KM, et al. (2019) Genome-wide analyses reveal the role of noncoding variation in complex traits during rice domestication. Sci Adv 5: eaax3619

Associated Data

Supplementary Materials

kiaa003_Supplementary_Data

Articles from Plant Physiology are provided here courtesy of Oxford University Press