Plant Biotechnology Journal. 2024 Aug 1;22(11):3151–3163. doi: 10.1111/pbi.14437

Conserved features and diversity attributes of chimeric RNAs across accessions in four plants

Jia Cong 1, Sinan Zhang 1,2, Qi Zhang 1, Xiting Yu 1, Jiazhi Huang 1, Xin Wei 1, Xuehui Huang 1, Jie Qiu 1, Xiaoyi Zhou 1
PMCID: PMC11500992  PMID: 39087631

Summary

As a non‐collinear expression form of genetic information, chimeric RNAs increase the complexity of the transcriptome in diverse organisms. Although chimeric RNAs have been identified in plants, few common features have been revealed. Here, we systematically explored the landscape of chimeric RNAs across multiple accessions and tissues using pan‐genome and transcriptome data of four plants: rice, maize, soybean, and Arabidopsis. Among the four species, conserved characteristics of breakpoints and parental genes were discovered. In each species, chimeric RNAs displayed a high level of diversity among accessions, and the clustering of accessions using chimeric events was generally concordant with clustering based on genomic variants, implying a general relationship between genetic variations and chimeric RNAs. Through mass spectrometry, we confirmed a fusion protein OsNDC1‐OsGID1L2 and observed its subcellular localization, which differed from that of the original proteins. Phenotypic cues in transgenic rice suggest potential functions of OsNDC1‐OsGID1L2. Moreover, an intriguing chimeric event Os01g0216500–Os01g0216900, generated by a large deletion in basmati rice, also exists in another accession without the deletion, demonstrating its convergence in evolution. Our results illuminate the characteristics and hint at the evolutionary implications of plant chimeric RNAs, which serve as a supplement to genetic variations, thus expanding our understanding of genetic diversity.

Keywords: chimeric RNAs, plants, conserved characteristics, genetic diversity

Introduction

Chimeric RNAs, also known as fusion transcripts, are formed by joining transcript segments from different genes (Gingeras, 2009; Gupta et al., 2018; Sutton and Boothroyd, 1986). From worm to human, chimeric RNAs are commonly present in eukaryotes, expanding the complexity of the transcriptome and increasing the information content of the genome (Gingeras, 2009). This offers additional functional and regulatory mechanisms critical for the adaptive evolution of species (Lei et al., 2016; Mukherjee and Frenkel‐Morgenstern, 2022).

Chimeric RNAs can be generated through gene fusion at the DNA level (e.g. DNA rearrangement) or through cis‐splicing (from the same precursor RNA) or trans‐splicing (from different transcripts) without changes in DNA sequences (Finta and Zaphiropoulos, 2002; Lei et al., 2016; Sun and Li, 2022). Cis‐splicing chimeric RNAs usually originate from the read‐through of adjacent genes, while trans‐splicing chimeric RNAs can originate from different genes located on the same or different chromosomes, or even from two opposite DNA strands, known as cross‐strand chimeric RNA (cscRNA) (Wang et al., 2021a). The exact mechanisms involved in the origination and regulation of chimeric RNAs remain unclear. Some studies showed that for certain types of chimeric RNAs, short homologous sequences (SHSs) at the junction sites are critical for their generation (Li et al., 2009).

The most extensively characterized chimeric RNAs are found in cancer, where many fusion transcripts, generated from DNA rearrangements, are well‐known causes of cancer (Elfman et al., 2020; Hanahan and Weinberg, 2011; Jia et al., 2016; Mertens et al., 2015). Some of them have already been widely used in clinical practice as biomarkers or therapeutic targets (Ommer et al., 2020; Shaw et al., 2011). Additionally, numerous chimeric RNAs have also been identified in normal human tissues (Babiceanu et al., 2016; Christie et al., 2019; Singh et al., 2020). Interestingly, the JAZF1‐JJAZ1 fusion generated from gene rearrangement in endometrial cancer cells can also be produced through trans‐splicing in normal endometrial stroma cells without DNA rearrangement (Li et al., 2008). A similar phenomenon was observed for SLC45A3‐ELK4 in normal and neoplastic prostate cells (Rickman et al., 2009). These findings suggest that splicing‐mediated chimeric RNAs may serve as a precondition for genetic changes at the DNA level, indicating the evolutionary significance of chimeric RNAs.

Chimeric RNAs have also been identified in plants (Singh et al., 2019; Wang et al., 2016; Zhang et al., 2010). However, the functional and evolutionary significances of chimeric RNAs in plants remain largely unclear. Based on pan‐genome and transcriptome data, we systematically identified chimeric RNAs across genetically diverse accessions of four plant species: rice, maize, soybean, and Arabidopsis thaliana. We comprehensively analysed the diversity of chimeric RNAs among various accessions and observed a close relationship between the chimeric RNA profiles and the genetic alterations in these accessions. In addition, through cross‐organism comparison among the four plants, we revealed the conservation patterns of chimeric RNAs. From two dimensions, that is, short‐term intra‐species variation and long‐term species evolution, we explored the significance of fusion transcripts in adaptive evolution.

Results

Identification of chimeric RNAs

Transcriptome sequencing data from a total of 138 rice, 167 maize, 234 soybean, and 215 Arabidopsis samples, generated in our own and other labs, were used to profile chimeric RNAs (Clauw et al., 2015; Gan et al., 2011; Hufford et al., 2021; Kawakatsu et al., 2016; Liu et al., 2020; Qin et al., 2021; Yang et al., 2017). These samples involved 44 rice accessions with seven tissue stages, 25 maize accessions with seven tissue stages, 26 soybean accessions with nine tissue stages, and 198 Arabidopsis ecotypes with nine tissue stages (Figure 1a; Tables S1–S4). Using the software Arriba (Uhrig et al., 2021) alongside a comprehensive set of filters, we retained fusion events supported by at least five unique high‐quality reads. Additionally, we excluded fusions that could potentially be false positives due to homologous genes. As a result, 6927 fusion events were identified in rice, 23 495 in maize, 13 365 in soybean, and 807 in Arabidopsis. After removing redundancy, a total of 1070, 1908, 1919, and 201 fusions were identified in rice, maize, soybean, and Arabidopsis, respectively (Tables S5–S8). These affected 2.7% (1030/37 849) of genes in rice, 3.2% (1231/39 005) in maize, 1.7% (955/54 841) in soybean, and 1.0% (260/27 206) in Arabidopsis (Figure 1b), indicating that the number of chimeric RNAs is not correlated with the total number of genes. The intensity of chimeric events in gene regions was variable among and within species, ranging from 0.6 per megabase (Mb) to more than 2.0 per Mb, with maize exhibiting the highest prevalence (Figure 1c), showing that the number of chimeric RNAs was also not correlated with gene region size.
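The filtering and redundancy‐removal steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the record fields (`sample`, `gene5`, `gene3`, `unique_reads`), the `homologous_pairs` set, and the toy calls are hypothetical, and Arriba's actual output format is not reproduced here. Only the read‐support threshold of five and the rice gene counts come from the text.

```python
from typing import NamedTuple

class FusionCall(NamedTuple):
    sample: str        # sample identifier (hypothetical field)
    gene5: str         # 5' parental gene (hypothetical field)
    gene3: str         # 3' parental gene (hypothetical field)
    unique_reads: int  # unique high-quality supporting reads

MIN_UNIQUE_READS = 5   # threshold stated in the text

def filter_calls(calls, homologous_pairs):
    """Keep calls with enough read support whose parents are not homologues."""
    kept = []
    for call in calls:
        if call.unique_reads < MIN_UNIQUE_READS:
            continue
        if frozenset((call.gene5, call.gene3)) in homologous_pairs:
            continue
        kept.append(call)
    return kept

def nonredundant_fusions(calls):
    """Collapse events seen in several samples into unique (5', 3') gene pairs."""
    return {(call.gene5, call.gene3) for call in calls}

def percent_genes_affected(n_parental_genes, n_total_genes):
    return 100.0 * n_parental_genes / n_total_genes

# Toy data, invented for illustration only.
calls = [
    FusionCall("acc1_leaf", "geneA", "geneB", 12),
    FusionCall("acc2_root", "geneA", "geneB", 7),
    FusionCall("acc1_leaf", "geneC", "geneD", 3),   # too few reads
    FusionCall("acc2_leaf", "geneE", "geneF", 9),   # homologous parents
]
homologous_pairs = {frozenset(("geneE", "geneF"))}

events = filter_calls(calls, homologous_pairs)
fusions = nonredundant_fusions(events)
rice_pct = percent_genes_affected(1030, 37849)  # rice counts from the text
```

In this toy input, two events survive filtering and collapse into one non‐redundant fusion, mirroring the distinction the text draws between fusion events and fusions.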

Figure 1.


The landscape of chimeric RNAs in four plants. (a) The tissue stages included in this study and all the fusions identified in rice, maize, soybean, and Arabidopsis. Number of fusions is indicated within parentheses following the tissue name. V: vegetative; R: reproductive; D: day; W: week. (b) Counts of chimeric RNAs and parental genes. (c) The prevalence of chimeric RNAs in gene loci (length from the start to the end of a gene region) across accessions and organisms. Each dot represents one accession, and the horizontal lines represent the median number. (d) The average number of fusions per chromosome. (e) Compilation of chimeric RNAs categorized according to their characteristics relevant to the mechanism of fusion generation.

Chimeric RNAs were identified in all the tissues included in this study (Figure 1a). We further analysed previously reported single‐cell transcriptome sequencing data of roots and leaves from Nipponbare rice (Wang et al., 2021b), to explore the cell‐type specificity of chimeric RNAs. Among the 15 clusters of leaf cells and nine clusters of root cells, chimeric RNAs were predominantly enriched in mesophyll cells, leaf mesophyll precursor cells, and root cortex cells (Figure S1).

Chimeric RNAs exhibit the highest density on maize chromosomes and the lowest in Arabidopsis. It should be noted that the number of fusions in Arabidopsis may be underestimated because most of the Arabidopsis samples from the Arabidopsis 1001 genome project were sequenced with a single‐end approach (Kawakatsu et al., 2016). In maize and soybean, fusions were more likely to occur between chromosomes, while rice and Arabidopsis tended to have fusions within chromosomes (Figure 1d). Intra‐chromosomal chimeric RNAs tend to occur between gene pairs that are in close proximity, with the frequency of chimerism being moderately correlated with gene density (Figure S2).

To validate the chimeric RNAs, we randomly selected 50 fusions in rice and 10 fusions in Arabidopsis Ler ecotype for RT‐PCR and Sanger sequencing. Of these, 45 (90%) and 7 (70%) chimeric RNAs in rice and Arabidopsis, respectively, were successfully validated in independent biological replicate samples, confirming the reliability and inheritance of our chimeric RNAs (Figure S3).

Based on previous reports on the generation of chimeric RNAs, including cis‐splicing, trans‐splicing (Sun and Li, 2022), and SHS‐related fusions (Li et al., 2009), we analysed the cases in our datasets where parental genes were adjacent, fusions occurred at exact splice sites, or SHSs were present at fusion breakpoints. In total, fusions meeting any of the three conditions account for less than half of the overall counts (Figure 1e), implying that a substantial proportion of chimeric RNAs may be generated via currently unknown mechanisms. Specifically, fewer than 5% of chimeric RNAs exhibit SHSs at the breakpoints (Figure 1e), considerably lower than the approximately 50% observed in animals (Li et al., 2009). This may reflect differences in the mechanisms of chimeric RNA production between plants and animals, or it may be due to our stringent criteria for identifying fusions.

Conserved features of the fusion breakpoints

To investigate the mechanisms underlying the production and regulation of chimeric RNAs, we analysed the location and base preferences of fusion breakpoints. In all four species examined, both 5′ and 3′ breakpoint sites were mostly located in coding sequence (CDS), followed by untranslated region (UTR) and intron (Figure 2a). We used ‘S’ (Splicing site) and ‘N’ (Not splicing site) to indicate whether the breakpoints occur at original splicing sites of parental genes. Therefore, ‘S/S’ denotes that both the 5′ and 3′ fragments of a chimeric RNA utilize the parental gene's splicing sites. This type of fusion accounts for only 5.5% to 9.5% in rice, maize, and soybean, while in Arabidopsis, it constitutes 29.3% (Figure 1e). More than half (52%–83%) of the fusions in the four species belong to the N/N type.
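The S/N labelling described above can be illustrated with a short sketch. It is a hedged example rather than the study's implementation: the fusion record keys (`gene5`, `bp5`, `gene3`, `bp3`), the `annotated_sites` lookup, and the convention that the first letter refers to the 5′ partner are assumptions made for illustration, and the coordinates are invented.

```python
from collections import Counter

def is_splice_site(gene, position, annotated_sites):
    """True if the breakpoint coincides with an annotated splice site of the gene."""
    return position in annotated_sites.get(gene, set())

def classify(fusion, annotated_sites):
    """Return 'S/S', 'S/N', 'N/S' or 'N/N' (assumed order: 5' partner first)."""
    five = "S" if is_splice_site(fusion["gene5"], fusion["bp5"], annotated_sites) else "N"
    three = "S" if is_splice_site(fusion["gene3"], fusion["bp3"], annotated_sites) else "N"
    return f"{five}/{three}"

# Hypothetical annotation: splice-site coordinates per gene.
annotated_sites = {"geneA": {100, 250}, "geneB": {40}}

# Hypothetical fusions with their breakpoint coordinates.
toy_fusions = [
    {"gene5": "geneA", "bp5": 250, "gene3": "geneB", "bp3": 40},   # both at splice sites
    {"gene5": "geneA", "bp5": 100, "gene3": "geneB", "bp3": 77},   # only the 5' end
    {"gene5": "geneA", "bp5": 180, "gene3": "geneC", "bp3": 12},   # neither end
]

classes = [classify(f, annotated_sites) for f in toy_fusions]
class_counts = Counter(classes)
```

Tallying `class_counts` over all fusions of a species would give proportions comparable to the S/S and N/N percentages quoted above.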

Figure 2.


The conserved features of the location and base preference of the breakpoints in parental genes. (a) The location of breakpoints in parental genes. The width of the stream indicates the proportion of chimeric RNAs (adjusted by the length proportion of CDS/UTR/intron in the genome). (b) The proportion of base composition types of breakpoints for N/N fusions. _ _GU‐AG_ _: breakpoints that comply with the GU‐AG rule; _ _GU or AG_ _: breakpoints where only one end, either 5′ or 3′, complies with the rule; others: both the 5′ and 3′ breakpoints deviate from the rule. The frequency of 256 combinations of the four bases at the breakpoint in 5′ (c) or 3′ (d) parental genes. The 256 combinations were further divided into 16 categories according to the two intron nucleotides. The pattern of NNCU is enriched in 5′ breakpoints, while ACNN is enriched in 3′ breakpoints. The red arrows indicate the conserved breakpoint signatures, which were present in at least two species.

Furthermore, we analysed base preference at the breakpoints of N/N type fusions. On average, 12.7% of them still adhered to the GU‐AG rule (Figure 2b), suggesting that these chimeric RNAs are still produced by the splicing machinery. To investigate whether there are breakpoint sequence patterns other than GU‐AG, we examined the four‐nucleotide patterns at breakpoints (2 bp upstream and 2 bp downstream, depicted as 5′‐N2N1n1n2 and 3′‐n2n1N1N2, where uppercase and lowercase letters indicate exonic and intronic bases in fusion transcripts, respectively). We found that in all four plants, 5′‐n1n2 was enriched in CU, whereas 3′‐n2n1 was enriched in AC (Figure 2c). We further identified breakpoint signatures, defined as four‐nucleotide combinations whose occurrence probability was greater than 0.01 and at least two times higher than that in the corresponding gene regions of each species. Nine to 14 signatures were identified in each species, including 5′‐ACcu, 5′‐GCcu, 5′‐UAcu, 5′‐GGcg, and 3′‐acCA, 3′‐acCU and 3′‐cgUC, which were present in at least two species (Figure 2c). These conserved breakpoint signatures might be related to the mechanisms of chimeric RNA production.
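The signature definition above (occurrence probability greater than 0.01 and at least twofold enrichment over the gene‐region background) can be written as a small filter. The sketch below is an illustration under assumptions: the function name, the way the background frequencies are supplied, and the toy counts are hypothetical, and the text does not state how four‐mers absent from the background were handled, so they are simply skipped here.

```python
from collections import Counter

def breakpoint_signatures(breakpoint_4mers, background_freq,
                          min_prob=0.01, min_enrichment=2.0):
    """Return {4-mer: enrichment} for 4-mers passing both thresholds.

    breakpoint_4mers: one four-nucleotide string per breakpoint.
    background_freq:  frequency of each 4-mer in the species' gene regions
                      (assumed to be precomputed elsewhere).
    """
    counts = Counter(breakpoint_4mers)
    total = sum(counts.values())
    signatures = {}
    for kmer, n in counts.items():
        prob = n / total
        background = background_freq.get(kmer, 0.0)
        if background == 0.0:
            continue  # assumption: no ratio can be computed, so skip
        if prob > min_prob and prob >= min_enrichment * background:
            signatures[kmer] = prob / background
    return signatures

# Toy input, invented for illustration: 10 breakpoints, two distinct 4-mers.
toy_4mers = ["ACcu"] * 6 + ["GGaa"] * 4
toy_background = {"ACcu": 0.2, "GGaa": 0.3}

sigs = breakpoint_signatures(toy_4mers, toy_background)
```

Here `ACcu` occurs at a probability of 0.6 against a background of 0.2 and is retained, while `GGaa` (0.4 against 0.3) falls short of the twofold criterion.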

Conserved patterns of parental genes

We analysed the characteristics of chimeric RNA parental genes, aiming to explore what kind of genes chimeric RNAs tend to arise from. Examination of the transcriptome data showed that both the 5′ and 3′ parental genes were highly expressed. Specifically, the percentage of 5′ parental genes surpassing the median expression value (P50) ranges from 59.9% to 90.6% across the four species, while that for 3′ parental genes ranges from 60.3% to 92.5%. Among them, maize parental genes exhibited the highest expression levels (Figure 3a). This indicates that chimeric RNAs tend to arise between actively transcribed genes. The correlation r between 5′ and 3′ parental gene expression levels ranges from 0.25 to 0.72 (Figure S4). By categorizing fusions based on parental gene expression level, we observed a consistent trend: higher expression of one parental gene corresponds to higher expression of the other gene, while lowly expressed genes tend to form chimeras with other lowly expressed genes. This suggests that chimeric events tend to occur between genes with comparable transcriptional activity (Figure S5).
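The P50 comparison above amounts to asking what fraction of parental genes have a mean expression above the transcriptome‐wide median. A minimal sketch follows, assuming a per‐gene mean TPM table has already been computed across samples of one species; the function name and the toy values are hypothetical.

```python
from statistics import median

def fraction_above_p50(parental_genes, mean_tpm):
    """Fraction of parental genes whose mean TPM exceeds the median (P50)
    of all genes in the transcriptome."""
    p50 = median(mean_tpm.values())
    above = [gene for gene in parental_genes if mean_tpm[gene] > p50]
    return len(above) / len(parental_genes)

# Toy per-gene mean TPM values, invented for illustration; the median is 8.0.
mean_tpm = {"g1": 0.5, "g2": 2.0, "g3": 8.0, "g4": 30.0, "g5": 120.0}
toy_parental_genes = ["g2", "g4", "g5"]

frac = fraction_above_p50(toy_parental_genes, mean_tpm)
```

A value well above 0.5, as reported for all four species, indicates that parental genes are drawn preferentially from the highly expressed half of the transcriptome.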

Figure 3.


Comparative analysis of parental genes across four plants. (a) The ranking of 5′ or 3′ parental gene expression levels within the transcriptome of four plants. The average TPM values of each gene in samples within the same species were calculated. The dashed line represents the P50 threshold. (b) Overlaps of parental genes among the four species. ‘Overlap’ refers to the orthologous genes. The different coloured numbers represent the orthologous gene counts in each of the four species. (c) Major gene ontology terms of the overlapped genes. (d) One representative chimeric RNA where both the 5′ and 3′ parental genes are homologous in rice and maize. The 5′ gene is coloured in blue, while the 3′ gene is purple; the darkened region represents the sequence corresponding to the chimeric transcript; the red lines correspond to the PCR bands on the right panel, which have been confirmed by Sanger sequencing.

Scanning the parental genes across the four species revealed a collection of orthologous genes (Figure 3b; Tables S9–S12). Gene Ontology (GO) category analysis showed that these genes are involved in fundamental metabolic and environmental response processes, such as protein metabolism, nucleotide binding, response to stress or defence, and signal transduction (Figure 3c). The appearance of chimeric RNAs could be potentially important for expanding biological functionality in these processes.

In rice, a total of 58 parental genes have orthologues undergoing chimeric events in the other three species, mainly involved in defence response (18 genes) and protein kinase activity (24 genes) (Figure 3b; Tables S9–S12). For instance, for the chimeric RNA Os01g0837900‐OsMTP11/Zm00001d042938‐ZmMTP11, both 5′ and 3′ parental genes exhibited conservation between rice and maize. We confirmed the full‐length chimeric transcripts through RT‐PCR and Sanger sequencing. The 5′ portions of the fusion transcripts encompass the entire CDS of Os01g0837900 and Zm00001d042938, encoding serine/threonine‐protein kinase, whereas the 3′ segments are provided by OsMTP11 and ZmMTP11, contributing to the 3′ sequences of the novel chimeric transcripts (Figure 3d).

Impact of chimeric RNA on parental gene expression levels

We noted that a considerable number of chimeric RNAs showed expression levels equal to or higher than their parental genes. For instance, in rice, 38% of fusion transcripts exceed the expression levels of their 5′ parental transcripts, while 26% exceed those of the 3′ parental transcripts (Figure 4a). The substantial expression level of chimeric transcripts implies their biological significance.

Figure 4.


Chimeric RNAs affect the expression of parental genes. (a) Cumulative frequency of the ratio of expression levels between chimeric transcripts and their parental genes. PG: parental gene. (b) Scatterplot shows expression levels [log2(TPM + 1)] of parental genes in samples with (PG‐F) and without (PG‐N) corresponding fusions. Coloured dots indicate fusions that caused significant changes in parental gene expression (fold change ≥ 2). The p value represents the significance of the number of red dots (chi‐square test, where the same number of genes without chimeric events were randomly selected in the genome as a control, and repeated 30 times).

Chimeric transcripts are commonly thought to operate by influencing parental genes or creating new chimeric proteins (Chwalenia et al., 2017). With regard to influencing parental genes, for each fusion, we compared the expression levels of parental genes between samples with this fusion and those without it, and observed that most of the fusions (51%–57%) were associated with a more than twofold change in parental gene expression (Figure 4b; Figure S6). This result highlights the substantial impact of chimeric RNAs on the expression of their parental genes in plants.
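The with‐versus‐without comparison described above can be sketched as a fold‐change calculation. This is an assumption‐laden illustration: the text gives only the twofold threshold and the log2(TPM + 1) scale used for plotting, so the use of group means, the pseudocount of 1, and the treatment of both increases and decreases as changes are choices made here, not details confirmed by the study. Sample names and TPM values are invented.

```python
from math import log2
from statistics import mean

def parental_fold_change(tpm_by_sample, fusion_samples, pseudocount=1.0):
    """Ratio of mean parental-gene expression in samples with the fusion
    to that in samples without it; None if either group is empty."""
    with_fusion = [v for s, v in tpm_by_sample.items() if s in fusion_samples]
    without_fusion = [v for s, v in tpm_by_sample.items() if s not in fusion_samples]
    if not with_fusion or not without_fusion:
        return None
    return (mean(with_fusion) + pseudocount) / (mean(without_fusion) + pseudocount)

def is_altered(fold_change, threshold=2.0):
    """True for at least a twofold change in either direction."""
    if fold_change is None:
        return False
    return fold_change >= threshold or fold_change <= 1.0 / threshold

# Toy TPM values for one parental gene, invented for illustration.
tpm_by_sample = {"s1": 15.0, "s2": 15.0, "s3": 3.0, "s4": 3.0}
fusion_samples = {"s1", "s2"}   # samples in which the fusion was detected

fc = parental_fold_change(tpm_by_sample, fusion_samples)
log2_fc = log2(fc)
altered = is_altered(fc)
```

Repeating the calculation for randomly chosen genes without chimeric events, as the Figure 4 legend describes for its control, would give the baseline against which the proportion of altered parental genes is judged.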

Potential for encoding novel functional proteins

To analyse the potential of chimeric RNAs in generating novel proteins, we deduced the composition of chimeric products according to the breakpoint site in the parental genes (Figure S7a). If both parental gene sequences potentially participate in coding and result in a new fusion protein, the chimeric RNA is referred to as an NPF (novel protein potential fusion). In total, there were 567, 1296, 1386, and 129 NPFs in rice, maize, soybean, and Arabidopsis, respectively (Figure S7b).

Among our 45 verified fusions, one NPF in rice, OsNDC1‐OsGID1L2, was selected for follow‐up investigations. OsNDC1 encodes a NADH dehydrogenase, while OsGID1L2 encodes a putative gibberellin receptor. The chimeric transcript of OsNDC1‐OsGID1L2 was found in most tissues of all the rice accessions we examined (Figure 5a). The breakpoint occurs at 27 bp before the stop codon of OsNDC1 and at the 45th base of the CDS of OsGID1L2. Through liquid chromatography–tandem mass spectrometry (LC–MS/MS) assay, we successfully detected cross‐junction peptide segments of the chimeric protein. Additionally, we identified parental gene‐specific peptide fragments, confirming that this fusion transcript indeed encodes a novel protein distinct from those of the parental genes (Figure 5a,b). We also employed 3′RACE and 5′RACE to obtain the full‐length sequences of chimeric transcripts, revealing three transcripts of the OsNDC1‐OsGID1L2 fusion. One transcript, fusion‐03, inherited all exons from both parental genes, while the other two lost 228 bp (fusion‐01) and 474 bp (fusion‐02) of OsGID1L2 sequence (Figure 5a). Since fusion‐01 was the major band in RACE, our subsequent analysis in this study focused on fusion‐01.

Figure 5.


An example of chimeric RNAs generating novel proteins. (a) Schematic diagram and validation of the chimeric event OsNDC1‐OsGID1L2. Left: RACE products; right: schematic of the fusion event, depicting the regions corresponding to chimeric transcripts with deepened shades of colour. Grey, orange, and yellow bands indicate shared, parent‐specific, and chimera‐specific peptide segments, respectively, which were detected through mass spectrometry. (b) Spectrum of LC–MS/MS of the peptide across the junction (the yellow bands in a). (c) Protein structures predicted by AlphaFold2 for the proteins encoded by OsNDC1 (in yellow), OsGID1L2 (in blue), and the chimeric transcript fusion‐01 (in red), respectively. (d) Subcellular location of the fusion protein OsNDC1‐OsGID1L2 in rice protoplasts. eGFP was fused to the C terminus of OsNDC1, OsGID1L2, and OsNDC1–OsGID1L2, respectively. Bar = 10 μm. eGFP: eGFP fluorescent signal; RFP: chloroplast auto‐fluorescent signal; BF: bright field; Merge: overlapping of eGFP, RFP, and BF. (e) Tissue expression patterns of OsNDC1, OsGID1L2, and OsNDC1–OsGID1L2. (f) OsNDC1–OsGID1L2 overexpressing rice exhibited an earlier heading date. Bar = 10 cm. Values indicate means ± SD, ** indicates a significance level of P < 0.05 (two‐sided Student's t‐test).

The structures of the two parental proteins and the chimeric protein were predicted using AlphaFold2 (Jumper et al., 2021), showing that the chimeric protein resembles a complex assembled from the two parental proteins (Figure 5c; Figure S8a). We fused OsNDC1, OsGID1L2, and the chimeric protein OsNDC1‐OsGID1L2 to eGFP and then transiently expressed the three constructs in rice protoplasts to survey their subcellular localizations. The original protein OsNDC1 localized to chloroplasts, while a strong OsGID1L2 signal was dispersed throughout the protoplast cells. The chimeric protein OsNDC1‐OsGID1L2 was primarily located in chloroplasts, cytoplasm, and plasma membrane (Figure 5d; Figure S8b), which differs from the localization of the parent proteins.

We examined the tissue expression patterns of OsNDC1‐OsGID1L2 and the two parent genes in Nipponbare using qPCR. Overall, the expression levels of the chimeric RNA were closer to those of the more highly expressed parent gene. Interestingly, in the flag leaf at heading date, the expression level of OsNDC1‐OsGID1L2 was significantly higher than that of the two parent genes (Figure 5e). Subsequently, we generated transgenic plants expressing p35S::OsNDC1‐OsGID1L2 in the Nipponbare background. The heading date in T1 transgenic rice was significantly earlier compared to the wild type (Figure 5f; Figure S8c). These results offer insights for exploring the function of chimeric RNA. Further comprehensive transgenic studies can help reveal the biological function of the NPFs.

Intra‐specific diversity of chimeric RNAs

We further analysed the chimeric RNA features across diverse accessions in each organism. To reduce false negatives, fusions detected in different tissues of the same accession were combined. That is, if a fusion was detected in any one tissue, it was considered present in that specific accession. Due to the limited number of tissues profiled in Arabidopsis ecotypes, this section does not include analysis related to Arabidopsis.

Using rice as an illustration, the 44 accessions encompass all the major O. sativa subpopulations (including indica, temperate japonica, tropical japonica, intermediate, aus, and basmati) and O. glaberrima. The fusion numbers vary from 69 to 165 in these accessions (Figure 6a). A moderate correlation exists between the fusion quantity and the depth of sequencing (Pearson r ranging from 0.48 to 0.78, Figure S9). Two accessions, DG and LJ, exhibiting the highest sequencing depth, displayed the greatest number of fusions. The number of NPFs spans from 36 to 73, while the range for S/S type fusions is between 8 and 29. There is a significant excess of intrachromosomal fusions over interchromosomal ones, while the proportion of interchromosomal fusions varies significantly among accessions (Figure 6a). Interestingly, in maize and soybean, when considering all accessions together with redundancies excluded, interchromosomal fusions are more common (Figure 1d). However, within each individual accession, intrachromosomal fusions prevail (Figures S10a and S11a). This is because intrachromosomal fusions tend to be shared between accessions, while interchromosomal fusions are more accession‐specific. Thus, after merging, the opposite trend emerges.
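The reversal described above, where intrachromosomal fusions dominate within each accession but interchromosomal fusions dominate after merging, follows directly from how shared and accession‐specific events collapse under de‐duplication. The toy example below illustrates the arithmetic only; the gene names, chromosome labels, and counts are invented and do not correspond to any accession in the study.

```python
def is_intrachromosomal(fusion):
    """A fusion is a tuple (gene5, chrom5, gene3, chrom3)."""
    _, chrom5, _, chrom3 = fusion
    return chrom5 == chrom3

def intra_inter_counts(fusions):
    intra = sum(1 for f in fusions if is_intrachromosomal(f))
    return intra, len(fusions) - intra

# Three intrachromosomal fusions shared by every toy accession.
shared_intra = {
    ("a", "chr1", "b", "chr1"),
    ("c", "chr2", "d", "chr2"),
    ("e", "chr3", "f", "chr3"),
}

# Each toy accession also carries two interchromosomal fusions of its own.
accessions = {}
for k in range(3):
    specific_inter = {(f"x{k}_{n}", "chr1", f"y{k}_{n}", "chr4") for n in range(2)}
    accessions[f"acc{k}"] = shared_intra | specific_inter

per_accession = {name: intra_inter_counts(f) for name, f in accessions.items()}

# Merging with redundancy removed is a set union across accessions.
merged = set().union(*accessions.values())
merged_counts = intra_inter_counts(merged)
```

Each toy accession has three intrachromosomal and two interchromosomal fusions, yet the merged set contains three intrachromosomal and six interchromosomal fusions, reproducing the pattern in miniature.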

Figure 6.


The diversity of chimeric RNAs is concordant with the genetic changes of rice accessions. (a) The profile of chimeric RNAs among accessions. The phylogenetic tree of 44 rice accessions was constructed based on whole‐genome SNPs. Subpopulations were differentiated by colours: red for indica, blue for japonica, pink for intermediate, purple for aus, orange for basmati, yellow for O. glaberrima. Numbers of fusion events are indicated within parentheses following the accession names. (b) The prevalence of recurrent chimeric RNAs in rice accessions. The x‐axis represents the number of accessions. (c) Similarity matrix of accessions based on the numbers of shared fusion events. (d) Unsupervised hierarchical clustering of 16 rice accessions based on the presence or absence status of fusions in each accession. A collection of all fusion events identified in 16 accessions was used for the binary analysis [1 indicating presence (yellow), 0 indicating absence (blue)].

Subsequently, we analysed the specificity of chimeric RNAs among accessions. In rice, 67.2% of fusions (719 of 1070) are unique to a single accession, with 391 of them being NPFs. Only 1.3% of fusions (14 of 1070) are shared by all 44 accessions (Figure 6b). The similarity matrix, constructed from the number of fusions shared between accessions, showed diversity across the majority of accessions, alongside relatively high similarity within certain subpopulations, particularly temperate japonica (Figure 6c). Rich diversity of chimeric RNAs was also observed among maize and soybean accessions (Figures S10c and S11c).
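The recurrence tally and the shared‐fusion similarity matrix described above can be computed from per‐accession sets of fusions. The sketch below is illustrative only; the accession labels, fusion identifiers, and function names are hypothetical, and the final line merely re‐derives the 67.2% figure from the counts given in the text.

```python
from collections import Counter

def recurrence(presence):
    """Number of accessions in which each fusion is present."""
    counts = Counter()
    for fusions in presence.values():
        counts.update(fusions)
    return counts

def shared_matrix(presence):
    """Pairwise counts of shared fusions; the diagonal holds per-accession totals."""
    names = sorted(presence)
    matrix = [[len(presence[a] & presence[b]) for b in names] for a in names]
    return names, matrix

# Toy presence sets, invented for illustration.
presence = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}

rec = recurrence(presence)
n_unique = sum(1 for n in rec.values() if n == 1)                   # in one accession only
n_shared_by_all = sum(1 for n in rec.values() if n == len(presence))
names, matrix = shared_matrix(presence)

rice_unique_pct = 100.0 * 719 / 1070   # counts from the text
```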

To assess whether the diversity of fusions reflects stochastic variation or relationships among accessions, we carried out unsupervised clustering analysis based on the presence or absence status of fusions in each accession. As transcriptome data of the 44 rice accessions originated from two panels, we conducted two separate clustering analyses to avoid potential technical interference. Rice accessions belonging to the same subpopulation were precisely clustered together (Figure 6d; Figure S12). The clustering patterns of maize and soybean based on fusions were also generally consistent with those based on genomic SNPs (Figures S10d and S11d). This observation suggests that the diversity of chimeric RNAs among accessions is not attributable to randomness, but rather is closely related to genetic variations, implying a role in the domestication and environmental adaptation of plants.
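Clustering accessions on presence or absence of fusions can be sketched with a small agglomerative procedure. The text does not specify the distance measure or linkage used, so the Jaccard distance and average linkage below are assumptions chosen for illustration, and the accession labels and fusion identifiers are invented. A real analysis would more likely rely on an established clustering library.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of fusion identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(presence, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in sorted(presence)]

    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(presence[x], presence[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

# Toy presence sets: two accessions per hypothetical subpopulation.
toy_presence = {
    "ind1": {"f1", "f2", "f3"},
    "ind2": {"f1", "f2", "f4"},
    "jap1": {"f5", "f6", "f7"},
    "jap2": {"f5", "f6", "f8"},
}

groups = average_linkage(toy_presence, n_clusters=2)
```

In this toy case the accessions that share fusions end up in the same cluster, which is the kind of concordance with subpopulation structure that the text reports.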

Gene fusions at DNA level

While the majority of chimeric RNAs arise from ‘non‐canonical’ transcription events, we also identified a total of 18 chimeric transcripts generated from gene fusions that occur due to structural variations (SVs) among accessions (Table S13). Gene fusions in Oryza were reported to give rise to novel genes and thereby contribute to phenotype evolution (Zhou et al., 2022). Our previous work in maize demonstrated that a gene fusion caused by a large deletion in the genome alters organ morphology by affecting the function of its parental gene (Wang et al., 2023). Here, the majority of fusions we identified are caused by large deletions (Table S13). For example, in almost all japonica rice (except G46) as well as aus, a 22.8 kb deletion leads to an S/S‐type fusion between the first five exons of Os03g0679300 (encoding a ubiquitin‐conjugating enzyme) and the last two exons of the neighbouring gene Os03g0679100 (Figure 7a). Moreover, in several japonica and aus accessions, OsCCR26 encoding an NAD(P)‐binding protein fuses with Os01g0978500 due to a 3.6 kb deletion, which results in the formation of Os01g0978400‐Os01g0978500 (Figure 7b). In these cases, the presence of the SVs in the genome and the sequences of the fusion transcript junctions were all confirmed through PCR and Sanger sequencing.

Figure 7.


Schematic diagram and validation of gene fusions at DNA level. (a–b) Two gene fusions caused by large deletions in genome. (c) An example of gene fusion Os11g0106200–Os12g0105800 caused by DNA rearrangement. (d) A convergent chimeric event. Os01g0216500 fused to Os01g0216900 due to a 12 241 bp deletion on Chr1 in Basmati rice, whereas there is no DNA‐level fusion between these two genes in the accession N22, which also produced a chimeric RNA identical to Basmati. Mid panel: qPCR of the expression level of parental genes and chimeric transcript in N22 and Basmati. In (a–d): Red lines in the schemas indicate the PCR bands in the electrophoretic gel panel. The regions corresponding to chimeric transcripts are in deepened shades of colour. Breakpoints in the parental genes are indicated by blue stars.

In addition to deletion‐induced gene fusions, a few gene fusions resulting from rearrangements were also identified. For instance, a gene fusion Os12g0105800–Os11g0106200 was caused by a rearrangement event present in 34 of the 44 rice accessions. A ~1.8 kb fragment on chromosome 12, which includes a portion of Os12g0105800, was copied and inserted into the coding region of gene Os11g0106200 on chromosome 11 (Figure 7c). Os12g0105800 encodes a tyrosine‐protein kinase domain containing protein, while Os11g0106200 encodes an outer membrane protein. The biological function of this chimera requires further investigation.

Convergent chimeric events caused by different mechanisms

There were reports of similar chimeric RNAs generated by trans‐splicing and gene rearrangements in normal and tumour cells, respectively (Li et al., 2008; Rickman et al., 2009). This raises the question of whether chimeric RNAs generated by trans‐splicing could predispose to DNA recombination. Intriguingly, we detected a similar case of chimeric RNA among rice accessions. In the genome of the accession Basmati, a 12 241 bp deletion on chromosome 1 resulted in the fusion of Os01g0216500 and Os01g0216900. The genome of N22 lacks this 12 241 bp deletion and carries only a 4332 bp deletion distant from the breakpoints, and no gene fusion was observed between Os01g0216500 and Os01g0216900 at the DNA level. However, we discovered the exact same chimeric transcript in the N22 transcriptome as that in Basmati. Through qPCR, we evaluated the expression level of this chimeric transcript and its parental genes. The results showed that the Os01g0216500–Os01g0216900 chimera in N22 exhibited an even higher expression level than in Basmati (Figure 7d). The detection of convergent chimeric events in rice accessions, generated with and without genomic variations, reaffirms the complex relationship between chimeric RNAs and genetic changes.

Discussion

With advances in sequencing technology, a number of chimeric RNAs have been identified in diverse organisms, showing that they are not unique to tumours but are also present in normal human tissues and a wide range of species (Babiceanu et al., 2016; Xie et al., 2016). However, apart from comprehensive analyses of chimeric RNAs in humans, most studies have focused on the identification and description of fusions within one or two accessions of a particular species (Singh et al., 2019; Wang et al., 2016; Zhang et al., 2010). In this study, leveraging the wealth of recently generated plant pan‐genome and transcriptome data, we systematically profiled chimeric RNAs in numerous accessions of four plant species and analysed their conservation across organisms and diversity across accessions, thereby exploring the significance of these chimeric RNAs in two dimensions: short‐term domestication and long‐term evolution.

Through cross‐organism analysis, we have found conserved characteristics in the breakpoints and parental genes of chimeric RNAs. In the across‐accession analysis, we demonstrated a global association between the chimeric RNAs and genomic variations in plants.

Apart from mRNAs, chimeric events have also been reported between long noncoding RNAs (lncRNAs) in cancer. These chimeric events are positively correlated with tumour stemness and DNA damage (Guo et al., 2020). Based on the transcriptome data we used, we also identified a considerable number of chimeric events occurring between lncRNAs or between lncRNAs and mRNAs in rice, maize, and Arabidopsis (Figure S13); soybean was not included because its lncRNAs are unannotated (Liu et al., 2020). This indicates that lncRNA fusion is a common phenomenon that also exists in plants. The functions of these chimeric transcripts require further research and analysis.

We discovered an intriguing phenomenon in which identical chimeric transcripts are produced in two different rice accessions through different mechanisms. In Basmati, a 12 kb deletion in the genome leads to the fusion of two adjacent genes, generating the chimeric RNA Os01g0216500–Os01g0216900. In another accession, N22, the two genes do not undergo DNA‐level fusion, yet exactly the same chimeric RNA was detected. This bears a striking resemblance to previous findings of convergent chimeric RNAs in human benign and tumour tissues, which raises the possibility that chimeric RNAs generated through aberrant splicing may predispose DNA at those specific sites to recombination. While there is growing evidence supporting the hypothesis of RNA‐mediated genome rearrangement, the precise mechanisms underlying this phenomenon remain unclear (Fang and Landweber, 2013; Nowacki et al., 2008; Yan et al., 2019). The abundance of genetic material and the ease of genetic manipulation in plants provide new possibilities for further investigating the underlying mechanism.

Moreover, different proteins with related functions in one organism can be fused into a single protein in another organism (Gao et al., 2022; Marcotte et al., 1999). For example, the DUC1 protein is a dual orange/far‐red and blue light photoreceptor found in the marine phytoplankton Pycnococcus provasolii. It is a chimeric protein composed of two segments that are similar to the blue light receptor cryptochrome 2 (CRY2) and the red‐light receptor phytochrome B (PHYB) in Arabidopsis (Makita et al., 2021). This indicates that chimeric proteins can achieve the functions of both parental proteins. We also analysed a chimeric transcript, OsNDC1‐OsGID1L2, through RACE and mass spectrometry, revealing that it encompasses nearly the entire coding regions of both parental genes, forming a chimeric protein resembling a complex of the two parental proteins. In rice protoplasts, the chimeric protein OsNDC1‐OsGID1L2 is localized in chloroplasts (where OsNDC1 is located), as well as in the cytoplasm and plasma membrane (where OsGID1L2 is located). This also implies that some chimeric proteins probably inherit the characteristics of both parents. Further research, including overexpression and knockout of the parental genes as controls, along with knockout lines targeting the fusion site of OsNDC1‐OsGID1L2, is needed to explore the functionality of the chimeric protein.

Conclusion

Our work systematically characterized the chimeric RNA profiles across plant accessions, suggesting the functional and evolutionary significance of chimeric RNAs in plants.

Experimental procedures

Data sources

Sources for genome assemblies for accessions of the four plants include rice (http://ricerc.sicau.edu.cn/RiceRC/download/downloadBefore and DOI:10.6084/m9.figshare.18972851), maize (https://maizegdb.org/NAM_project), soybean (NGDC, PRJCA002030), and Arabidopsis (https://1001genomes.org/).

For transcriptome, the raw transcriptome reads of our own 16 rice accessions have been deposited into EBI (https://www.ebi.ac.uk/ena/browser/home) under the bioproject accession number PRJEB45847. RNAseq data for the other 28 rice accessions (PRJCA002103) and all soybean accessions (PRJCA002030) were downloaded from the National Genomics Data Center (NGDC). RNAseq data for maize were obtained from EBI (PRJEB36014), and RNAseq data of Arabidopsis from NCBI (PRJDB4993, PRJEB8427, PRJNA231089, PRJNA264397, PRJNA292478, PRJNA319904, PRJNA339285, PRJNA361532, PRJNA371597, PRJNA387601, PRJNA392252, PRJNA415634).

Annotation files and reference genome information were downloaded from Ensembl (http://plants.ensembl.org/, IRGSP‐1.0 for rice, AGPv4 for maize, Glycine_max_v2.0 for soybean) and TAIR (https://www.arabidopsis.org/, TAIR10).

Single‐cell transcriptome data of Nipponbare leaf and root were downloaded from the National Genomics Data Center (NGDC, CRX235964, CRX235965, CRX235966, CRX235967).

Detailed information is provided in Tables S1–S4.

Chimeric RNA detection

To identify chimeric RNAs in plants, we used Arriba (Uhrig et al., 2021) with parameters ‘‐R 1000 ‐f blacklist’ to detect fusion transcripts from RNA sequencing data, which were mapped to the reference genome (IRGSP‐1.0 for rice, AGPv4 for maize, Glycine_max_v2.0 for soybean, TAIR10 for Arabidopsis) using STAR (v2.7.9a; Dobin et al., 2013) for Arabidopsis and Hisat2 (v2.1.0; Kim et al., 2019) with default parameters. The output files were filtered using the following criteria: (a) the mapping quality of reads should be 60 and 255 for Hisat2 and STAR, respectively; (b) the length of the alignment match should not be shorter than the length of the reads by more than 2 bp; (c) the breakpoints must lie in a gene region (encompassing all the exons and introns); (d) the predicted transcript sequences must be supported by at least five unique supporting reads; (e) to prevent false positives caused by homologous sequences, the junction reads must not belong to any transcript of the parental genes, and the split reads must not belong to transcripts of both parental genes. The resulting fusion events were used for further analysis, and the parental gene pairs were shown in a Circos plot (Krzywinski et al., 2009).
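The filtering criteria (a) to (d) above can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Candidate` record and its field names are hypothetical and are not Arriba output columns, and criterion (e) is reduced to one precomputed flag rather than the separate junction‐read and split‐read checks described in the text.

```python
from dataclasses import dataclass

# Hypothetical record for one candidate fusion. Field names are illustrative
# only and do not correspond to Arriba's output format.
@dataclass
class Candidate:
    mapq: int                      # mapping quality of the supporting reads
    read_len: int                  # read length in bp
    match_len: int                 # length of the alignment match in bp
    bp5_in_gene: bool              # 5' breakpoint lies within exons/introns of a gene
    bp3_in_gene: bool              # 3' breakpoint lies within exons/introns of a gene
    unique_support: int            # number of unique supporting reads
    explained_by_parental: bool    # simplified stand-in for criterion (e)

# Criterion (a): the mapping quality required for each aligner.
REQUIRED_MAPQ = {"hisat2": 60, "star": 255}

def passes_filters(c: Candidate, aligner: str) -> bool:
    """Return True if the candidate satisfies criteria (a) to (e)."""
    return (
        c.mapq == REQUIRED_MAPQ[aligner]         # (a)
        and c.read_len - c.match_len <= 2        # (b)
        and c.bp5_in_gene and c.bp3_in_gene      # (c)
        and c.unique_support >= 5                # (d)
        and not c.explained_by_parental          # (e), simplified
    )
```

For example, a candidate with 150 bp reads, a 149 bp alignment match, and six unique supporting reads at mapping quality 60 would pass for Hisat2 alignments but not for STAR alignments, where a mapping quality of 255 is required.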

To detect chimeric RNAs in single‐cell transcriptome, scRNA‐seq raw reads (Wang et al., 2021a,b) were aligned to the reference genome using STAR, and chimeric events were identified according to the description above. Cell classification and annotation were performed as described by Wang et al. The raw expression matrix was obtained using Cell Ranger, and then the unique molecular identifier (UMI) count matrix was transformed into Seurat objects using the R package Seurat. The top 2000 genes were identified as highly variable genes and were used for the principal component analysis. The top 50 principal components were used for Harmony, followed by clustering and dimensionality reduction. The results were shown on the UMAP by DimPlot and FeaturePlot.

Plant material and fusions validation

All the plant materials for DNA and RNA extraction were prepared according to their original literature reports. Specifically, we collected germinated seeds (after 48 h soaking), filling seeds (15 days after flowering), flag leaves (2–3 days before heading date), young leaves (38‐day‐old plants), and young roots (38‐day‐old plants) of 16 rice accessions from our lab (Tables S1–S4). Anthers of three maize accessions (CML103, CML277, and CML322) were also collected. These plants were grown under natural conditions in fields located in Shanghai. The other 28 rice accessions and p35S⸬OsNDC1‐OsGID1L2 transgenic rice were grown in a greenhouse at 28 °C with a 16‐h light/8‐h dark cycle. Afterwards, the shoots and roots were harvested. The rosette leaves of the Arabidopsis Landsberg erecta ecotype were collected after growth at 22 °C under a 16‐h light/8‐h dark cycle for 14 days.

For transgene constructs, the full CDS of OsNDC1‐OsGID1L2 (fusion‐01) was amplified from cDNA of Nipponbare flag leaf at heading stage and introduced into the pCAMBIA1300 vector to generate p35S⸬OsNDC1‐OsGID1L2. The transgenic rice plants were generated via Agrobacterium‐mediated transformation as previously described (Hiei et al., 1994). The relevant primer sequences are presented in Table S14.

Total RNA was prepared with QIAzol Lysis Reagent, and then cDNA was synthesized using PrimeScript™ RT reagent Kit with gDNA Eraser (Takara). The fragments spanning the junctions of candidate fusions were amplified using primers listed in Table S14. Subsequently, they were sequenced by Sanger sequencing.

Expression‐level analysis

The gene expression levels based on Transcripts Per Million (TPM) were analysed using StringTie (v1.3.5; Pertea et al., 2016) with the default parameters. The percentile rank of the 5′ or 3′ parental gene for each fusion event was calculated in the corresponding sample. To assess the impact of chimeric RNAs on the expression of parental genes, we compared the average expression of parental genes in samples with the corresponding fusion against the average expression level in the other samples. The chi‐square test was employed to assess the significance of the number of genes showing changes, with randomly selected non‐fusion genes in the genome as the control. This process was repeated 30 times to ensure the credibility of the results.
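The two calculations named above can be illustrated with a minimal sketch. The text does not specify the exact percentile‐rank definition or the layout of the contingency table, so both are assumptions here: the rank is taken as the percentage of genes in the sample with TPM at or below the parental gene, and the test is a Pearson chi‐square statistic on a 2 × 2 table (changed versus unchanged, fusion genes versus control genes) without continuity correction.

```python
def percentile_rank(value, population):
    """Percentage of values in `population` that are <= `value`."""
    if not population:
        raise ValueError("population must not be empty")
    return 100.0 * sum(v <= value for v in population) / len(population)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    Assumed layout: rows are fusion parental genes and control genes,
    columns are 'changed' and 'unchanged'. No continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denom
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05. Repeating the comparison with a fresh random control set, as described above, gives a distribution of test values rather than a single one.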

To compare the expression level between chimeric RNAs and their parental genes, we recalculated the RPKM values of the parental genes, considering only the reads within the transcription units that correspond to the fusion events. The RPKM values of chimeric transcripts were calculated by summing the counts of junction and split reads.
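As a hedged illustration, the standard RPKM definition (reads per kilobase per million mapped reads) can be written as follows. The function and parameter names are hypothetical, and the region length used for a chimeric transcript is not stated in the text, so it is left as an input.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)

def chimera_read_count(junction_reads, split_reads):
    """Read count for a chimeric transcript: junction reads plus split reads."""
    return junction_reads + split_reads
```

For instance, 100 reads over a 1000 bp region in a library of one million mapped reads give an RPKM of 100.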

Homologues identification

To identify orthologous genes across organisms, we used BLASTP (v2.7.1) with an e‐value threshold of 1e−20. Furthermore, genes with the same Pfam domain detected by InterProScan (v5; Jones et al., 2014) were regarded as homologous genes.
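A minimal sketch of the two criteria follows. It assumes BLASTP results in the standard 12‐column tabular format (`-outfmt 6`, where the e‐value is the 11th column), which the text does not state, and a hypothetical mapping from gene ID to its set of Pfam accessions. How the two criteria were combined in the study is not specified, so they are kept as separate helpers.

```python
def parse_blast_hits(lines, max_evalue=1e-20):
    """Keep (query, subject, evalue) tuples with evalue <= max_evalue.

    Assumes tab-separated BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits

def share_pfam(gene_a, gene_b, pfam_by_gene):
    """True if the two genes have at least one Pfam accession in common."""
    return bool(pfam_by_gene.get(gene_a, set()) & pfam_by_gene.get(gene_b, set()))
```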

RACE and qPCR experiment

Rapid amplification of cDNA ends (RACE) assay was employed to obtain the full‐length sequences of chimeric transcripts. cDNAs of samples with the candidate fusion events were used as the template for 3′‐RACE and 5′‐RACE with SMARTer® RACE 5′/3′ Kit (Takara) according to the manufacturer's instructions. Subsequently, all of the amplified fragments were recovered by gel electrophoresis and subjected to Sanger sequencing.

qPCR was performed using TB Green® Premix Ex Taq™ II (TaKaRa) by LightCycler® 480 (Roche Diagnostics). The OsACTIN1 gene was amplified as an internal control for normalization.

Primers for amplification are listed in Table S14.
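The text does not state how expression was computed from the qPCR readings. One common approach for normalization against an internal control, shown here purely as an assumption, is the 2^−ΔCt calculation, which presumes roughly 100% amplification efficiency for both the target and the reference amplicon.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene as 2^-(delta Ct).

    Assumes near-100% amplification efficiency for both amplicons.
    """
    return 2.0 ** -(ct_target - ct_reference)
```

For example, a target that crosses the threshold two cycles later than the reference has a relative expression of 0.25.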

LC–MS/MS assay

The plant tissues (100 mg/sample) were cryogenically ground, sonicated, and precipitated by methanol–chloroform, and then dissolved in 8 M urea. After digestion with Trypsin (Promega), peptides were desalted and pre‐fractionated by a high‐pH reverse‐phase spin column into eight fractions according to the manufacturer's user guide (Pierce, 84868). The resulting fractions were analysed by a reversed‐phase C18 column connected to an Easy‐nLC 1200 HPLC system. The eluted peptides were ionized and introduced into an Orbitrap Eclipse Tribrid mass spectrometer (Thermo). Full MS spectra were acquired using the precursor ion scan with the Orbitrap analyser (resolution r = 60 000 at m/z 200) in top‐speed DDA mode. MS/MS events in Orbitrap analysis had a resolution of r = 15 000 at m/z 200. The ions were sequentially isolated and fragmented under HCD mode (normalized collisional energy of 30%). Database searching was performed by Proteome Discoverer (version 2.1) in the UniProtKB Oryza sativa database (122 128 entries) combined with a predicted fusion protein database (563 entries). The mass tolerances for precursor ions and MS/MS were set to 20 ppm and 0.02 Da, respectively.
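The two mass tolerances are expressed in different units: the precursor tolerance is relative (parts per million), while the fragment tolerance is absolute (Da). The following sketch shows how each check works; the function names are illustrative and do not reflect the internals of Proteome Discoverer.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def precursor_within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """Relative tolerance check, as used for precursor ions."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

def fragment_within_tolerance(observed_mz, theoretical_mz, tol_da=0.02):
    """Absolute tolerance check, as used for MS/MS fragment ions."""
    return abs(observed_mz - theoretical_mz) <= tol_da
```

At m/z 500, a 20 ppm window corresponds to ±0.01, so the relative tolerance on precursors is tighter in absolute terms than the 0.02 Da fragment tolerance at that mass.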

Subcellular localization observation

The CDSs of OsNDC1, OsGID1L2, and the chimeric transcript fusion‐01 were cloned into the pA7 vector. 35S⸬OsNDC1‐eGFP, 35S⸬OsGID1L2‐eGFP, 35S⸬OsNDC1‐OsGID1L2‐eGFP, and 35S⸬eGFP‐OsNDC1‐OsGID1L2 were transformed into rice protoplasts derived from 12‐day‐old seedlings (Nipponbare). The fluorescent signal was detected 14–20 h after transfection using a laser confocal microscope.

Identification of gene fusions at DNA level

To detect SVs in the genomes, we used minigraph (https://github.com/lh3/minigraph; Li et al., 2020) and SYRI (https://github.com/schneebergerlab/syri; Goel et al., 2019) with the default parameters. We focused only on SVs that may have a direct impact on fusions, which took two possible forms: (a) the boundaries of the SV correspond precisely to the breakpoints of the chimeric transcript; (b) the breakpoints of the chimeric transcript are located at splice sites, while the boundaries of the SV fall within the intronic region between these splice sites.
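The two forms can be expressed as a small classifier. This is a sketch under simplifying assumptions that are not taken from the text: the SV and both breakpoints are on the same chromosome, coordinates share one coordinate system, and the region between the two splice sites is treated as the intronic interval. The function and parameter names are hypothetical.

```python
def classify_sv(sv_start, sv_end, bp5, bp3, breakpoints_at_splice_sites):
    """Classify an SV relative to a chimeric transcript's two breakpoints.

    Returns "a" if the SV boundaries coincide with the breakpoints,
    "b" if the breakpoints are at splice sites and the SV lies strictly
    between them, and None otherwise.
    """
    lo, hi = sorted((bp5, bp3))
    if (sv_start, sv_end) == (lo, hi):
        return "a"
    if breakpoints_at_splice_sites and lo < sv_start <= sv_end < hi:
        return "b"
    return None
```

An SV that only partially overlaps the interval between the breakpoints, or one that lies between breakpoints that are not at splice sites, is not assigned to either form.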

Accession numbers

The raw transcriptome reads of 16 rice accessions have been deposited into EBI (https://www.ebi.ac.uk/ena/browser/home) under the bioproject accession number PRJEB45847.

Conflict of interest

The authors declare no competing interests.

Author contributions

X.Z. and X.H. conceived the project. X.Z. designed the studies. J.Q. and Q.Z. contributed to the transcriptome sequencing of the 16 rice accessions. J.C. conducted the bioinformatics analyses. S.Z., Q.Z., J.H., and X.Y. performed molecular experiments. X.Z., J.C., and X.W. wrote the manuscript.

Supporting information

Figure S1 The abundance of chimeric RNA in single‐cell level.

Figure S2 The relationship between the frequency of chimeric events and gene distance and gene density.

Figure S3 Validation of chimeric RNAs in rice and Arabidopsis.

Figure S4 Correlations between 5′ and 3′ parental gene expression levels.

Figure S5 Correspondence of expression levels between 5′ and 3′ parental gene expression levels.

Figure S6 Test values of 30 times chi‐squared test.

Figure S7 Identification of the NPFs.

Figure S8 The predicted protein structure and subcellular location of the fusion protein OsNDC1‐OsGID1L2.

Figure S9 Correlations between the sequencing depth and the number of identified chimeric RNAs.

Figure S10 The diversity of chimeric RNAs is associated with genetic changes of maize accessions.

Figure S11 The diversity of chimeric RNAs is associated with genetic changes of soybean accessions.

Figure S12 Unsupervised hierarchical clustering of the other 28 rice accessions.

Figure S13 The number of ncRNA‐involved chimeric RNAs.

PBI-22-3151-s002.docx (8.4MB, docx)

Tables S1–S4 Sample information.

Tables S5–S8 List of fusions identified in four plants.

Tables S9–S12 Homologous relationship of parental genes across four plants.

Table S13 Chimeric events caused by gene fusions.

Table S14 Primers used in this study.

PBI-22-3151-s001.xlsx (438KB, xlsx)

Acknowledgements

We thank Prof. Wenqin Wang (Shanghai Normal University) and Prof. Yongrui Wu (Chinese Academy of Sciences) for providing the plant materials of maize. We thank Jie Liu (Shanghai Normal University) and Qin Wang (Shanghai Normal University) for assistance in preparation of rice samples for transcriptome sequencing. This work was supported by National Key Research and Development Program of China (2023ZD04073), National Natural Science Foundation of China (32370671), and Natural Science Foundation of Shanghai (22ZR1445800). [Correction added on 6 August 2024, after first online publication: the funding information is updated in this version.]

Contributor Information

Jie Qiu, Email: qiujie@shnu.edu.cn.

Xiaoyi Zhou, Email: zhouxy@shnu.edu.cn.

Data availability statement

The data that support the findings of this study are openly available in EBI at https://www.ebi.ac.uk/ena/browser/home, reference number PRJEB45847.

References

  1. Babiceanu, M., Qin, F.J., Xie, Z.Q., Jia, Y.M., Lopez, K., Janus, N., Facemire, L. et al. (2016) Recurrent chimeric fusion RNAs in non‐cancer tissues and cells. Nucleic Acids Res. 44, 2859–2872.
  2. Christie, E.L., Pattnaik, S., Beach, J., Copeland, A., Rashoo, N., Fereday, S., Hendley, J. et al. (2019) Multiple ABCB1 transcriptional fusions in drug resistant high‐grade serous ovarian and breast cancer. Nat. Commun. 10, 1295.
  3. Chwalenia, K., Facemire, L. and Li, H. (2017) Chimeric RNAs in cancer and normal physiology. Wiley Interdiscip. Rev. RNA, 8:e1427.
  4. Clauw, P., Coppens, F., De Beuf, K., Dhondt, S., Van Daele, T., Maleux, K., Storme, V. et al. (2015) Leaf responses to mild drought stress in natural variants of Arabidopsis. Plant Physiol. 167, 800–816.
  5. Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P. et al. (2013) STAR: ultrafast universal RNA‐seq aligner. Bioinformatics, 29, 15–21.
  6. Elfman, J., Pham, L.P. and Li, H. (2020) The relationship between chimeric RNAs and gene fusions: potential implications of reciprocity in cancer. J. Genet. Genomics, 47, 341–348.
  7. Fang, W. and Landweber, L.F. (2013) RNA‐mediated genome rearrangement: hypotheses and evidence. BioEssays, 35, 84–87.
  8. Finta, C. and Zaphiropoulos, P.G. (2002) Intergenic mRNA molecules resulting from trans‐splicing. J. Biol. Chem. 277, 5882–5890.
  9. Gan, X.C., Stegle, O., Behr, J., Steffen, J.G., Drewe, P., Hildebrand, K.L., Lyngsoe, R. et al. (2011) Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature, 477, 419–423.
  10. Gao, M., Nakajima An, D., Parks, J.M. and Skolnick, J. (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat. Commun. 13, 1744.
  11. Gingeras, T.R. (2009) Implications of chimaeric non‐co‐linear transcripts. Nature, 461, 206–211.
  12. Goel, M., Sun, H.Q., Jiao, W.B. and Schneeberger, K. (2019) SyRI: finding genomic rearrangements and local sequence differences from whole‐genome assemblies. Genome Biol. 20, 277.
  13. Guo, M., Xiao, Z.D., Dai, Z., Zhu, L., Lei, H., Diao, L.T. and Xiong, Y. (2020) The landscape of long noncoding RNA‐involved and tumor‐specific fusions across various cancers. Nucleic Acids Res. 48, 12618–12631.
  14. Gupta, S.K., Luo, L. and Yen, L. (2018) RNA‐mediated gene fusion in mammalian cells. Proc. Natl. Acad. Sci. USA, 115, E12295–E12304.
  15. Hanahan, D. and Weinberg, R.A. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646–674.
  16. Hiei, Y., Ohta, S., Komari, T. and Kumashiro, T. (1994) Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T‐DNA. Plant J. 6, 271–282.
  17. Hufford, M.B., Seetharam, A.S., Woodhouse, M.R., Chougule, K.M., Ou, S.J., Liu, J.N., Ricci, W.A. et al. (2021) De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science, 373, 655–662.
  18. Jia, Y.M., Xie, Z.Q. and Li, H. (2016) Intergenically spliced chimeric RNAs in cancer. Trends Cancer, 2, 475–484.
  19. Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H. et al. (2014) InterProScan 5: genome‐scale protein function classification. Bioinformatics, 30, 1236–1240.
  20. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
  21. Kawakatsu, T., Huang, S.S.C., Jupe, F., Sasaki, E., Schmitz, R.J., Urich, M.A., Castanon, R. et al. (2016) Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell, 166, 492–505.
  22. Kim, D., Paggi, J.M., Park, C., Bennett, C. and Salzberg, S.L. (2019) Graph‐based genome alignment and genotyping with HISAT2 and HISAT‐genotype. Nat. Biotechnol. 37, 907–915.
  23. Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J. et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645.
  24. Lei, Q., Li, C., Zuo, Z.X., Huang, C.H., Cheng, H.H. and Zhou, R.J. (2016) Evolutionary Insights into RNA trans‐splicing in vertebrates. Genome Biol. Evol. 8, 562–577.
  25. Li, H., Wang, J.L., Mor, G. and Sklar, J. (2008) A neoplastic gene fusion mimics trans‐splicing of RNAs in normal human cells. Science, 321, 1357–1361.
  26. Li, X., Zhao, L., Jiang, H.F. and Wang, W. (2009) Short homologous sequences are strongly associated with the generation of chimeric RNAs in eukaryotes. J. Mol. Evol. 68, 56–65.
  27. Li, H., Feng, X. and Chu, C. (2020) The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265.
  28. Liu, Y.C., Du, H.L., Li, P.C., Shen, Y.T., Peng, H., Liu, S.L., Zhou, G.A. et al. (2020) Pan‐genome of wild and cultivated soybeans. Cell, 182, 162–176.
  29. Makita, Y., Suzuki, S., Fushimi, K., Shimada, S., Suehisa, A., Hirata, M., Kuriyama, T. et al. (2021) Identification of a dual orange/far‐red and blue light photoreceptor from an oceanic green picoplankton. Nat. Commun. 12, 3593.
  30. Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O. and Eisenberg, D. (1999) Detecting protein function and protein‐protein interactions from genome sequences. Science, 285, 751–753.
  31. Mertens, F., Johansson, B., Fioretos, T. and Mitelman, F. (2015) The emerging complexity of gene fusions in cancer. Nat. Rev. Cancer, 15, 371–381.
  32. Mukherjee, S. and Frenkel‐Morgenstern, M. (2022) Evolutionary impact of chimeric RNAs on generating phenotypic plasticity in human cells. Trends Genet. 38, 4–7.
  33. Nowacki, M., Vijayan, V., Zhou, Y., Schotanus, K., Doak, T.G. and Landweber, L.F. (2008) RNA‐mediated epigenetic programming of a genome‐rearrangement pathway. Nature, 451, 153–158.
  34. Ommer, J., Selfe, J.L., Wachtel, M., O'Brien, E.M., Laubscher, D., Roemmele, M., Kasper, S. et al. (2020) Aurora A kinase inhibition destabilizes PAX3‐FOXO1 and MYCN and synergizes with navitoclax to induce rhabdomyosarcoma cell death. Cancer Res. 80, 832–842.
  35. Pertea, M., Kim, D., Pertea, G.M., Leek, J.T. and Salzberg, S.L. (2016) Transcript‐level expression analysis of RNA‐seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667.
  36. Qin, P., Lu, H.W., Du, H.L., Wang, H., Chen, W.L., Chen, Z., He, Q. et al. (2021) Pan‐genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell, 184, 3542–3558.
  37. Rickman, D.S., Pflueger, D., Moss, B., VanDoren, V.E., Chen, C.X., de la Taille, A., Kuefer, R. et al. (2009) SLC45A3‐ELK4 is a novel and frequent erythroblast transformation‐specific fusion transcript in prostate cancer. Cancer Res. 69, 2734–2738.
  38. Shaw, A.T., Yeap, B.Y., Solomon, B.J., Riely, G.J., Gainor, J., Engelman, J.A., Shapiro, G.I. et al. (2011) Effect of crizotinib on overall survival in patients with advanced non‐small‐cell lung cancer harbouring ALK gene rearrangement: a retrospective analysis. Lancet Oncol. 12, 1004–1012.
  39. Singh, A., Zahra, S., Das, D. and Kumar, S. (2019) AtFusionDB: a database of fusion transcripts in Arabidopsis thaliana. Database, 2019, bay135.
  40. Singh, S., Qin, F.J., Kumar, S., Elfman, J., Lin, E., Pham, L.P., Yang, A. et al. (2020) The landscape of chimeric RNAs in non‐diseased tissues and cells. Nucleic Acids Res. 48, 1764–1778.
  41. Sun, Y.A. and Li, H. (2022) Chimeric RNAs discovered by RNA sequencing and their roles in cancer and rare genetic diseases. Genes, 13, 741.
  42. Sutton, R.E. and Boothroyd, J.C. (1986) Evidence for trans splicing in trypanosomes. Cell, 47, 527–535.
  43. Uhrig, S., Ellermann, J., Walther, T., Burkhardt, P., Frohlich, M., Hutter, B., Toprak, U.H. et al. (2021) Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460.
  44. Wang, B., Tseng, E., Regulski, M., Clark, T.A., Hon, T., Jiao, Y.P., Lu, Z.Y. et al. (2016) Unveiling the complexity of the maize transcriptome by single‐molecule long‐read sequencing. Nat. Commun. 7, 11708.
  45. Wang, Y.T., Zou, Q., Li, F.J., Zhao, W.W., Xu, H., Zhang, W.H., Deng, H.T. et al. (2021a) Identification of the cross‐strand chimeric RNAs generated by fusions of bi‐directional transcripts. Nat. Commun. 12, 4645.
  46. Wang, Y., Huan, Q., Li, K. and Qian, W. (2021b) Single‐cell transcriptome atlas of the leaf and root of rice seedlings. J. Genet. Genomics, 48, 881–898.
  47. Wang, Q., Fan, J.J., Cong, J., Chen, M.J., Qiu, J., Liu, J., Zhao, X.Y. et al. (2023) Natural variation of ZmLNG1 alters organ shapes in maize. New Phytol. 237, 471–482.
  48. Xie, Z.Q., Babiceanu, M., Kumar, S., Jia, Y.M., Qin, F.J., Barr, F.G. and Li, H. (2016) Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma. Proc. Natl. Acad. Sci. USA, 113, 13126–13131.
  49. Yan, Z., Huang, N., Wu, W., Chen, W., Jiang, Y., Chen, J., Huang, X. et al. (2019) Genome‐wide colocalization of RNA‐DNA interactions and fusion RNA pairs. Proc. Natl. Acad. Sci. USA, 116, 3328–3337.
  50. Yang, M., Wang, X.C., Ren, D.Q., Huang, H., Xu, M.Q., He, G.M. and Deng, X.W. (2017) Genomic architecture of biomass heterosis in Arabidopsis. Proc. Natl. Acad. Sci. USA, 114, 8101–8106.
  51. Zhang, G.J., Guo, G.W., Hu, X.D., Zhang, Y., Li, Q.Y., Li, R.Q., Zhuang, R.H. et al. (2010) Deep RNA sequencing at single base‐pair resolution reveals high complexity of the rice transcriptome. Genome Res. 20, 646–654.
  52. Zhou, Y.L., Zhang, C.J., Zhang, L., Ye, Q.N., Liu, N.Y.W., Wang, M.H., Long, G.Q. et al. (2022) Gene fusion as an important mechanism to generate new genes in the genus Oryza. Genome Biol. 23, 130.

Articles from Plant Biotechnology Journal are provided here courtesy of Society for Experimental Biology (SEB) and the Association of Applied Biologists (AAB) and John Wiley and Sons, Ltd
