Abstract
Wild rice, as the ancestor of cultivated rice, has accumulated a wide range of beneficial traits through prolonged natural selection and evolution. Oryza officinalis, which carries the CC genome, differs significantly from AA-genome species. In this study, we utilized second- and third-generation sequencing, along with Hi-C technology, to assemble the genome of MT10 (O. officinalis). The assembled genome is 552.58 Mb, with contig and scaffold N50 values of 40.04 and 44.48 Mb, respectively, and 96.73% of the sequences anchored to 12 chromosomes. A total of 33,813 genes were annotated, and repetitive sequences account for 54.24% of the MT10 genome. The number of unique genes in MT10 exceeds that in the O. officinalis genome from Thailand, and the divergence time of the two accessions is estimated at 1.6 million years ago. The MT10 genome has fewer expanded than contracted gene families, with the expanded families predominantly associated with disease and pest resistance. Comparative genomic analysis of MT10 and Nipponbare reveals sequence variations in biotic and abiotic resistance-related genes. In particular, the R gene and cystatin gene families in MT10 may contribute to its unique insect resistance. Transcriptome analyses indicate that flavonoid biosynthesis and MAPK-related genes are expressed in response to brown planthopper infestation. This study presents the first chromosome-level genome assembly of MT10, providing a reference sequence for the efficient cloning of beneficial genes from O. officinalis, which holds significant potential for the genetic improvement of cultivated rice.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12284-025-00769-5.
Keywords: Oryza officinalis, CC genome, Genome sequencing, Evolution, Resistance
Background
Rice production faces increasing challenges due to population growth, climate change, and various environmental stresses. Wild rice, the progenitor of cultivated varieties, has adapted to different geographical environments and exhibits tolerance to both biotic and abiotic stresses. The genus Oryza includes 27 species, spanning both cultivated and wild varieties characterized by diverse diploid and allopolyploid genomes (Wing et al. 2018). Although numerous studies have focused on the AA-genome of the cultivated species, the traits of non-AA genomic rice are still worthy of attention and hold substantial promise for improving cultivated rice. Specifically, O. officinalis demonstrates strong growth vigour and strong resistance to a range of abiotic and biotic stresses, highlighting its potential as a valuable genetic resource for improving cultivated rice (Henry 2022).
Genome assemblies are fundamental for identifying functional genes and genetic variations. Since the initial draft genomes of two rice subspecies were published in 2002 (Goff et al. 2002), followed by the complete sequencing of the O. sativa japonica rice variety Nipponbare (Nip) in 2005 (International Rice Genome Sequencing Project and Sasaki 2005), sequencing technologies have advanced rapidly. These advancements, particularly the advent of long-read sequencing technologies, have made de novo chromosome-level genome assembly increasingly feasible (Sharma et al. 2021). Consequently, multiple Oryza genomes have now been sequenced, covering at least 30 species and subspecies with diverse diploid and allopolyploid genomes (Sun et al. 2022), thereby providing powerful tools and genetic resources for leveraging wild rice gene banks. The recent assembly of the EE genome, along with improvements in the continuity and quality of existing assemblies, highlights this rapid progress (Phillips et al. 2022). Currently, all diploid genome types within the genus Oryza have been assembled. Along with the increase in the number of assemblies, their quality has also improved. For example, the newly assembled Nip and 93–11 reference genomes are 2.2 times more contiguous than their earlier versions (Zhang et al. 2018). The newly assembled IR64 genome has a scaffold N50 of 27,830 kb and a contig N50 of 1200 kb (Tanaka et al. 2020), significantly surpassing the previous version, which had scaffold and contig N50 values of only 293 kb and 22.2 kb, respectively (Schatz et al. 2014). Nevertheless, the complexity of polyploid genomes remains challenging. Although some allotetraploid rice genomes, such as KKLL (Mondal et al. 2018; Bansal et al. 2021) and CCDD (Yu et al. 2021), have been assembled, the BBCC, HHJJ, and HHKK genomes have yet to be assembled.
O. officinalis, a wild rice species with the CC genome, is known for its rich genetic diversity and its resistance to various biotic and abiotic stresses, making it a valuable genetic reservoir for rice breeding. Given its unique characteristics and its importance to rice improvement, it is crucial to characterize its genomic features. Recent studies utilizing metabolomics and transcriptomics have identified valuable genes in O. officinalis associated with early flowering, insect resistance, disease resistance, and stress tolerance (Ishimaru et al. 2010; Zhang et al. 2014; Kitazumi et al. 2018; Chen et al. 2022). Despite these advances, information on the CC genome remains limited. Shenton et al. generated the first O. officinalis genome assembly from a Thailand population (Shenton et al. 2020), revealing a genome of approximately 548 Mb, which is 1.6 times larger than that of Asian cultivated rice. While this assembly provides a valuable reference for CC genome studies, there is still a pressing need for higher-quality genome assemblies and for sequences from subtropical ecotypes of O. officinalis. This need arises because these ecotypes harbor important stress-resistance genes: Xa29 (Tan et al. 2004a), which confers bacterial blight resistance, and bph11 (Hirabayashi 1998), Bph12 (Qiu et al. 2012), Bph13 (Renganayaki et al. 2002), Bph14 (Du et al. 2009), and Bph15 (Yang et al. 2004), which confer brown planthopper resistance, were all identified in O. officinalis populations. Therefore, obtaining a higher-quality genome sequence from subtropical O. officinalis would substantially enhance ongoing CC genome research and facilitate the discovery and application of the valuable genetic resources in O. officinalis, ultimately improving rice cultivation and resilience.
In this study, we utilized second- and third-generation sequencing (Illumina and PacBio), Hi-C (high-throughput chromosome conformation capture), and transcriptome sequencing to assemble ~ 552.58 Mb of the O. officinalis genome. A whole-genome comparison with the Nip genome revealed structural variations (SVs) and differences in the homology of pest and disease resistance genes. A comprehensive analysis of the MT10 genome together with other Oryza genome assemblies revealed significant diversity in resistance (R) genes. Furthermore, by combining comparative transcriptome analyses, we identified differentially expressed genes (DEGs) associated with the response to brown planthopper (BPH) infestation. These DEGs belong to various gene families, including R genes, cystatins, and MAPK genes, which may contribute to BPH resistance in O. officinalis. This study provides a critical foundation for identifying novel disease resistance genes, understanding genetic variation, and advancing ongoing research in the field of rice genomics.
Results
Morphology of an O. officinalis Accession
An O. officinalis accession (MT10) was collected from Hengzhou, Nanning, China by the Rice Research Institute, Guangxi Academy of Agricultural Sciences. MT10 is a perennial with underground stems. The plants reach a height of 3.5 m, and their ligules and auricles are not obvious. The leaf sheaths are light purple, and the anthers are 2.8 mm long. The stigmas are purple. The grains are 4.9 mm long and 2.5 mm wide, with a thousand-grain weight of 8.4 g. The seeds are brown with black spots, and the brown rice has a red pericarp (Fig. 1a–e).
Fig. 1.
Morphologies of the O. officinalis accession MT10. a Plants in reproductive growth. b Roots and buds (red arrow). c Leaf sheath. d Seeds. e Brown rice
Genome Sequencing, Assembly, Pseudomolecule Construction and Annotation
We employed four sequencing methods (Illumina short-read sequencing, PacBio long-read sequencing, Hi-C, and RNA-seq) to generate the corresponding DNA and RNA libraries of MT10 (Table S1). Specifically, Illumina sequencing generated 63 Gb of high-quality data, with an effective coverage of approximately 109×. The PacBio platform produced 29.48 Gb of high-quality data, with an effective coverage of around 51×. Hi-C technology yielded 70.54 Gb of high-quality data, with an effective coverage of approximately 122×. Additionally, RNA-seq generated a total of 25.8 Gb of high-quality data, with an effective coverage of approximately 45×.
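The coverage figures above are consistent with dividing each data yield by a single genome size. The following is a minimal sketch of that arithmetic, assuming the denominator is the revised k-mer estimate of 577.29 Mb reported later in this section (the text does not state which size was used); the function name `coverage` is illustrative only.

```python
# Assumption: fold coverage is computed against the revised k-mer-based
# genome size estimate (577.29 Mb); the paper does not state the denominator.
GENOME_SIZE_MB = 577.29

# High-quality data yield per library type in Gb, taken from the text.
yields_gb = {"Illumina": 63.0, "PacBio": 29.48, "Hi-C": 70.54, "RNA-seq": 25.8}


def coverage(yield_gb: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    return yield_gb * 1000.0 / genome_size_mb


for library, gb in yields_gb.items():
    print(f"{library}: {coverage(gb):.0f}x")
```

Under this assumption the rounded values match the reported 109×, 51×, 122×, and 45×.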
Based on the aforementioned methods, the MT10 genome was assembled, and the assembly statistics are presented in Table S2. Illumina sequencing alone produced 412.75 Mb of assembled data, with contig and scaffold N50 lengths of 5.57 kb and 6.37 kb, respectively. Based on the 17-mer depth distribution, the predicted genome size of MT10 was 587.25 Mb, with a revised genome size of 577.29 Mb. The genome's heterozygosity rate was calculated at 0.11%, the proportion of repetitive sequences was 53.67%, and the GC content was 44.61% (Fig. S1). By integrating PacBio and Hi-C technologies, a total of 552.58 Mb of assembled genome was generated, with contig and scaffold N50 lengths of 40.04 Mb and 44.48 Mb, respectively (Table S2). After adjusting the assembly using Hi-C technology, the final assembly included 12 pseudo-chromosomes and 633 scaffolds, with 534.51 Mb of sequence (96.73%) assigned to the 12 pseudo-chromosomes (Fig. 2; Fig. S2; Table S3). The lengths of these pseudo-chromosomes ranged from 32.63 to 62.65 Mb (Table S4).
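For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the standard definition applied to a small, hypothetical list of contig lengths, and then reproduces the anchoring percentage from the two totals given in the text. The toy lengths are invented for illustration and are not from the MT10 assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Fraction of the assembly anchored to pseudo-chromosomes (values from the text).
anchored_fraction = 534.51 / 552.58
print(f"{anchored_fraction:.2%}")
```

In the toy example the total is 150, so the running sum first reaches half (75) at the second-longest contig.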
Fig. 2.
Circos plot illustrating the multidimensional topography of the 12 chromosomes in the O. officinalis genome. The concentric circles, from outermost to innermost, represent: I Chromosomes with intervals marked at 25 Mb; II GC content, visualized as a heatmap where yellow denotes the baseline and increasing red intensity indicates higher GC content; III Gene density, also depicted as a heatmap with yellow as the baseline, red indicating higher density, and blue representing lower density; IV Repeat sequence density. These three metrics (GC content, gene density, and repeat sequence density) were calculated using 250 kb sliding windows. In the innermost circle, which displays collinear blocks, different colored lines represent both intra-chromosomal and inter-chromosomal collinear blocks (a total of 160 blocks)
We further assessed the quality of the assembled genome using four methods (Table S5). (1) BWA alignment showed that 99.75% of all short reads aligned to the assembled genome, demonstrating strong consistency. (2) Genome completeness was assessed using CEGMA (Core Eukaryotic Genes Mapping Approach, http://korflab.ucdavis.edu/datasets/cegma/): 242 of the 248 Core Eukaryotic Genes (CEGs) conserved across six eukaryotic model organisms were identified in the assembly, a completeness ratio of 97.58%. (3) BUSCO (Benchmarking Universal Single-Copy Orthologs, http://busco.ezlab.org/) analysis identified 98.27% of the 1,614 orthologous genes as complete single-copy genes, indicating high completeness. (4) The assembly quality was also assessed using the tissue-specific RNA-seq data that were used for genome annotation. RNA-seq reads from leaf, stem, and spike achieved mapping rates exceeding 97%, while reads from root showed a mapping rate of 82.7%. These results indicate that the assembled genome is well supported by transcriptomic evidence, reflecting good coverage and representation of expressed genes.
Based on de novo prediction, homology annotation, and transcriptome data, a total of 33,813 genes were annotated in the assembled genome (Table S6). Among these, 30,727 genes were supported by homologous protein alignment, and 23,601 genes were supported by the transcriptome data (Fig. S3). Using RNA sequencing data from four MT10 tissue types, we found that 24,561 (72.64%) genes are expressed in at least one of the tissue samples. Additionally, we identified 4,987 miRNAs, 4,123 tRNAs, 5,811 rRNAs, and 579 snRNAs (Table S7). The genome annotation revealed 299.72 Mb (54.24%) of repetitive sequences. Among these, long terminal repeat (LTR) elements accounted for 221.02 Mb (40.00%) (Table S8).
Genome Evolution and Gene Family Analysis of MT10
Phylogenetic Evolution and Divergence Time Estimation of MT10
We conducted a comparative genomic analysis using genomic sequences from six representative species, consisting of one outgroup species, L. perrieri, and five species within the genus Oryza (O. sativa, O. rufipogon, O. longistaminata, O. officinalis (Thailand), and O. brachyantha), along with the MT10 genome assembled in this study. Through gene family clustering analysis (Table S9, Fig. 3a), we identified a total of 27,796 gene families across the seven species, including 10,348 shared gene families and 6,811 single-copy gene families common to all species. The MT10 genome contains 7,858 single-copy gene families, fewer than the other six species (Fig. S4a). Additionally, it has 719 unique genes, more than twice the number found in the O. officinalis accession W0002 genome (Shenton et al. 2020), but significantly fewer than those in cultivated O. sativa and in O. rufipogon (Fig. S4a).
Fig. 3.
Comparative genomic analyses of O. officinalis (MT10), L. perrieri, and other species of the genus Oryza. a Cluster analysis of gene families of MT10, O. sativa L. japonica, O. rufipogon, O. longistaminata, O. officinalis (Thailand), and O. brachyantha. The overlapping parts of the circles represent the number of gene families shared between species, the non-overlapping parts represent the number of gene families unique to each species, and the sum of the numbers in a complete circle represents the total number of gene families of that species. b Estimation of the divergence time of O. officinalis (MT10) and five other Oryza species using L. perrieri as an outgroup. This phylogenetic tree was constructed using 6,811 single-copy gene families. The values at the node positions represent the divergence times of the species or their ancestors, measured in million years ago. The figures in parentheses denote the confidence intervals for these estimated divergence times. c Genome evolution of O. officinalis (MT10) and five other Oryza species using L. perrieri as an outgroup. This phylogenetic tree was constructed using 6,811 single-copy gene families. Green numbers represent the count of gene families that have undergone expansion throughout the evolutionary history of a species, while red numbers denote the count of gene families that have experienced contraction. d Synteny blocks between MT10 and Nipponbare (Nip) (336 syntenic blocks). Syntenic blocks within or between genomes were identified based on the genomic positions of genes and BLAST results (E-value threshold: 1e-5) using the MCScanX software with default parameters
Utilizing all single-copy gene families from the seven species, we constructed a phylogenetic tree. The analysis reveals that MT10 is most closely related to the O. officinalis accession W0002, and it is positioned between the AA and FF genome groups. This finding aligns with their classification within the CC genome group. The divergence time between MT10 and W0002 is estimated at 1.6 million years ago (Mya), placing it between the divergence times of the AA genome species (1.5 Mya) and the FF genome species (10.3 Mya) (Fig. 3b).
Gene Family Dynamics and Synteny in MT10, L. perrieri and Genus Oryza Genomes
The genome of MT10 displayed an expansion of 1,090 gene families and a contraction of 1,925 gene families (Fig. 3c). KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis of the 1,090 expanded gene families revealed significant enrichment in several pathways (Fig. S4b), many of which align with the strong pest and disease resistance traits of O. officinalis. These include the Toll-like receptor signaling pathway (map04620), Toll and IMD signaling pathway (map04624), tropane, piperidine and pyridine alkaloid biosynthesis (map00960), tryptophan metabolism (map00380), and plant-pathogen interaction (map04626). Additionally, pathways related to metabolic synthesis and catabolism were enriched, such as photosynthesis (map00195), glycosaminoglycan degradation (map00531), cutin, suberine and wax biosynthesis (map00073), and biosynthesis of amino acids (map01230). Furthermore, we identified 336 syntenic blocks between the MT10 and Nip genomes, representing 60.37% (20,375 genes) of the total gene count in the MT10 genome (Fig. 3d, Table S10). This result indicates a high level of conserved synteny between the AA and CC Oryza species.
Whole Genome Duplication (WGD) Analysis
Using single-copy orthologous gene families (Fig. S4a), we analyzed the number of synonymous substitutions per synonymous site (Ks) and the fourfold degenerate transversion rate (4DTv) (Fig. S4c, d; Table S11). The results indicated that during the evolution of MT10, a WGD event occurred at a 4DTv value of approximately 3.6 (Ks = 0.9), predating the divergence of MT10 from L. perrieri (4DTv = 0.12, Ks = 0.25), O. brachyantha (4DTv = 0.08, Ks = 0.2), O. longistaminata (4DTv = 0.04, Ks = 0.1), O. rufipogon (4DTv = 0.04, Ks = 0.1), O. sativa (4DTv = 0.04, Ks = 0.1), and O. officinalis (4DTv = 0, Ks = 0).
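To illustrate what a 4DTv value measures, the sketch below computes an uncorrected 4DTv for one pair of codon-aligned coding sequences: among codons whose first two bases are identical and belong to a fourfold degenerate family, it counts the fraction of third positions that differ by a transversion (purine versus pyrimidine). This is only a simplified illustration, not the pipeline used in this study; published analyses often apply a substitution-model correction that is omitted here, and the function name `raw_4dtv` and the example sequences are hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families in the
# standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def raw_4dtv(cds_a: str, cds_b: str) -> float:
    """Uncorrected 4DTv for two codon-aligned coding sequences of equal length."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Only codons sharing a fourfold degenerate prefix are informative.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable fourfold degenerate sites")
    return transversions / sites


# Hypothetical aligned sequences: four comparable sites, one transversion.
print(raw_4dtv("CTAGTTGCACCGATG", "CTCGTTGCGCCAATG"))
```

In the example, the ATG codon is ignored because it is not fourfold degenerate, and only the A-to-C change in the first codon counts as a transversion.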
Comparative Genomics Analysis
The MT10 genome is approximately 1.48 times larger than the Nip genome. The length ratios of individual chromosomes range between 1.30 and 1.61. Chromosome 12 exhibits the smallest ratio (1.30), while chromosome 10 has the largest (1.61) (Table S12). Alignment of the MT10 genome to the Nip genome showed that 53.42 Mb (14.24%) of the MT10 genome aligns to the Nip reference genome (9.64%), with approximately 2.4 million single nucleotide polymorphisms (SNPs) identified (Fig. 4; Table S13). Furthermore, 12,073 presence/absence variations (PAVs) (≥ 500 bp) were identified in the MT10 genome, covering approximately 90 Mb, with the longest variation being 184.0 kb. Similarly, 12,283 PAVs, covering 86.3 Mb with the longest variation being 186.3 kb, were found in the Nip genome. Of these, 97 and 94 PAVs of ≥ 50 kb were identified in MT10 and Nip, respectively.
Fig. 4.
Genome collinearity between O. officinalis MT10 and Nipponbare (Nip). The genomes of MT10 and Nip were fully aligned using MUMmer4 (v4.0.0) with the parameters --mum -c 100 -l 100
Further structural variations were identified, as depicted in Fig. 4 and detailed in Table S14. Notably, 1446 inversions, with fragment lengths greater than 1 kb, were observed, including 3 inversions greater than 10 kb, located specifically at positions 33.075–33.086 Mb and 33.456–33.467 Mb on MT10 Chromosome 1, and 36.975–36.989 Mb on Chromosome 7.
A comparative genomic analysis between MT10 and Nip revealed that 29,260 genes from MT10 share amino acid sequence similarity with 21,281 genes from Nip, with sequence similarities ranging from 18.79 to 100% (Table S15). At a similarity threshold of ≥ 50%, 26,218 MT10 genes were identified as homologous to 20,876 genes from the Nip genome, while at ≥ 90% similarity, 15,800 MT10 genes exhibited homology with 14,527 Nip genes.
Further analysis of 2,690 characterized functional genes (sourced from https://www.ricedata.cn/) revealed significant genetic divergence between MT10 and Nip (Table S16). Specifically, only 44 genes displayed 100% amino acid sequence identity between the two genomes. Several pest and disease resistance genes, including Bph32 (Ren et al. 2016), Pish (Takahashi et al. 2010), OsRP1L1 (Guo et al. 2012), and Pit (Hayashi et al. 2009), showed markedly low sequence identity between MT10 and Nip, with identities of 27.75%, 30.42%, 34.29%, and 37.45%, respectively. These findings indicate significant genetic divergence of resistance genes in MT10.
Cross Species Gene Family Analysis
Comparative Analysis of Resistance (R) Genes in MT10 and Other Oryza Genomes
The identification and characterization of R genes are pivotal for understanding the molecular underpinnings of pathogen defense. We compared the distribution and abundance of R gene types across different Oryza genome assemblies (Fig. 5a). Overall, the O. officinalis genomes from Guangxi (MT10) and Thailand (W0002) exhibit similar numbers of R genes (440 and 447, respectively) and comparable R gene categories (Table S17). However, compared to other rice genomes, the O. officinalis genome differs in both the number and categories of R genes. It contains 21 CC-NB-ARC genes, fewer than O. sativa (39), O. rufipogon (36) and O. longistaminata (56), but more than O. brachyantha (17). In the CC-NB-ARC-LRR category, MT10 contains 245 genes. This number, while substantial, is surpassed by the counts in O. sativa (334), O. rufipogon (374) and O. longistaminata (341), suggesting a potential contraction of this R gene category in O. officinalis. NB-ARC genes are modest in number in the O. officinalis MT10 genome (29), yet they are more abundant in O. longistaminata (81), highlighting interspecies variation in resistance gene repertoires. The NB-ARC-LRR category, with 140 genes in the O. officinalis MT10 genome, is comparable in size to that of O. sativa (117) and O. rufipogon (132), suggesting a common reliance on this category of R genes for immune responses across these species. For the CC-TIR-NB-ARC-LRR, TIR-CC-NB-ARC-LRR and TIR-NB-ARC-LRR categories, no substantial differences were observed between the MT10 genome and other rice genomes. This comparative analysis reveals that MT10 possesses a diverse and unique set of R genes that may contribute to its distinct resistance profile.
Fig. 5.
Analysis of the R gene and cystatin gene families in the genus Oryza. a Distribution of R gene types across O. officinalis (MT10), O. officinalis (Thailand), O. sativa, O. rufipogon, O. brachyantha, and O. longistaminata. b Phylogenetic clustering of cystatin genes in O. officinalis (MT10), O. sativa, O. rufipogon, O. brachyantha, and O. longistaminata, constructed from a multiple sequence alignment using IQ-TREE v2.4. c Brown planthoppers (BPH) feeding on MT10 and Nipponbare (Nip) stems. d Phenotypic changes of MT10 and Nip after 5 days of feeding by BPH. e Volcano plot illustrating upregulated and downregulated differentially expressed genes (DEGs) for MT10 after 12 h of BPH infestation (compared to 0 h). The x-axis (Log₂ fold change) shows the fold change in gene expression. The y-axis (-Log10(P-adjust)) shows the significance of differential expression. Red dots represent significantly altered DEGs, while gray dots indicate non-significantly altered genes. f Enrichment scatter diagram representing GO terms of DEGs in MT10 after 12 h of BPH infestation (compared to 0 h). The x-axis represents the degree of significance of GO terms. The y-axis indicates the entry of each enrichment category. The size of the dots corresponds to the significance, -Log10(P). GO: gene ontology; BP: biological processes; CC: cellular components; MF: molecular functions
Cystatin Gene Family
The cystatin gene family encodes cystatin proteins, which inhibit the activity of cysteine proteases, a class of enzymes involved in essential biological processes such as digestion, immune response, and regulation of cell death. Plant cystatins play a crucial role in processes such as germination and defense, and some are specifically linked to defense mechanisms against herbivores and pathogens. In response to attack, plants may produce cystatins to disrupt the digestive enzymes of herbivores or the proteases of invading pathogens. Among the Oryza species analyzed, O. officinalis (MT10) stands out with 18 genes identified within this family (Table S18), making it the species with the highest number of cystatin genes. We further conducted a clustering analysis of cystatin genes across different rice genomes based on multiple sequence alignment (Fig. 5b). The results indicate that O. officinalis genes (Oo09g0012610, Oo03g0007980, and the tandem-repeat genes Oo03g0005480, Oo03g0005490, Oo03g0005500) cluster together with O. brachyantha genes (OB05G24170, OB03G15560, and the tandem-repeat genes OB03G18270, OB03G18280, OB03G18290), while no corresponding genes were identified in other Oryza species. In contrast, other clusters display a more balanced distribution of genes across the Oryza species, with similar gene numbers for each. The exclusive clustering of certain cystatin gene copies in O. brachyantha and O. officinalis suggests that the duplication events responsible for these genes likely arose in their common ancestor.
Transcriptome Analysis Provides Insights into DEGs Associated with MT10's Resistance to BPH
Transcriptome analysis is an important approach for understanding the molecular mechanisms underlying resistance to BPH, an essential aspect for both research and breeding programs. We performed transcriptome sequencing of MT10 and Nip at 0, 6, 12, and 24 h after BPH treatment (Fig. 5c). The phenotypes of MT10 and Nip after 5 days of BPH infestation are shown in Fig. 5d. First, we analyzed the differential gene expression of MT10 at different time points following BPH treatment. The results (Table S19-21) showed that there were 1,206 DEGs after 6 h of treatment, which increased to 4,211 after 12 h, with 1,706 upregulated and 2,505 downregulated genes (Fig. 5e). After 24 h of treatment, 1,540 DEGs were observed. We conducted GO enrichment analysis on the DEGs at each time point. The results revealed that some DEGs identified after 12 h of BPH treatment are associated with responses to biotic stress, whereas DEGs observed after 6 and 24 h of BPH treatment are primarily linked to responses to abiotic stimuli. Thus, 12 h after BPH treatment marks a key period for gene expression associated with BPH resistance in MT10. For Nip, a progressive increase in the number of DEGs was observed, with 1,855 DEGs identified at 6 h, 5,788 DEGs at 12 h, and a peak of 6,804 DEGs at 24 h (Table S22-24). The GO enrichment results for DEGs after BPH treatment were similar between MT10 and Nip. Notably, the term 'responses to biotic stress' was enriched only among the DEGs identified at 12 h post-treatment, whereas the DEGs observed at 6 and 24 h were primarily associated with responses to abiotic stimuli (Table S25).
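The split of DEGs into upregulated and downregulated sets can be sketched as a simple threshold filter on the log2 fold change and the adjusted P value. The cutoffs used below (|log2 fold change| ≥ 1 and adjusted P < 0.05) are common defaults and are an assumption here, since this section does not state the thresholds applied; the gene identifiers and values are hypothetical placeholders, not data from this study.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene_id: str
    log2_fold_change: float
    padj: float


def classify_degs(genes, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into (upregulated, downregulated) ID lists.

    Assumed cutoffs: |log2FC| >= lfc_cutoff and padj < padj_cutoff.
    """
    up, down = [], []
    for gene in genes:
        if gene.padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if gene.log2_fold_change >= lfc_cutoff:
            up.append(gene.gene_id)
        elif gene.log2_fold_change <= -lfc_cutoff:
            down.append(gene.gene_id)
    return up, down


# Hypothetical values for illustration only.
example = [
    GeneStat("geneA", 2.3, 0.001),   # significant, upregulated
    GeneStat("geneB", -1.7, 0.010),  # significant, downregulated
    GeneStat("geneC", 3.0, 0.200),   # large change but not significant
    GeneStat("geneD", 0.4, 0.001),   # significant but small change
]
up, down = classify_degs(example)
print(up, down)

# Consistency check on the 12 h counts reported in the text.
print(1706 + 2505)
```

The final line confirms that the reported upregulated and downregulated counts at 12 h sum to the reported total of 4,211 DEGs.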
We further analyzed the upregulated and downregulated genes after 12 h of BPH treatment identifying several specific biological processes (BP), molecular functions (MF), and cellular components (CC) that were significantly enriched among the differentially expressed genes (Fig. 5f, Table S26). In particular, RNA processing and modification were strongly associated with the “RNA modification” (GO:0009451, P = 4.33E-06) and “RNA binding” (GO:0003723, P = 6.17E-03) categories, highlighting the importance of post-transcriptional regulation in the plant's response to stress. Additionally, the enrichment of “tetrapyrrole metabolic process” (GO:0033013, P = 5.42E-06) and “S-adenosylmethionine-dependent methyltransferase activity” (GO:0008757, P = 8.11E-03) suggests that metabolic adjustments, particularly those involving tetrapyrrole compounds and methylation reactions, play a critical role in the plant’s defense mechanisms. The role of tetrapyrroles in plant defense is well-documented, with these compounds often being precursors to vital molecules such as chlorophylls and heme groups, which are essential for maintaining cellular homeostasis under stress conditions. Moreover, the response to biotic stimulus was also a significant feature, as evidenced by the enrichment of the “response to biotic stimulus” (GO:0009607, P = 3.04E-04) category, indicating that the plant's defense mechanisms are potentially engaged in recognizing and responding to the presence of BPH. Further analysis compared the upregulated genes after 12 h of treatment with the expanded gene families of MT10, resulting in the identification of 231 genes that were both upregulated and part of the expanded gene families. GO enrichment analysis of these genes indicated an enrichment in functions such as the mitogen-activated protein kinase (MAPK) cascade and lignin catabolic process. 
The MAPK cascade is a critical signaling module that translates environmental inputs into cellular programs, participating in physiological and pathological responses. It responds to various environmental stress stimuli and is involved in modulating the host's immune response. Notably, four genes associated with the MAPK cascade exhibited log2 fold changes greater than one after 12 h of BPH treatment (Oo01g0021600, Oo05g0025130, Oo05g0003010, and Oo05g0023940). These findings suggest the potential involvement of the MAPK cascade in MT10’s resistance to BPH.
To identify genes with higher expression in MT10 compared to Nip, we performed a comparative transcriptome analysis at different time points by mapping reads against Nip and MT10 genomes and identifying syntenic orthologs. Using a cutoff of expression levels greater than 2 TPM and an expression variation index greater than 10, we identified 1111, 1153, 1204, and 1120 genes with higher expression in MT10 compared to Nip across the respective time points (0 h, 6 h, 12 h, and 24 h) (Table S27-30). Among these, 623 genes were consistently detected at all four time points (Fig. S5). GO enrichment analysis of these highly expressed genes at each time point revealed that they are not functionally associated with disease resistance but are instead linked to rhythmic processes (Table S31-34). Using the same threshold, we identified a few syntenic orthologs with higher expression levels in Nip compared to MT10, totaling 80, 86, 84, and 106 orthologs at the respective time points (0 h, 6 h, 12 h, and 24 h) (Table S27-30). However, these orthologs did not exhibit significant enrichment for specific biological functions.
To further identify potential functional genes associated with BPH resistance, we focused on exploring R genes and cystatin genes that are differentially expressed and exhibit higher expression levels in MT10 compared to their homologous gene pairs in Nip. For the R genes, we identified six that exhibit higher expression levels in MT10 compared to their homologous gene pairs in Nip (Table S35). These include the CC-NB-ARC-type R gene Oo08g0006040, which shows increased expression in MT10 after 6 and 12 h of BPH treatment, and the NB-ARC-LRR R gene Oo10g0001280, which displays higher expression in MT10 after 12 and 24 h of treatment. Additionally, the CC-NB-ARC-LRR R gene Oo11g0007190 is consistently upregulated in MT10 after 6, 12, and 24 h of BPH treatment compared with Nip, indicating a sustained higher expression of this gene in MT10. Furthermore, using the 0-h BPH treatment as a control, we identified a total of 35 R-gene DEGs following BPH treatment. Among these, Oo06g0022450, Oo08g0004350, and Oo11g0007220 are consistently upregulated after 6 and 12 h, while Oo10g0006050 and Oo11g0019580 exhibit consistent upregulation after 12 and 24 h of treatment (Table S36). For the cystatin genes, only one gene, Oo05g0018820, exhibits higher expression levels in MT10 compared to its homologous gene pair Os05g0494200 in Nip after 24 h of BPH treatment. In addition, using the 0-h BPH treatment as a control, cystatin genes Oo01g0031480 and Oo05g0018820 are consistently upregulated after 6 and 12 h (Fig. S6). These findings suggest that both R and cystatin genes may play a role in regulating BPH resistance in MT10.
Discussion
This study provides valuable insights into the genome size and composition of MT10, an O. officinalis accession from Guangxi, Southern China. The MT10 genome size was measured at 587.25 Mb and corrected to 577.29 Mb using 17-mer depth distribution. Previous analyses by Dai et al. (2022) reported substantial intra-species variation in O. officinalis, with genome sizes of 613.2 Mb and 592.7 Mb, suggesting that the O. officinalis genome is dynamic and may vary due to transposon amplification as well as differences in genome sequencing and assembly methods. Technological advancements played a crucial role in the accuracy of our genome assembly. By integrating Illumina, PacBio, Hi-C, and RNA-seq technologies, we assembled a high-quality MT10 genome, characterized by a longer scaffold N50, a higher count of annotated genes, and enriched repeat content, underscoring the importance of a multi-faceted genomic approach for high-quality assemblies.
A comparison with the initial O. officinalis assembly from Thailand by Shenton et al. (2020), which employed Illumina and PacBio RSII technologies, highlights these improvements. The previous assembly had a scaffold N50 of approximately 0.5 Mb, with 29,930 annotated genes and 51.09% repetitive regions. In contrast, our assembly achieved a scaffold N50 of 44.48 Mb, included 33,813 annotated genes, and had a repeat content of 54.24%. These results indicate the greater contiguity and completeness of our MT10 genome assembly, thereby facilitating more detailed genomic studies on O. officinalis.
Goff et al. (2002) first sequenced the Nip genome. Subsequently, the International Rice Genome Sequencing Project and Sasaki (2005) released a high-quality genome sequence for Nip, sized at 374.4 Mb (IRGSP-1.0). Most recently, Shang et al. (2023) completed a comprehensive assembly of the Nip reference genome, achieving a genome size of 385.7 Mb (AGIS-1.0), with each chromosome present as a single continuous sequence and a base accuracy exceeding 99.9999%. In our current research, the assembly of the MT10 genome yielded a size of 577.29 Mb, which is 1.54 and 1.50 times the size of the IRGSP-1.0 and AGIS-1.0 Nip genomes, respectively. Shang et al. (2023) reported that repetitive DNA sequences accounted for 50.1% and 51.1% of the IRGSP-1.0 and AGIS-1.0 genomes, respectively, which is close to the proportion in MT10 (54.24%). However, the proportion of long terminal repeat (LTR) retrotransposons in IRGSP-1.0 and AGIS-1.0 was 23.1% and 22.9%, respectively, much lower than that in MT10 (41.00%). These findings suggest that, relative to Nip, the increase in the overall proportion of repetitive DNA in MT10 is modest, whereas the proportion of LTR retrotransposons is markedly higher. Given that polyploidy (Wang et al. 2019) and repeat sequences (McCann et al. 2020) influence plant genome size, the pronounced LTR retrotransposon content in MT10 likely drives the observed genomic differences between the Nip and O. officinalis assemblies.
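The size and repeat comparisons in this paragraph are simple arithmetic, and the short sketch below reproduces them from the values quoted in the text. The function and variable names are illustrative only.

```python
# Values quoted in the text (genome size in Mb, LTR content in percent).
MT10_SIZE_MB = 577.29
NIP_SIZE_MB = {"IRGSP-1.0": 374.4, "AGIS-1.0": 385.7}
MT10_LTR_PCT = 41.00
NIP_LTR_PCT = {"IRGSP-1.0": 23.1, "AGIS-1.0": 22.9}


def size_ratios(query_mb, references_mb):
    """Fold size of the query genome relative to each reference."""
    return {name: round(query_mb / size, 2) for name, size in references_mb.items()}


def percentage_point_gaps(query_pct, references_pct):
    """Difference in percentage points between query and each reference."""
    return {name: round(query_pct - pct, 2) for name, pct in references_pct.items()}


ratios = size_ratios(MT10_SIZE_MB, NIP_SIZE_MB)
ltr_gaps = percentage_point_gaps(MT10_LTR_PCT, NIP_LTR_PCT)
```

Here `ratios` recovers the 1.54-fold and 1.50-fold figures, and `ltr_gaps` expresses the LTR retrotransposon difference in percentage points of genome content.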
In this study, we also observed that MT10 exhibits a higher number of unique genes compared to Thai O. officinalis. Gene Ontology (GO) functional analysis revealed that these unique genes are primarily associated with critical biological processes such as cell adhesion (GO:0007155), cysteine-type peptidase activity (GO:0008234), proteolysis (GO:0006508), peptidase activity acting on L-amino acid peptides (GO:0070011), and DNA integration (GO:0015074). Cell adhesion is fundamental to tissue structure and organ formation, playing crucial roles in mediating immune responses and tissue healing processes (Horwitz 2012). In parallel, genes associated with cysteine-type peptidase activity are linked to enhanced tolerance to abiotic stresses (Choe et al. 2013; Zhou et al. 2021; Wang et al. 2020; Sun et al. 2021), resistance to blast disease (Li et al. 2022), and chlorophyll biosynthesis (Zhao et al. 2019). This connection suggests that these unique genes potentially contribute to the robustness and adaptability of MT10. Moreover, KEGG pathway enrichment analysis identified significant links to pathways such as tropane, piperidine, and pyridine alkaloid biosynthesis (map00960), plant-pathogen interaction (map04626), tryptophan metabolism (map00380), and other metabolic pathways. Tryptophan metabolism is noteworthy as tryptophan serves as a precursor for plant auxins and various secondary metabolites, such as camalexin and glucosinolates. Existing studies indicate that tryptophan-derived compounds, such as serotonin, significantly enhance rice’s insect resistance (Lu et al. 2018) and mediate resistance to diseases, like verticillium wilt in cotton (Miao et al. 2019). Collectively, these findings underscore O. officinalis as a genetic reservoir of unique disease and pest resistance traits.
O. officinalis is notable for its resistance to biotic stress. The MT10 genome assembly provides a valuable opportunity to investigate the molecular basis of this resistance. Our analysis of R genes across multiple Oryza genomes revealed a distinct distribution and diversity of R gene types in MT10. For example, MT10 harbors 21 CC-NBARC genes, fewer than O. sativa, which has 39. Likewise, MT10 possesses 245 CC-NBARC-LRR genes, fewer than Nip (334) and O. longistaminata (341). The relative scarcity of specific R gene types such as CC-NBARC, together with the contraction of the CC-NBARC-LRR category in O. officinalis, may reflect different evolutionary pressures or pathogen exposure histories. To further identify R genes associated with BPH resistance, we conducted a comparative transcriptome analysis at various time points after BPH treatment. We identified six R genes specifically upregulated in MT10, as well as 35 R genes showing differential expression at various stages, 26 of which belong to the CC-NBARC-LRR category.
To date, 17 BPH resistance genes have been cloned from rice, eight of which are alleles (Yang et al. 2023). Most of these R genes, including Bph6 (Os04g0431700) (Guo et al. 2018), Bph14 (Os03g0848700) (Du et al. 2009), Bph18 (Os12g0559400) (Ji et al. 2016), Bph30 (Os04g0115650) (Shi et al. 2021), and Bph40 (Os04g0166000) (Shi et al. 2021), are constitutively expressed, with no significant induction after BPH infestation. Sequence homology analyses of these genes in MT10 and Nip revealed varying degrees of similarity (Table S15, S16), and our RNA-seq data showed no significant transcriptional changes for these genes (Oo04g0013350, Oo01g0014800, Oo04g0003900, Oo06g0010510, Oo04g0005080, and Oo02g0035610) following BPH infestation in MT10. Thus, if these genes do confer resistance, structural differences rather than transcriptional changes may be key to their function between species.
In addition to R genes, MT10 harbors a higher number of cystatin genes compared to other species, potentially due to gene duplication events. Cystatin (CYS) genes have been extensively studied in plant–insect interactions. For example, Zhu et al. (2019) identified a cystatin gene named SpCYS from the wild diploid potato Solanum pinnatisectum Dun, which is highly resistant to pests. Li et al. (2021) found that, among the expression profiles of 18 SbCys genes, only two genes were responsive to aphid infection. Of the 13 genes encoding cystatins in barley (Hordeum vulgare), HvCPI-6 is the most potent inhibitor against Myzus persicae and Acyrthosiphon pisum. Arabidopsis plants expressing HvCPI-6 significantly delayed aphid development to the adult stage, suggesting that the barley cystatin gene may interfere with the growth of both aphid species (Carrillo et al. 2011). Cystatin genes are also involved in the defense response of tea plants to Myllocerinus aurolineatus (Zhang et al. 2020). Transforming soybean with the rice cystatin gene Oc-IΔD86 (sequence homolog: Os01g0803200) enhances resistance to root lesion nematodes (RLN) (Vieira et al. 2015). Similarly, Easter lilies transformed with Oc-IΔD86 show good resistance to RLN (Westerdahl et al. 2023). Furthermore, the transcription levels of most rice cystatin genes changed under abiotic treatments such as cold, drought, and salt stress, suggesting their role in responding to different stresses (Zhang et al. 2015). In this study, we further narrowed down two candidate cystatin genes (Oo01g0031480 and Oo04g0009620) through transcriptomic comparisons and differential expression analysis. Notably, Oo01g0031480 shares homology with Os01g0803200 and exhibits stronger induction in MT10 compared to Nip following BPH infestation, suggesting a potential role for this gene in MT10’s enhanced resistance to BPH.
Overall, the comparative genomic and transcriptome analyses reveal that MT10 possesses a distinct complement of R genes and cystatin genes, with only a subset appearing to contribute directly to its BPH resistance profile. Accurate identification and functional characterization of these specific R genes and cystatin genes remain critical for understanding the molecular mechanisms of host resistance. These insights are particularly significant for breeding programs aimed at enhancing disease resistance in cultivated rice varieties. By leveraging the unique genetic repertoire of wild species like MT10, breeders can introduce novel alleles and broaden the genetic base of modern rice cultivars, potentially increasing their resilience to a wider range of pests and diseases. Ultimately, this knowledge may accelerate the development of more robust, high-yielding, and disease-resistant rice varieties.
Our additional transcriptome analyses revealed that pathways associated with flavonoid biosynthesis, transmembrane transport, and drug transmembrane transport were significantly enriched 12 h after BPH feeding on MT10. Similar findings were reported by Zhang et al. (2022), who observed significant enrichment of flavonoid biosynthesis in the GO enrichment analysis of BPH-susceptible and resistant rice varieties. This parallel is noteworthy, as an increase in flavonoid content in rice correlates positively with BPH resistance. Specifically, the gene OsF3H plays a crucial role in the flavonoid biosynthesis pathway and has been shown to positively regulate rice resistance to BPH (Dai et al. 2019). Therefore, genes related to flavonoid biosynthesis may be crucial in conferring resistance to BPH in O. officinalis. Further, our transcriptome analysis identified four MAPK genes (Oo01g0021600, Oo05g0025130, Oo05g0003010 and Oo05g0023940) that were significantly upregulated after 12 h of BPH feeding. Table S15 shows that Oo01g0021600 and Oo05g0025130 share 98.86% and 76.21% similarity with Os01g0629900 (OsMAPK10), respectively. According to Chen et al. (2022), OsMAPK10 positively regulates lignin content in rice leaves and plays a role in the sclerenchyma cell wall (SCW). Bph30 upregulates genes involved in cellulose and hemicellulose synthesis in sclerenchyma cells, enhancing cell wall thickness and hardness, thereby preventing BPH from feeding on the phloem sap (Shi et al. 2021). This suggests that Oo01g0021600 and Oo05g0025130 in MT10 may regulate lignin synthesis pathways, contributing to physical resistance against BPH. Additionally, Oo05g0003010 closely resembles Os05g0143500 (OsMPK14), with 98.90% similarity. OsMPK14 expression is upregulated during feeding by TN1-BPH populations on IR56 rice (Nanda et al. 2018), suggesting a potential role for this gene in the BPH resistance of MT10.
The Oo05g0023940 gene shows a similarity of 98.15% with the Os05g0566400 (OsMAPK20-5) gene. Silencing OsMAPK20-5 in irMAPK plants increases the accumulation of ethylene and nitric oxide (NO) after infestation by gravid female BPH, thus enhancing rice resistance against BPH adults and oviposition (Li et al. 2019). Hence, the Oo05g0023940 gene in MT10 might negatively regulate BPH resistance. In summary, both flavonoid biosynthesis and MAPK pathways may play significant roles in the BPH resistance of Guangxi O. officinalis. Understanding these genetic mechanisms opens new avenues for developing BPH-resistant rice varieties, leveraging natural genetic diversity to enhance crop resilience and productivity.
At present, wild rice is predominantly utilized in two ways. (1) Conventional hybridization introduces beneficial genes from wild rice into cultivated rice varieties. For example, Tan et al. (2004b) developed a series of new planthopper resistance breeding materials through extensive hybridization between wild rice and cultivated rice. One of the progenies, “B5”, was derived from O. officinalis collected in China and is highly resistant to BPH. Bph14 and Bph15 have been cloned from B5 (Du et al. 2009; Cheng et al. 2013) and widely used in marker-assisted breeding in China (Hu et al. 2012; Wang et al. 2016; Jiang et al. 2018). (2) De novo domestication uses gene editing and other techniques to modify genes associated with undesirable traits in wild rice, such as seed shattering, awns, tall stalks, and creeping growth habits, while preserving genes responsible for beneficial traits. This approach enhances the suitability of wild rice for cultivation, production, and utilization. For example, Yu et al. (2021) assembled the genome of allotetraploid Oryza alta (CCDD), optimized its genetic transformation system, and combined multi-dimensional genomics with multi-target precision genome editing to improve related traits of Oryza alta. The above examples highlight the potential for effectively utilizing Oryza officinalis in rice breeding and production.
Conclusions
In the present study, we successfully assembled a high-quality genome of O. officinalis MT10, a typical accession from Southern China. Our comparative genomic analysis revealed distinct characteristics of the O. officinalis genome, which differ from those of previously assembled accessions from Southeast Asia. By combining comparative transcriptome analyses, we identified important candidate genes from various gene families, including R genes, cystatins, and MAPK genes. Some of these DEGs, such as Oo01g0031480, Oo01g0021600, Oo05g0025130, Oo05g0003010, and Oo05g0023940, may contribute to MT10’s resistance to BPH. This study provides detailed genomic information about the O. officinalis genome, offering a valuable resource for future investigations and the deployment of largely untapped genetic resources in rice breeding programs. By leveraging the unique genetic resources of O. officinalis, breeders can enhance the resilience of cultivated rice varieties, improving their ability to withstand BPH and ensuring sustainable rice production.
Materials and Methods
Plant Material
We selected the O. officinalis accession MT10, originating from Guangxi, for this study. The plants were cultivated in the greenhouse of the Guangxi Academy of Agricultural Sciences. Tissues from four parts of the plants (roots, stems, leaves, and panicles) were collected, frozen in liquid nitrogen, and stored at −80 °C for future use.
DNA Extraction, Library Construction, and Sequencing
DNA was extracted from young leaves using the modified CTAB method. The DNA samples that met quality standards were randomly fragmented. A 350 bp short-insert paired-end DNA library was constructed following the Illumina standard protocol and sequenced on the Illumina HiSeq 2000 platform. A 15 kb PacBio SMRTbell library was also constructed and sequenced using the PacBio Sequel II system.
Genome Assembly and Quality Assessment
The short reads from Illumina HiSeq sequencing were assembled using SOAPdenovo2 (Luo et al. 2012). The size of the MT10 genome, its heterozygosity, and its proportion of repeat sequences were assessed from the 17-mer frequency distribution computed with Jellyfish (Marçais and Kingsford 2011) (v2.2.10). Based on kmer = 41, short reads were assembled into contigs and scaffolds. The long reads from PacBio Sequel II sequencing were assembled using Hifiasm (Cheng et al. 2021) (v0.14-r312)/Wtdbg2 (Ruan and Li 2020) (v2.5). Using Hi-C sequencing data, Allhic (Berkum et al. 2010) (v0.9.8) was employed to anchor the assembled contigs/scaffolds to chromosomes, achieving a chromosome-scale genome assembly. The quality of the genome assembly was evaluated using multiple approaches: (1) Genome sequence completeness was assessed using BUSCO (Simao et al. 2015) (v4.1.2) and CEGMA (Parra et al. 2007) (v2.5). (2) BWA (Li and Durbin 2009) (v0.7.8) was used to align small fragment library reads to the assembled genome, and the alignment rate, genome coverage, and depth distribution were calculated to evaluate sequence consistency. (3) Samtools (Li et al. 2009) (v0.1.19) was utilized to analyze BWA alignment results, and genome assembly accuracy was assessed based on the homozygous SNP ratio.
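Two of the quantities used in this section, the N50 statistic and the k-mer-based genome size estimate, follow standard definitions that can be written down compactly. The sketch below is a generic illustration and not the authors' code; the function names are hypothetical, and the size estimate uses the common approximation of total k-mer count divided by the main peak depth, without the correction step mentioned in the Discussion.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def estimate_genome_size(total_kmers, peak_depth):
    """Approximate genome size as total k-mer count / depth of the main k-mer peak."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth
```

For instance, `n50` can be applied to the list of contig or scaffold lengths read from an assembly FASTA index.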
RNA Extraction and Transcriptome Sequencing
RNA was extracted from the four tissues (roots, stems, leaves, and panicles) of MT10 using an RNA extraction kit. RNA samples were subjected to quality control using an Agilent 2100 bioanalyzer, which also precisely assessed RNA integrity and total RNA quantity. Libraries were constructed and quality checked, followed by paired-end sequencing on the Illumina HiSeq 2000 platform, generating a total of 25.8 Gb of RNA sequencing (RNA-Seq) data.
Genome Annotation
Genome annotation encompassed three main aspects: annotation of repetitive sequences, gene annotation (including gene structure prediction and gene function prediction), and non-coding RNA (ncRNA) annotation. Methods for repetitive sequence annotation included homology-based alignment and de novo prediction. Homology-based alignment was performed using RepeatMasker (Tarailo-Graovac and Chen 2009) (v4.1.2) and RepeatProteinMask (Tempel 2012) (v4.0.6). De novo prediction of repetitive elements was conducted using LTR_FINDER (Xu and Wang 2007) (v1.07), RepeatScout (Price et al. 2005) (v1.0.5), RepeatModeler (Flynn et al. 2020) (v2.0.2), and TRF (Benson 1999) (v4.09.1).
Gene structure prediction combined homology-based, de novo, and RNA-seq-based predictions. Homology predictions were made by aligning protein sequences of O. sativa L. japonica Nipponbare (Nip) (http://rice.uga.edu/), O. sativa L. indica R498 (Du et al. 2017) (R498), O. sativa L. indica ZS97 (Song et al. 2021) (ZS97), O. sativa L. indica MH63 (Song et al. 2021) (MH63), O. rufipogon (Li et al. 2020), and O. glaberrima (Zhang et al. 2015) to the genome sequences of MT10 using BLAST (McGinnis and Madden 2004) (v2.2.26) and GeneWise (Birney et al. 2004) (v2.4.1). De novo prediction of gene structures was carried out using Augustus (Stanke et al. 2006) (v3.3.3), GlimmerHMM (Majoros et al. 2004) (v3.0.4), SNAP (Korf 2004) (v6.0), Geneid (Blanco et al. 2007) (v1.4.4) and Genscan (Burge and Karlin 1997) (v1.0).
The results from both homology-based and de novo predictions, combined with transcriptome alignment data, were integrated using EVidenceModeler (Haas et al. 2008) and further refined with PASA (Jia et al. 2020) (v2.4.1), which incorporated transcriptome assembly results and added information such as UTRs and alternative splicing, to derive the final gene set. Functional annotation was achieved by aligning the gene set against protein databases such as SwissProt (Bairoch and Apweiler 2000), Nr (http://www.ncbi.nlm.nih.gov/protein), Pfam (Bateman et al. 2004), KEGG (Kanehisa and Goto 2000), and InterPro (https://www.ebi.ac.uk/interpro/).
Gene Family Evolutionary Analysis
OrthoMCL (Li et al. 2003) (v1.4) was employed to conduct gene family clustering analysis on seven species, including MT10, Nip, O. rufipogon, O. longistaminata, O. officinalis (Thai), O. brachyantha, and L. perrieri (Leersia hexandra Sw.), with the parameters -mode 3 -inflation 1.5. All single-copy gene families from these species were aligned using MUSCLE (Edgar 2004a, b) (v3.8.31) with default parameters. Ambiguously aligned positions were trimmed using Gblocks (v0.91b). Subsequently, a phylogenetic tree was constructed based on these multiple sequence alignments using RAxML (Stamatakis 2014) (v8.2.12). The parameters for coding DNA sequences and protein sequences were -m GTRGAMMA -p 12345 -x 12345 -# 100 -f ad -T 20 and -m PROTGAMMAAUTO -p 12345 -x 12345 -# 100 -f ad -T 20, respectively.
Divergence time estimation for these single-copy gene families was performed using MCMCTree (Puttick 2019), part of the PAML (Yang 2007) (v4.9) software package, with default parameters. Based on the estimated divergence times and the results of family clustering statistics, gene family expansion and contraction analysis was conducted using CAFE (De Bie et al. 2006) (v4.2) with the settings -p 0.05 -t 4 -r 10000. Furthermore, genomic collinearity analysis between MT10 and Nip was performed using MCScanX (Wang et al. 2012).
Genome Comparison and Structural Variant Identification
The genomes of MT10 and Nip were aligned using MUMmer4 (Marcais et al. 2018) (v4.0.0) with the parameters --mum -c 100 -l 100. The alignment results were filtered using delta-filter with the settings -i 80 -l 1000 -1. Variations were called from the alignment results using dnadiff with default parameters. Based on the MUMmer4 alignments, structural variants were identified using Assemblytics (Nattestad and Schatz 2016) (v1.2.1).
Analysis of NLR Resistance (R) Gene and Cystatin Gene Families across Rice Species
Genome-wide nucleotide-binding site and leucine-rich repeat receptor (NLR) genes were annotated with the NLR-Annotator tool (Steuernagel et al. 2020) across different rice genomes, including MT10, O. sativa, O. longistaminata, O. rufipogon and O. barthii. In addition, a genome-wide gene discovery pipeline was employed to identify cystatin genes in these rice genomes. First, well-characterized cystatin genes in Arabidopsis were extracted from a previous study (Martínez 2005) and used as queries to search the studied rice genomes by BLASTP (Camacho et al. 2009) with an e-value cut-off of 1e-10. The hidden Markov model (HMM) profile of the Cystatin domain (PF00031) was downloaded from Pfam (Bateman et al. 2004). HMMER searches (v3.2) were conducted on the sequences using NB-ARC HMM profile with an e-value of 0.01, as suggested by the HMMER user manual (Finn et al. 2011). The gene sequences identified by the BLASTP and HMM searches were retrieved as cystatin genes. The identified cystatin gene sequences across different rice species were aligned using MUSCLE (Edgar 2004b) (v5), and phylogenetic clustering analysis was conducted with IQ-TREE (Nguyen et al. 2015) (v2.4).
Sample Preparation and Transcriptome Experiment Design of BPH Treatment
Seeds of Nip and MT10 (MT10 is difficult to germinate by conventional methods and is usually de-hulled) were soaked, germinated, and sown in the same tray. When the seedlings reached the 4–5 leaf stage, 2nd–3rd instar BPH larvae were released. Leaf and stalk samples of Nip and MT10 were collected at 0 h, 6 h, 12 h and 24 h after BPH release. We obtained a total of 24 samples (2 genotypes × 4 treatment times × 3 biological replicates). The samples were quick-frozen in liquid nitrogen and stored in a −80 °C freezer for further testing. Total RNA from stalks and leaves was extracted with Trizol (Invitrogen). The constructed libraries were sequenced using the Illumina NovaSeq X Plus platform.
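The factorial design described above (2 genotypes × 4 time points × 3 biological replicates) can be enumerated in a few lines, which is a convenient way to build a sample sheet. The sample-naming scheme below is hypothetical and is not taken from the study.

```python
from itertools import product

genotypes = ["Nip", "MT10"]
timepoints_h = [0, 6, 12, 24]
replicates = [1, 2, 3]

# One label per genotype x time point x replicate combination
# (naming pattern is an assumption made for illustration).
samples = [
    f"{genotype}_{hours}h_rep{rep}"
    for genotype, hours, rep in product(genotypes, timepoints_h, replicates)
]
```

The resulting list contains one entry per library, matching the 24 samples reported.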
Analysis of Differential Gene Expression after BPH Treatment
Low-quality reads, adapters, and reads containing poly-N in the raw RNA-Seq data were removed using fastp (v0.20.0) with default parameters (Chen et al. 2018). The RNA-seq data of Nip and MT10 were mapped to the Nip and MT10 reference genomes, respectively, using Hisat2 (v2.1.0) with default parameters (Kim et al. 2015), and the number of mapped reads was counted using HTSeq (Anders et al. 2015) (v0.6.1). Read counts were normalized to TPM (transcripts per million), and genes with low expression (TPM < 1) were removed. Differential expression analysis was performed using the DESeq2 R package (v1.18.0) (Love et al. 2014) at 0, 6, 12, and 24 h after BPH treatment. A corrected P-value ≤ 0.001 was used as the threshold for determining significant DEGs. The SynGAP tool (Wu et al. 2024) was used to perform a comparative transcriptome analysis and identify DEGs among syntenic orthologs between the Nip and MT10 genomes. GO annotation of DEGs detected using the RNA-seq data was performed with TopGO (Alexa and Rahnenführer 2009).
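As an illustration of the normalization and filtering step, the sketch below converts raw read counts to TPM using the standard definition (length-normalized rates rescaled to sum to one million) and then drops entries with TPM < 1. It is a generic sketch under that definition rather than the authors' script; the function names and the dictionary-based input are assumptions.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM.

    counts: {gene: read count}; lengths_bp: {gene: gene length in bp}.
    """
    # Reads per kilobase for each gene.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    # Rescale so that all values sum to one million.
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


def drop_low_expression(tpm, min_tpm=1.0):
    """Keep genes with TPM >= min_tpm (TPM < 1 is removed, as in the text)."""
    return {gene: value for gene, value in tpm.items() if value >= min_tpm}
```

Because TPM values sum to one million within each library, they are comparable across genes within a sample; the differential expression testing itself is performed on counts by DESeq2.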
Abbreviations
- BP
Biological processes
- BPH
Brown planthopper
- CC
Cellular components
- CC-NB-ARC
Coiled-coil-nucleotide-binding-adapter shared by APAF-1, R proteins, and CED-4
- CC-NB-ARC-LRR
Coiled-coil-nucleotide-binding-adapter shared by APAF-1, R proteins, and CED-4-leucine-rich repeat
- CC-TIR-NB-ARC-LRR
Coiled-coil-toll-interleukin-1 receptor-nucleotide-binding-adapter shared by APAF-1, R proteins, and CED-4-leucine-rich repeat
- GO
Gene ontology
- Hi-C
High-throughput/resolution chromosome conformation capture
- HMM
Hidden Markov model
- KEGG
Kyoto encyclopedia of genes and genomes
- L. perrieri
Leersia perrieri
- LTR
Long terminal repeat
- MAPK
Mitogen-activated protein kinase
- MF
Molecular functions
- Mya
Million years ago
- ncRNA
Non-coding RNA
- Nip
Nipponbare
- O. glaberrima
Oryza glaberrima
- O. officinalis
Oryza officinalis
- O. rufipogon
Oryza rufipogon
- O. sativa L.
Oryza sativa L.
- O. barthii
Oryza barthii
- O. brachyantha
Oryza brachyantha
- O. longistaminata
Oryza longistaminata
- PAVs
Present and absent variations
- R gene
Resistance gene
- SCW
Sclerenchyma cell wall
- SNPs
Single nucleotide polymorphisms
- SVs
Structural variations
- TIR-CC-NB-ARC-LRR
Toll-interleukin-1 receptor-coiled-coil-nucleotide-binding-adapter shared by APAF-1, R proteins, and CED-4-leucine-rich repeat
- TIR-NB-ARC-LRR
Toll-interleukin-1 receptor-nucleotide-binding-adapter shared by APAF-1, R proteins, and CED-4-leucine-rich repeat
Author Contributions
Can Chen and Hui Guo proposed the conceptualization and wrote this manuscript; Haifei Hu and conducted data analyses and contributed to writing this manuscript; Xiuzhong Xia made the visualizations; Zongqiong Zhang, Baoxuan Nong and Rui Feng planted materials and collected samples; Boheng Liu conducted data analyses; Jianhui Liu and Shuhui Liang identified the resistance of brown planthopper; Danting Li, Junliang Zhao and Xinghai Yang conceived the project and contributed to the editing of the manuscript.
Funding
This study was supported by the Guangxi Department of Science and Technology (GuikeAA22068087-2), the National Natural Science Foundation of China (32360519, 3226047, 32160436, 32060476 and 31860371), National Key Research and Development Program of China (2021YFD1200505), Guangxi Academy of Agricultural Sciences (2023YM62, 2025YP032).
Data Availability
The raw assembly data reported in this paper have been deposited in the Genome Warehouse in the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, under accession number PRJCA029552, which is publicly accessible at https://ngdc.cncb.ac.cn/gwh. The genome and annotation files have been uploaded to https://osf.io/y4gb5. O. officinalis MT10 is preserved in the National Wild Rice Germplasm Resource Nursery (Nanning) at the Lijian Research Base of the Guangxi Zhuang Autonomous Region Academy of Agricultural Sciences. Individuals and institutions that satisfy the relevant requirements of Chinese regulations can submit the necessary application materials to the National Centre for Germplasm Resources Conservation of China. Upon approval of the application, seeds will be distributed.
Declarations
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Written informed consent for publication was obtained from all participants.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Can Chen, Haifei Hu and Hui Guo have contributed equally to this work.
Contributor Information
Danting Li, Email: ricegl@163.com.
Junliang Zhao, Email: zhao_junliang@gdaas.cn.
Xinghai Yang, Email: yangxinghai514@163.com.
References
- Alexa A, Rahnenführer J (2009) Gene set enrichment analysis with topGO. Biocond Improv 27:1–26
- Anders S, Pyl PT, Huber W (2015) HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31:166–169
- Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28:45–48
- Bansal J, Gupta K, Rajkumar MS, Garg R, Jain M (2021) Draft genome and transcriptome analyses of halophyte rice Oryza coarctata provide resources for salinity and submergence stress response factors. Physiol Plant 173:1309–1322
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–141
- Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580
- Birney E, Clamp M, Durbin R (2004) GeneWise and Genomewise. Genome Res 14:988–995
- Blanco E, Parra G, Guigó R (2007) Using geneid to identify genes. Curr Protoc Bioinform Chapter 4: Unit 4.3
- Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinform 10:421
- Carrillo L, Martinez M, Alvarez-Alfageme F, Castañera P, Smagghe G, Diaz I, Ortego F (2011) A barley cysteine-proteinase inhibitor reduces the performance of two aphid species in artificial diets and transgenic Arabidopsis plants. Transgen Res 20:305–319
- Chen L, Yin FY, Zhang DY, Xiao SQ, Zhong QF, Wang B, Ke X, Ji ZY, Wang LX, Zhang Y, Jiang C, Liu L, Li JJ, Lu YD, Yu TQ, Cheng ZQ (2022) Unveiling a novel source of resistance to bacterial blight in medicinal wild rice, Oryza officinalis. Life (Basel) 12:827
- Chen SF, Zhou YQ, Chen YR, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890
- Cheng HY, Concepcion GT, Feng XW, Zhang HW, Li H (2021) Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18:170–175
- Cheng X, Wu Y, Guo J, Du B, Chen R, Zhu L, He G (2013) A rice lectin receptor-like kinase that is involved in innate immune responses also contributes to seed germination. Plant J 76:687–698
- Choe YH, Kim YS, Kim IS, Bae MJ, Lee EJ, Kim YH, Park HM, Yoon HS (2013) Homologous expression of γ-glutamylcysteine synthetase increases grain yield and tolerance of transgenic rice plants to environmental stresses. J Plant Physiol 170:610–618
- Dai SF, Zhu XG, Hutang GR, Li JY, Tian JQ, Jiang XH, Zhang D, Gao LZ (2022) Genome size variation and evolution driven by transposable elements in the genus Oryza. Front Plant Sci 13:921937
- Dai ZY, Tan J, Zhou C, Yang XF, Yang F, Zhang SJ, Sun SC, Miao XX, Shi ZY (2019) The OsmiR396-OsGRF8-OsF3H-flavonoid pathway mediates resistance to the brown planthopper in rice (Oryza sativa). Plant Biotechnol J 17:1657–1669 [DOI] [PMC free article] [PubMed] [Google Scholar]
- De Bie T, Cristianini N, Demuth JP, Hahn MW (2006) CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22:1269–1271 [DOI] [PubMed] [Google Scholar]
- Du B, Zhang WL, Liu BF, Hu J, Wei Z, Shi ZY, He RF, Zhu LL, Chen RZ, Han B, He GC (2009) Identification and characterization of Bph14, a gene conferring resistance to brown planthopper in rice. Proc Natl Acad Sci USA 106:22163–22168 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Du HL, Yu Y, Ma YF, Gao Q, Cao YH, Chen Z, Ma B, Qi M, Li Y, Zhao XF, Wang J, Liu KF, Qin P, Yang X, Zhu LH, Li SG, Liang CZ (2017) Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun 8:15324 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edgar RC (2004a) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edgar RC (2004b) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform 5:113 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29–37 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF (2020) RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A 117:9451–9457 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100 [DOI] [PubMed] [Google Scholar]
- Guo LJ, Li M, Wang WJ, Wang LJ, Hao GJ, Guo CM, Chen L (2012) Over-expression in the nucleotide-binding site-leucine rich repeat gene DEPG1 increases susceptibility to bacterial leaf streak disease in transgenic rice plants. Mol Biol Rep 39:3491–3504 [DOI] [PubMed] [Google Scholar]
- Guo JP, Xu CX, Wu D, Zhao Y, Qiu YF, Wang XX, Ouyang YD, Cai BD, Liu X, Jing SL, Shangguan XX, Wang HY, Ma YH, Hu L, Wu Y, Shi SJ, Wang WL, Zhu LL, Xu X, Chen RZ, Feng YQ, Du B, He GC (2018) Bph6 encodes an exocyst-localized protein and confers broad resistance to planthoppers in rice. Nat Genet 50:297–306 [DOI] [PubMed] [Google Scholar]
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9:R7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hayashi K, Yoshida H (2009) Refunctionalization of the ancient rice blast disease resistance gene Pit by the recruitment of a retrotransposon as a promoter. Plant J 57:413–425 [DOI] [PubMed] [Google Scholar]
- Henry RJ (2022) Wild rice research: advancing plant science and food security. Mol Plant 15:563–565 [DOI] [PubMed] [Google Scholar]
- Hirabayashi H (1998) Identification of brown planthopper resistance gene derived from O. officinalis using molecular markers in rice. Breed Sci 48:82 [Google Scholar]
- Horwitz AR (2012) The origins of the molecular era of adhesion research. Nat Rev Mol Cell Biol 13:805–811 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu J, Li X, Wu CJ, Yang CJ, Hua HX, Gao GJ, Xiao JH, He YQ (2012) Pyramiding and evaluation of the brown planthopper resistance genes Bph14 and Bph15 in hybrid rice. Mol Breed 29:61–69 [Google Scholar]
- International Rice Genome Sequencing Project, Sasaki T (2005) The map-based sequence of the rice genome. Nature 436:793–800 [DOI] [PubMed] [Google Scholar]
- Ishimaru T, Hirabayashi H, Ida M, Takai T, San-Oh YA, Yoshinaga S, Ando I, Ogawa T, Kondo M (2010) A genetic resource for early-morning flowering trait of wild rice Oryza officinalis to mitigate high temperature-induced spikelet sterility at anthesis. Ann Bot 106:515–520 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ji H, Kim SR, Kim YH, Suh JP, Park HM, Sreenivasulu N, Misra G, Kim SM, Hechanova SL, Kim H, Lee GS, Yoon UH, Kim TH, Lim H, Suh SC, Yang J, An G, Jena KK (2016) Map-based cloning and characterization of the BPH18 gene from wild rice conferring resistance to brown planthopper (BPH) insect pest. Sci Rep 6:34376 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jia HQ, Wei HC, Zhu DM, Ma JJ, Yang H, Wang RZ, Feng XZ (2020) PASA: identifying more credible structural variants of Hedou12. IEEE/ACM Trans Comput Biol Bioinform 17:1493–1503 [DOI] [PubMed] [Google Scholar]
- Jiang HC, Hu J, Li Z, Liu J, Gao GJ, Zhang QL, Xiao JH, He YQ (2018) Evaluation and breeding application of six brown planthopper resistance genes in rice maintainer line Jin 23B. Rice (N Y) 11:22 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kanehisa M, Goto S (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28:27–30 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12:357–360 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kitazumi A, Pabuayon ICM, Ohyanagi H, Fujita M, Osti B, Shenton MR, Kakei Y, Nakamura Y, Brar DS, Kurata N, de los Reyes BG (2018) Potential of Oryza officinalis to augment the cold tolerance genetic mechanisms of Oryza sativa by network complementation. Sci Rep 8:16346 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Korf I (2004) Gene finding in novel genomes. BMC Bioinform 5:59 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J, Liu XH, Wang QM, Sun JY, He DX (2021) Genome-wide identification and analysis of cystatin family genes in Sorghum (Sorghum bicolor (L.) Moench). PeerJ 9:e10617 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li JC, Liu XL, Wang Q, Huangfu JY, Schuman MC, Lou YG (2019) A group D MAPK protects plants from autotoxicity by suppressing herbivore-induced defense signaling. Plant Physiol 179:1386–1401 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li W, Li K, Huang Y, Shi C, Hu WS, Zhang Y, Zhang QJ, Xia EH, Hutang GR, Zhu XG, Liu YL, Liu Y, Tong Y, Zhu T, Huang H, Dan Z, Zhao Y, Jiang WK, Yuan J, Niu YC, Gao CW, Gao LZ (2020) SMRT sequencing of the Oryza rufipogon genome reveals the genomic basis of rice adaptation. Commun Biol 3:167 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li YY, Liu PC, Mei L, Jiang GW, Lv QW, Zhai WX, Li CR (2022) Knockout of a papain-like cysteine protease gene OCP enhances blast resistance in rice. Front Plant Sci 13:1065253 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Love M, Anders S, Huber W (2014) Differential analysis of count data–the DESeq2 package. Genome Biol 15:10–1186 [Google Scholar]
- Lu HP, Luo T, Fu HW, Wang L, Tan YY, Huang JZ, Wang Q, Ye GY, Gatehouse AMR, Lou YG, Shu QY (2018) Resistance of rice to insect pests mediated by suppression of serotonin biosynthesis. Nat Plants 4:338–344 [DOI] [PubMed] [Google Scholar]
- Luo RB, Liu BH, Xie YL, Li ZY, Huang WH, Yuan JY, He GZ, Chen YX, Pan Q, Liu YJ, Tang JB, Wu GX, Zhang H, Shi YJ, Liu Y, Yu C, Wang B, Lu Y, Han CL, Cheung DW, Yiu SM, Peng SL, Zhu XQ, Liu GM, Liao XK, Li YR, Yang HM, Wang J, Lam TW, Wang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1(1):18 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20:2878–2879 [DOI] [PubMed] [Google Scholar]
- Marçais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764–770 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A (2018) MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol 14:e1005944 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martínez M, Abraham Z, Carbonero P, Díaz I (2005) Comparative phylogenetic analysis of cystatin gene families from Arabidopsis, rice and barley. Mol Genet Genom 273:423–432 [DOI] [PubMed] [Google Scholar]
- McCann J, Macas J, Novák P, Stuessy TF, Villaseñor JL, Weiss-Schneeweiss H (2020) Differential genome size and repetitive DNA evolution in diploid species of Melampodium sect. Melampodium (Asteraceae). Front Plant Sci 11:362 [DOI] [PMC free article] [PubMed] [Google Scholar]
- McGinnis S, Madden TL (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32:W20–W25 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miao YH, Xu L, He X, Zhang L, Shaban M, Zhang XL, Zhu LF (2019) Suppression of tryptophan synthase activates cotton immunity by triggering cell death via promoting SA synthesis. Plant J 98:329–345 [DOI] [PubMed] [Google Scholar]
- Mondal TK, Rawal HC, Chowrasia S, Varshney D, Panda AK, Mazumdar A, Kaur H, Gaikwad K, Sharma TR, Singh NK (2018) Draft genome sequence of first monocot-halophytic species Oryza coarctata reveals stress-specific genes. Sci Rep 8:13698 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nanda S, Wan PJ, Yuan SY, Lai FX, Wang WX, Fu Q (2018) Differential responses of OsMPKs in IR56 rice to two BPH populations of different virulence levels. Int J Mol Sci 19:4030 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nattestad M, Schatz MC (2016) Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32:3021–3023 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ (2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32(1):268–274 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23:1061–1067 [DOI] [PubMed] [Google Scholar]
- Phillips AL, Ferguson S, Watson-Haigh NS, Jones AW, Borevitz JO, Burton RA, Atwell BJ (2022) The first long-read nuclear genome assembly of Oryza australiensis, a wild rice from northern Australia. Sci Rep 12:10823 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1):i351–358 [DOI] [PubMed] [Google Scholar]
- Puttick MN (2019) MCMCtreeR: functions to prepare MCMCtree analyses and visualize posterior ages on trees. Bioinformatics 35:5321–5322 [DOI] [PubMed] [Google Scholar]
- Qiu YF, Guo JP, Jing SL, Zhu LL, He GC (2012) Development and characterization of japonica rice lines carrying the brown planthopper-resistance genes BPH12 and BPH6. Theor Appl Genet 124:485–494 [DOI] [PubMed] [Google Scholar]
- Ren JS, Gao FY, Wu XT, Lu XJ, Zeng LH, Lv JQ, Su XW, Luo H, Ren GJ (2016) Bph32, a novel gene encoding an unknown SCR domain-containing protein, confers resistance against the brown planthopper in rice. Sci Rep 6:37645 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Renganayaki K, Fritz A, Sadasivam S, Pammi S, Harrington SE, McCouch S, Kumar SM, Reddy AS (2002) Mapping and progress toward map-based cloning of brown planthopper biotype-4 resistance gene introgressed from Oryza officinalis into cultivated rice, O. sativa. Crop Sci 42:2112–2117 [Google Scholar]
- Ruan J, Li H (2020) Fast and accurate long-read assembly with wtdbg2. Nat Methods 17:155–158 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schatz MC, Maron LG, Stein JC, Hernandez Wences A, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR, McCombie WR (2014) Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 15:506 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shang LG, He WC, Wang TY, Yang YX, Xu Q, Zhao XJ, Yang LB, Zhang H, Li XX, Lv Y, Chen W, Cao S, Wang XM, Zhang B, Liu XP, Yu XM, He HY, Wei H, Leng Y, Shi CL, Guo ML, Zhang ZP, Zhang BT, Yuan QL, Qian HG, Cao XL, Cui Y, Zhang QQ, Dai XF, Liu CC, Guo LB, Zhou YF, Zheng XM, Ruan J, Cheng ZK, Pan WH, Qian Q (2023) A complete assembly of the rice Nipponbare reference genome. Mol Plant 16:1232–1236 [DOI] [PubMed] [Google Scholar]
- Sharma P, Al-Dossary O, Alsubaie B, Al-Mssallem I, Nath O, Mitter N, Rodrigues Alves Margarido G, Topp B, Murigneux V, Kharabian Masouleh A, Furtado A, Henry RJ (2021) Improvements in the sequencing and assembly of plant genomes. GigaByte 2021: gigabyte24 [DOI] [PMC free article] [PubMed]
- Shenton M, Kobayashi M, Terashima S, Ohyanagi H, Copetti D, Hernandez-Hernandez T, Zhang J, Ohmido N, Fujita M, Toyoda A, Ikawa H, Fujiyama A, Furuumi H, Miyabayashi T, Kubo T, Kudrna D, Wing R, Yano K, Nonomura K-I, Sato Y, Kurata N (2020) Evolution and diversity of the wild rice Oryza officinalis complex, across continents, genome types, and ploidy levels. Genome Biol Evol 12:413–428 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shi SJ, Wang HY, Nie LY, Tan D, Zhou C, Zhang Q, Li Y, Du B, Guo JP, Huang J, Wu D, Zheng XH, Guan W, Shan JH, Zhu LL, Chen RZ, Xue LJ, Walling LL, He GC (2021) Bph30 confers resistance to brown planthopper by fortifying sclerenchyma in rice leaf sheaths. Mol Plant 14:1714–1732 [DOI] [PubMed] [Google Scholar]
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212 [DOI] [PubMed] [Google Scholar]
- Song JM, Xie WZ, Wang S, Guo YX, Koo DH, Kudrna D, Gong C, Huang Y, Feng JW, Zhang W, Zhou Y, Zuccolo A, Long E, Lee S, Talag J, Zhou R, Zhu XT, Yuan D, Udall J, Xie W, Wing RA, Zhang Q, Poland J, Zhang J, Chen LL (2021) Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol Plant 14:1757–1767 [DOI] [PubMed] [Google Scholar]
- Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B (2006) AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34:W435–439 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Steuernagel B, Witek K, Krattinger SG, Ramirez-Gonzalez RH, Schoonbeek HJ, Yu G, Baggs E, Witek AI, Yadav I, Krasileva KV, Jones JDG, Uauy C, Keller B, Ridout CJ, Wulff BBH (2020) The NLR-annotator tool enables annotation of the intracellular immune receptor repertoire. Plant Physiol 183:468–482 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun SK, Xu X, Tang Z, Tang Z, Huang XY, Wirtz M, Hell R, Zhao FJ (2021) A molecular switch in sulfur metabolism to reduce arsenic and enrich selenium in rice grain. Nat Commun 12:1392 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun YQ, Shang LG, Zhu QH, Fan LJ, Guo LB (2022) Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci 27:391–401 [DOI] [PubMed] [Google Scholar]
- Takahashi A, Hayashi N, Miyao A, Hirochika H (2010) Unique features of the rice blast resistance Pish locus revealed by large scale retrotransposon-tagging. BMC Plant Biol 10:175 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan GX, Ren X, Weng QM, Shi ZY, Zhu LL, He GC (2004a) Mapping of a new resistance gene to bacterial blight in rice line introgressed from Oryza officinalis. Yi Chuan Xue Bao 31:724–729 [PubMed] [Google Scholar]
- Tan GX, Weng QM, Ren X, Huang Z, Zhu LL, He GC (2004b) Two whitebacked planthopper resistance genes in rice share the same loci with those for brown planthopper resistance. Heredity (Edinb) 92:212–217 [DOI] [PubMed] [Google Scholar]
- Tanaka T, Nishijima R, Teramoto S, Kitomi Y, Hayashi T, Uga Y, Kawakatsu T (2020) De novo genome assembly of the indica rice variety IR64 using linked-read sequencing and nanopore sequencing. G3 (Bethesda) 10:1495–1501 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tarailo-Graovac M, Chen NS (2009) Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform Chapter 4:4.10.1–4.10.14 [DOI] [PubMed]
- Tempel S (2012) Using and understanding RepeatMasker. Methods Mol Biol 859:29–51 [DOI] [PubMed] [Google Scholar]
- van Berkum NL, Lieberman-Aiden E, Williams L, Imakaev M, Gnirke A, Mirny LA, Dekker J, Lander ES (2010) Hi-C: a method to study the three-dimensional architecture of genomes. J Vis Exp 39:1869 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vieira P, Wantoch S, Lilley CJ, Chitwood DJ, Atkinson HJ, Kamo K (2015) Expression of a cystatin transgene can confer resistance to root lesion nematodes in Lilium longiflorum cv. “Nellie White.” Transgen Res 24(3):421–432 [DOI] [PubMed] [Google Scholar]
- Wang CC, Zheng LH, Tang Z, Sun SK, Ma JF, Huang XY, Zhao FJ (2020) OASTL-A1 functions as a cytosolic cysteine synthase and affects arsenic tolerance in rice. J Exp Bot 71:3678–3689 [DOI] [PubMed] [Google Scholar]
- Wang HB, Ye ST, Mou TM (2016) Molecular breeding of rice restorer lines and hybrids for brown planthopper (BPH) resistance using the Bph14 and Bph15 genes. Rice (N Y) 9:53 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang JP, Qin J, Sun PC, Ma XL, Yu JG, Li YX, Sun SR, Lei TY, Meng FB, Wei CD, Li XY, Guo H, Liu XJ, Xia RY, Wang L, Ge WN, Song XM, Zhang L, Guo D, Wang JY, Bao ST, Jiang S, Feng YS, Li XP, Paterson AH, Wang XY (2019) Polyploidy index and its implications for the evolution of polyploids. Front Genet 10:807 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang YP, Tang HB, Debarry JD, Tan X, Li JP, Wang XY, Lee TH, Jin HZ, Marler B, Guo H, Kissinger JC, Paterson AH (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40:e49 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Westerdahl B, Riddle L, Giraud D, Kamo K (2023) Field test of Easter lilies transformed with a rice cystatin gene for root lesion nematode resistance. Front Plant Sci 14:1134224 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wing RA, Purugganan MD, Zhang Q (2018) The rice genome revolution: from an ancient grain to Green Super Rice. Nat Rev Genet 19:505–517 [DOI] [PubMed] [Google Scholar]
- Wu FQ, Mai YX, Chen CJ, Xia R (2024) SynGAP: a synteny-based toolkit for gene structure annotation polishing. Genome Biol 25(1):218 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35:W265–268 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yan LH, Luo TP, Huang DH, Wei MY, Ma ZF, Liu C, Qin YY, Zhou XL, Lu YP, Li RB, Qin G, Zhang YX (2023) Recent advances in molecular mechanism and breeding utilization of brown planthopper resistance genes in rice. Int J Mol Sci 24(15):12061 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang HY, You AQ, Yang ZF, Zhang FT, He RF, Zhu LL, He GC (2004) High-resolution genetic mapping at the Bph15 locus for brown planthopper resistance in rice (Oryza sativa L.). Theor Appl Genet 110(1):182–191 [DOI] [PubMed] [Google Scholar]
- Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591 [DOI] [PubMed] [Google Scholar]
- Yu H, Lin T, Meng XB, Du HL, Zhang JK, Liu GF, Chen MJ, Jing YH, Kou LQ, Li XX, Gao Q, Liang Y, Liu XD, Fan ZL, Liang YT, Cheng ZK, Chen MS, Tian ZX, Wang YH, Chu CC, Zuo JR, Wan JM, Qian Q, Han B, Zuccolo A, Wing RA, Gao CX, Liang CZ, Li JY (2021) A route to de novo domestication of wild allotetraploid rice. Cell 184:1156–1170.e14 [DOI] [PubMed] [Google Scholar]
- Zhang Q, Li TZ, Gao MY, Ye M, Lin MX, Wu D, Guo JP, Guan W, Wang J, Yang K, Zhu LL, Cheng YC, Du B, He GC (2022) Transcriptome and metabolome profiling reveal the resistance mechanisms of rice against brown planthopper. Int J Mol Sci 23:4083 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Q, Liang Z, Cui XA, Ji CM, Li Y, Zhang PX, Liu JR, Riaz A, Yao P, Liu M, Wang YP, Lu TG, Yu H, Yang DL, Zheng HK, Gu XF (2018) N6-methyladenine DNA methylation in japonica and indica rice genomes and its association with gene expression, plant development, and stress responses. Mol Plant 11:1492–1508 [DOI] [PubMed] [Google Scholar]
- Zhang WL, Dong Y, Yang L, Ma BJ, Ma RR, Huang FD, Wang CC, Hu HT, Li CS, Yan CQ, Chen JP (2014) Small brown planthopper resistance loci in wild rice (Oryza officinalis). Mol Genet Genom 289:373–382 [DOI] [PubMed] [Google Scholar]
- Zhang YS, Zhang SL, Liu H, Fu BY, Li LJ, Xie M, Song Y, Li X, Cai J, Wan WT, Kui L, Huang H, Lyu J, Dong Y, Wang WS, Huang LY, Zhang J, Yang QZ, Shan QL, Li Q, Huang WQ, Tao DY, Wang MH, Chen MS, Yu YS, Wing RA, Wang W, Hu FY (2015) Genome and comparative transcriptomics of African wild rice Oryza longistaminata provide insights into molecular mechanism of rhizomatousness and self-incompatibility. Mol Plant 8:1683–1686 [DOI] [PubMed] [Google Scholar]
- Zhang X, Ran W, Liu FJ, Li XW, Hao WJ, Sun XL (2020) Cloning, expression and enzymatic characterization of a cystatin gene involved in herbivore defense in tea plant (Camellia sinensis). Chemoecology 30:233–244 [Google Scholar]
- Zhao Y, Qiang CG, Wang XQ, Chen YF, Deng JQ, Jiang CH, Sun XM, Chen HY, Li J, Piao WL, Zhu XY, Zhang ZY, Zhang HL, Li ZC, Li JJ (2019) New alleles for chlorophyll content and stay-green traits revealed by a genome wide association study in rice (Oryza sativa). Sci Rep 9:2541 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou H, Zhou Y, Zhang F, Guan WX, Su Y, Yuan XX, Xie YJ (2021) Persulfidation of nitrate reductase 2 is involved in l-cysteine desulfhydrase-regulated rice drought tolerance. Int J Mol Sci 22:12119 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu WJ, Bai X, Li GT, Chen M, Wang Z, Yang Q (2019) SpCYS, a cystatin gene from wild potato (Solanum pinnatisectum), is involved in the resistance against Spodoptera litura. Theor Exp Plant Physiol 31:317–328 [Google Scholar]
Data Availability Statement
The raw assembly data reported in this paper have been deposited in the Genome Warehouse at the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, under accession number PRJCA029552, and are publicly accessible at https://ngdc.cncb.ac.cn/gwh. The genome and annotation files have been uploaded to https://osf.io/y4gb5. O. officinalis MT10 is preserved in the National Wild Rice Germplasm Resource Nursery (Nanning) at the Lijian Research Base of the Guangxi Zhuang Autonomous Region Academy of Agricultural Sciences. Individuals and institutions that satisfy the relevant requirements of Chinese regulations can submit the necessary application materials to the National Centre for Germplasm Resources Conservation of China. Seeds will be distributed once the application is approved.





