BMC Genomics. 2025 Jan 21;26:54. doi: 10.1186/s12864-025-11246-0

The genomes of the most diverse AA genome rice species provide a resource for rice improvement and studies of rice evolution and domestication

Muhammad Abdullah 1,2, Agnelo Furtado 1, Ardashir Kharabian Masouleh 1, Pauline Okemo 1,2, Robert Henry 1,2
PMCID: PMC11748844  PMID: 39838314

Abstract

Rice (Oryza sativa) is a staple food crop globally, with origins in wild progenitors within the AA genome group of Oryza species. Oryza rufipogon and Oryza meridionalis are native to tropical Asia and Northern Australia and offer unique genetic reservoirs. Here we explored the relationships of the genomes of these wild rice species with the domesticated rice genome. We utilized long-read sequencing (PacBio HiFi) and chromatin mapping (Hi-C) to produce de novo chromosome-level genomes of Oryza meridionalis, the most divergent AA genome species, and the unique Australian Oryza rufipogon-like taxon that is a sister to the clade of domesticated and wild AA genome rice species of Asia and Africa. Comparative genomic analyses were conducted to identify structural variations and syntenic relationships between these wild taxa and the domesticated rice variety Nipponbare. The genome assemblies of the wild rice species achieved high completeness and contiguity, revealing the shared and unique genes in each species. Both wild species uniquely shared some genes with domesticated rice, many of which were associated with disease resistance and stress tolerance. Structural differences included a large 6 Mb inversion on chromosome 6 specific to Japonica rice. Functional annotation highlighted conserved biological functions and novel genes unique to the wild taxa. These findings provide a deeper understanding of rice domestication and highlight the genetic contributions of wild species to enhancing the genetic diversity and ecological adaptability of modern rice varieties. Our study emphasizes the importance of conserving wild rice populations as genetic resources for breeding and adaptation in changing environments.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-025-11246-0.

Keywords: Rice domestication, Oryza rufipogon, Oryza meridionalis, Rice evolution, Structural variation

Introduction

Rice (Oryza sativa), a pivotal staple food globally, originates from wild progenitors within the AA genome group and was first domesticated in Asia [1]. Among these progenitors, Oryza rufipogon, which remains the closest wild relative, is widely distributed in tropical Asia [2]. The interfertility between wild and domesticated species has allowed gene flow over thousands of years, significantly shaping the genetic landscape of rice cultivated across Asia [3]. In contrast, wild rice populations in Northern Australia have remained isolated from these domestication processes, potentially preserving characteristics of pre-domestication species [4]. This isolation has provided a unique genetic reservoir that can offer insights into the traits of ancestral rice species and the dynamics of rice domestication.

The precise origins of rice domestication in Asia are difficult to pinpoint due to extensive gene flow between cultivated rice and wild populations in areas where rice has been grown for a long time. Essentially, rice plants that appear wild today and closely resemble O. sativa may actually be descendants of cultivated rice that escaped into the wild, rather than true representatives of the original wild ancestors [5]. This phenomenon of cultivated plants reverting to the wild is not unique to rice: domesticated barley, for example, has been found growing wild in Tibet [6]. Furthermore, archaeological findings reveal that japonica rice, a variety of O. sativa, was present early in both Southeast and South Asia [7], suggesting that the domestication of rice was likely a complex process that involved multiple regions. The presence of two functionally distinct chloroplast genotypes in the domesticated O. sativa gene pool also suggests at least two separate domestications [8].

Moreover, the introgression of genes from O. meridionalis into cultivated rice varieties has been reported. The introgression of chromosomal regions carrying the Pi-cd locus from O. meridionalis into O. sativa during rice domestication [9], and the potential of these introgressions to confer increased tolerance to excess iron [10], indicate that the genetic contributions of Australian wild rice to domesticated rice are both significant and perhaps underappreciated. High-quality genome assemblies are essential for such studies as they provide accurate gene annotations, resolve structural variations, and allow precise identification of introgressed regions. These high-resolution datasets enable a more comprehensive understanding of the genetic diversity and evolutionary contributions of wild rice to domesticated varieties. Both the nuclear and chloroplast genome analyses conducted in previous studies corroborate the hypothesis of multiple sources contributing to the domestication of Asian rice [4, 11].

We now report the de novo sequencing and assembly of two wild AA genome species and their comparison with the genome of domesticated rice. O. meridionalis is the most distant of the AA genome species from domesticated rice and was considered endemic to Australia but may have a distribution that extends into Asia. The Australian populations of O. rufipogon have a chloroplast genome that is closer to that of O. meridionalis than to that of O. rufipogon from Asia [12]. Wild hybrids between these two taxa (Australian O. rufipogon and O. meridionalis) have been identified, indicating ongoing reticulate evolution in wild Oryza [13] despite reproductive barriers [14]. The sequencing of these two species allowed the genomes to be compared with that of domesticated rice.

Materials and methods

DNA extraction and sequencing

DNA was extracted from leaf tissue of the Australian Oryza rufipogon-like taxon and Oryza meridionalis, maintained in a glasshouse at The University of Queensland in Brisbane, Australia, using a modified CTAB method [15]. A detailed description of the wild rice collection has been provided [4]. The high-quality DNA obtained was sequenced at The Australian Genome Research Facility (AGRF), University of Queensland, using two PacBio Sequel II SMRT cells with the circular consensus sequencing method to generate PacBio high-fidelity (HiFi) reads. Hi-C sequencing was conducted at The Ramaciotti Centre for Genomics, University of New South Wales, Australia, with library preparation and analysis performed using Phase Genomics Proximo Plant Hi-C version 4.0. Additionally, flow cytometry was carried out at the University of Queensland using the BD Biosciences LSR II Flow Cytometer and analyzed with the FlowJo software package. For this, fresh leaves of the wild rice were co-chopped with the reference standard, Macadamia tetraphylla (presumed size 796 Mb), in Arumuganathan and Earle buffer (10.1007/BF02672073). The nuclei were then gently filtered through a pre-soaked 40-µm nylon mesh, stained with 50 µg/mL of propidium iodide and 50 µg/mL of RNase A, and assessed across three biological replicates on different days.
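Genome size estimation by flow cytometry with an internal standard rests on a simple proportionality between fluorescence peak position and nuclear DNA content. The following is a minimal sketch of that calculation, not the FlowJo workflow used here: only the 796 Mb standard size comes from the text, and the peak values are hypothetical placeholders rather than measurements from this study.

```python
from statistics import mean

STANDARD_SIZE_MB = 796.0  # presumed size of Macadamia tetraphylla, as stated in the text


def estimate_genome_size(sample_peak, standard_peak, standard_size_mb=STANDARD_SIZE_MB):
    """Scale the standard's genome size by the ratio of G1 peak fluorescence."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_size_mb * sample_peak / standard_peak


# Hypothetical (sample_peak, standard_peak) pairs, one per biological replicate.
replicates = [(5200.0, 10000.0), (5150.0, 9900.0), (5300.0, 10100.0)]
estimates = [estimate_genome_size(s, r) for s, r in replicates]
mean_estimate = mean(estimates)  # average across the three replicates
```

Averaging across replicates run on different days, as described above, reduces the influence of day-to-day instrument drift on the final estimate.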

Genome assembly and scaffolding

Genome assembly was performed using PacBio HiFi reads and Hi-C reads with the Hifiasm de novo assembler, incorporating the Hi-C integration option to enhance haplotype resolution [16]. The resulting contig assemblies were scaffolded utilizing Hi-C data through a combination of three aligners: Bowtie2 [17], Chromap [18], and BWA [19], alongside three scaffolding techniques: SALSA [20], YaHS [21], and the Arima mapping + SALSA scaffolding pipeline (https://github.com/ArimaGenomics/mapping_pipeline). The YaHS scaffolding pipeline was selected for the final assembly due to its superior contiguity and the presence of telomeres in scaffolds. The same assembly and scaffolding approach was applied to both haplotypes, Hap1 and Hap2.

Assembly validation and chromosome assignment

The completeness of the genome assembly and annotation was assessed using BUSCO v5.2.2 with the viridiplantae lineage [22], and contiguity was evaluated using QUAST (version 5.0.2) [23]. Scaffolds were compared and aligned with our previously published genome of Oryza sativa Nipponbare to assign them to pseudochromosomes using D-Genies v.1.4 [24]. Telomeres in the pseudochromosomes were identified both manually and using the telomere identification toolkit (tidk) (https://github.com/tolkit/telomeric-identifier).
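The idea behind scoring telomeres at scaffold ends can be illustrated with a short sketch. It is not the tidk algorithm: it assumes the canonical plant telomeric repeat TTTAGGG (and its reverse complement), and the window size and repeat threshold are hypothetical values chosen only for illustration.

```python
# Assumption: canonical plant telomeric repeat and its reverse complement.
FORWARD = "TTTAGGG"
REVERSE = "CCCTAAA"


def count_repeats(window):
    """Count non-overlapping copies of the telomeric repeat on either strand."""
    window = window.upper()
    return max(window.count(FORWARD), window.count(REVERSE))


def telomere_ends(sequence, window_size=5000, min_repeats=50):
    """Return how many ends (0, 1 or 2) of a scaffold carry a telomeric repeat array."""
    start = sequence[:window_size]
    end = sequence[-window_size:]
    start_hit = count_repeats(start) >= min_repeats
    end_hit = count_repeats(end) >= min_repeats
    return int(start_hit) + int(end_hit)
```

The return value corresponds to the 0, 1 or 2 scoring used for the Telomere columns of the pseudochromosome tables.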

K-mer analysis

K-mer analysis was conducted using Jellyfish (v2.2.10) and GenomeScope [25, 26], providing further insights into the genome structure and complexity.
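As a rough illustration of what the k-mer step computes, the sketch below counts canonical k-mers and builds the multiplicity histogram that a tool such as GenomeScope takes as input. It is a toy in-memory version under stated assumptions, not a substitute for Jellyfish; the function names and the choice of k are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with ambiguous bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

In a real dataset the main peak of this histogram approximates the sequencing depth, and heterozygosity shows up as a secondary peak at roughly half that depth.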

Detection and masking of repeat elements

Repeat elements were identified using a de novo approach with RepeatModeler2 version 2.0.1 [27] and cross-referenced against the Oryza repeat database of the Rice Genome Annotation Project (uga.edu). The identified repeat elements were subsequently masked in the genome using RepeatMasker version 4.0.9_p2 [28], employing the soft masking option to preserve underlying genomic information.

RNA-seq data alignment and gene prediction

RNA-seq data from both taxa (Australian O. rufipogon and O. meridionalis) [29] were utilized for annotation. Quality- and adapter-trimmed RNA-seq reads were aligned to the soft-masked genome using HISAT2 [30]. Gene prediction was conducted with Braker3 (https://github.com/Gaius-Augustus/BRAKER), utilizing RNA-seq data and Viridiplantae protein databases to enhance prediction accuracy.

Assessment of annotation completeness

The completeness of the genome annotation was evaluated using BUSCO, which provided a quantitative measure of the annotation’s coverage of expected gene content in the viridiplantae lineage.

Functional annotation

Functional annotation was carried out using OmicsBox version 2.2.4 (https://www.biobam.com/omicsbox/). Coding DNA sequences (CDS) were analyzed using the BLASTX program, querying against the non-redundant protein sequences database restricted to Viridiplantae taxonomy, with a stringent e-value threshold of 1.0E-10 to ensure high specificity of matches. The CDS were further subjected to InterProScan to identify protein domains and associated biological functions. Concurrently, Gene Ontology (GO) terms were retrieved for all hits obtained from the BLASTX search using the Blast2GO annotation tool within OmicsBox. The GO terms identified through InterProScan were integrated with those obtained from Blast2GO to create a comprehensive set of functional annotations for each sequence.

Assessment of coding potential

CDS sequences that did not yield any hits from the BLASTX searches were further analyzed to assess their coding potential. This assessment was performed using the coding potential assessment tool (CPAT) with a prebuilt model of Arabidopsis thaliana, complemented by creating Oryza-specific models. This step was crucial to distinguish between coding and non-coding transcripts, particularly for those sequences uniquely identified in Oryza species.

Comparative genomics of Oryza species

To analyze structural and sequence variations between Oryza species, we employed the Synteny and Rearrangement Identifier (SyRI) tool [31]. Whole genome alignments were performed using Nucmer with the --maxmatch option to capture comprehensive alignments between the genomes, fine-tuned with parameters -c 100 -b 500 -l 50 to optimize the process. The resulting alignments were refined using the delta-filter tool and converted into tab-delimited formats via the show-coords command. SyRI was applied with default settings to predict genomic structures, and the outputs were visualized using plotsr [32].
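The alignment-to-SyRI sequence described above can be sketched as a list of commands. The nucmer parameters are those reported in the text; the delta-filter thresholds (`-i 90 -l 100`), the show-coords flags, the output prefix and the file names are assumptions for illustration, since the text does not state them. The function only builds the commands and does not execute anything.

```python
def build_syri_commands(ref_fasta, qry_fasta, prefix="out"):
    """Return (argv, stdout_file) pairs for the alignment-to-SyRI steps.

    stdout_file is None when the tool writes its own output files.
    """
    delta = f"{prefix}.delta"
    filtered = f"{prefix}.filtered.delta"
    coords = f"{prefix}.filtered.coords"
    return [
        # Whole-genome alignment with the parameters reported in the text.
        (["nucmer", "--maxmatch", "-c", "100", "-b", "500", "-l", "50",
          "-p", prefix, ref_fasta, qry_fasta], None),
        # Assumed identity and length thresholds; not stated in the text.
        (["delta-filter", "-m", "-i", "90", "-l", "100", delta], filtered),
        # Tab-delimited coordinates without header, as SyRI expects.
        (["show-coords", "-THrd", filtered], coords),
        # SyRI with default settings.
        (["syri", "-c", coords, "-d", filtered, "-r", ref_fasta, "-q", qry_fasta], None),
    ]
```

Each pair could be passed to `subprocess.run`, redirecting standard output to the named file where one is given.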

In parallel, synteny analysis was conducted using the Python version of MCScan, utilizing coding sequence (CDS) and BED files to enhance the resolution of our comparisons. The CDS were compared using the LAST algorithm, with outputs filtered at a c-score threshold of 0.7 to exclude tandem duplications and weak hits. Identified homologs, termed anchors, were clustered into synteny blocks using a single linkage clustering mechanism [33]. This approach enabled a detailed examination of genomic linkages and rearrangements, providing insights into the evolutionary dynamics between the studied Oryza species. To perform comparative genomic analyses, we used the high-quality genome assembly of the Indica rice cultivar Minghui 63 (MH63KL1), with a total length of 397.71 Mb and no gaps across 12 chromosomes. This assembly, featuring 98.7% BUSCO completeness, was obtained from the Genome Warehouse (accession no. GWHBCKX00000000) and BioProject PRJNA650033.
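The c-score filter can be illustrated with a short sketch. It assumes the usual definition of the C-score in the Python version of MCScan, namely the score of a hit divided by the best score involving either of the two genes; the gene identifiers and scores in the usage note are hypothetical.

```python
from collections import defaultdict


def filter_by_cscore(hits, cutoff=0.7):
    """Keep hits whose C-score is at least `cutoff`.

    hits: iterable of (query_gene, subject_gene, score) with positive scores.
    C-score = score / max(best score of the query, best score of the subject).
    """
    hits = list(hits)
    best = defaultdict(float)
    for query, subject, score in hits:
        best[("q", query)] = max(best[("q", query)], score)
        best[("s", subject)] = max(best[("s", subject)], score)
    kept = []
    for query, subject, score in hits:
        cscore = score / max(best[("q", query)], best[("s", subject)])
        if cscore >= cutoff:
            kept.append((query, subject, score))
    return kept
```

For example, with hits `("a1", "b1", 100.0)`, `("a1", "b2", 60.0)` and `("a2", "b2", 80.0)`, the second hit has a C-score of 0.6 and is dropped, while the other two score 1.0 and are retained as candidate anchors.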

Identification of species specific genes

To identify genes present in Oryza sativa Nipponbare and Oryza meridionalis but absent from Australian Oryza rufipogon, we implemented a two-step analytical strategy using rigorous criteria and computational tools. The coding DNA sequences (CDS) of Australian Oryza rufipogon were compared with those of Oryza sativa Nipponbare using a CDS-versus-CDS mapping approach in the CLC Genomics Workbench (https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-clc-genomics-workbench/). Mapping was performed using the Large Gap Read Mapping tool with the following parameters: match score of 1, mismatch cost of 2, insertion cost of 3, deletion cost of 3, length fraction of 0.9, similarity fraction of 0.9, a maximum number of hits for a segment set to 10, and a maximum distance from seed set to 50,000. Non-specific matches were handled by mapping randomly. Tool accuracy was assessed through control data by self-mapping, and mapped files were visually inspected in the CLC Genomics Workbench to confirm alignment accuracy. The tool’s visualization capabilities allowed us to examine mapped regions interactively, ensuring reliable identification of unique genes and minimizing false positives. CDS that did not align with Oryza rufipogon under these parameters were marked as potentially unique to Nipponbare.

To confirm the absence of these putative Nipponbare-specific genes in Oryza rufipogon, the unique CDS identified in the first step were mapped against the entire genome sequence of Australian Oryza rufipogon using the Large Gap Read Mapping tool and the same parameters. This additional step was included because annotation inaccuracies or incompleteness in the Oryza rufipogon gene models might lead to false negatives during the CDS-to-CDS comparison. By mapping directly to the genome sequence, we minimized the risk of overlooking genes due to annotation gaps or errors, ensuring a more reliable identification of truly absent genes.

The verified Nipponbare-specific CDS were subsequently mapped against the genome sequence of Oryza meridionalis to identify genes that were present in Oryza meridionalis but absent from Oryza rufipogon. The same mapping criteria and Large Gap Read Mapping tool parameters were applied for consistency.
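The successive filters described above amount to a few set operations on CDS identifiers. The sketch below illustrates that logic only: the mapping itself was done in CLC Genomics Workbench, and the function name, argument names and identifiers are hypothetical.

```python
def candidate_shared_genes(nip_cds, maps_to_ruf_cds, maps_to_ruf_genome, maps_to_mer_genome):
    """Apply the three mapping filters as set operations on CDS identifiers."""
    # Step 1: Nipponbare CDS with no match among Australian O. rufipogon CDS.
    unmatched_cds = set(nip_cds) - set(maps_to_ruf_cds)
    # Step 2: of those, CDS that also fail to map to the O. rufipogon genome.
    absent_from_ruf = unmatched_cds - set(maps_to_ruf_genome)
    # Step 3: of those, CDS that do map to the O. meridionalis genome.
    return absent_from_ruf & set(maps_to_mer_genome)
```

Separating steps 1 and 2 mirrors the rationale given in the text: a CDS missing from the O. rufipogon annotation is only treated as absent once it also fails to map to the genome sequence.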

Results

Contig assembly and phasing of wild Oryza genomes

High-fidelity (HiFi) reads and Hi-C paired-end Illumina reads were used for genome assemblies of the two wild Oryza, utilizing the Hi-C integrated option in the Hifiasm assembler (Table S1). For the Australian O. rufipogon, the integration of Hi-C and HiFi reads in contig assembly resulted in a primary/collapsed assembly comprising 652 contigs, covering a total genome size of 450 Mb, and achieving a 99.6% BUSCO completeness with an N50 of 21.2 Mb. The assemblies for the phased haplotypes, hap1 and hap2, consisted of 628 and 315 contigs with total lengths of 434 Mb and 435 Mb, respectively. Both haplotypes demonstrated high ortholog coverage, with 99.5% of single-copy orthologs covered; hap1 had an N50 of 21.2 Mb, and hap2 had an N50 of 18.1 Mb (Table S2).

Similarly, for O. meridionalis, the integrated contig assembly resulted in a primary/collapsed assembly of 643 contigs with a total size of 443 Mb, a BUSCO completeness of 99.1%, and an N50 of 32.4 Mb. The phased haplotypes, hap1 and hap2, included 602 and 221 contigs with overall lengths of 406 Mb and 428 Mb, respectively. Hap1 covered 98.9% of the single copy orthologs with an N50 of 30.1 Mb, while hap2 covered 99.0% with an N50 of 32.4 Mb (Table S3).

Scaffolding and chromosome-scale assembly of wild Oryza genomes

Scaffolding was performed using Hi-C data with four distinct pipelines as detailed in the Materials and Methods section (Figure S1). Assemblies were evaluated by examining telomere presence at scaffold ends, assessing N50 values of the entire assembly, and aligning the scaffolds with our previously published O. sativa Nipponbare genome [34]. The final assembly, produced by mapping Hi-C reads to the Hi-C integrated contigs with Chromap [18] and scaffolding with the YaHS pipeline [21], was selected based on its high contiguity, completeness, and the presence of telomeric repeats at the scaffold ends (Figure S1).

For the Australian O. rufipogon, the primary scaffold assembly consisted of 637 scaffolds with a total length of 451 Mb, achieving a 99.6% BUSCO completeness score with an N50 of 39.3 Mb (Table S4). The twelve longest scaffolds, representing the twelve pseudochromosomes, had a total length of 410 Mb and a 98.4% BUSCO score (Table 1).

Table 1.

Statistics for the Australian O. rufipogon haplotype resolved genome assembly

Primary/collapsed Hap1 Hap2
Total assembly size 410,488,348 399,072,477 406,835,324
Complete BUSCOs (%) 98.40% 99.30% 99.60%
Total scaffold number 12 12 12
scaffold N50 33,219,564 31,728,070 32,811,568
scaffold L50 6 6 6
Largest scaffold 46,773,952 46,317,106 45,972,278
GC content (%) 44.16 44.05 44.11

These were aligned with the O. sativa Nipponbare genome and labelled as Chr1-Chr12 based on their sequence order. Chromosome orientations were determined relative to O. sativa Nipponbare (Figure S2). Among these pseudochromosomes, five displayed telomeres at both ends, five at one end, and two lacked telomeric sequences at either end (Table 2). The Hi-C contact maps of these 12 pseudochromosomes are shown in supplementary Figure S3 (a-b).

Table 2.

Australian O. rufipogon haplotype resolved pseudochromosomes genome assembly size and telomeres

Primary assembly Hap1 Hap2
Chr Size Telomere Chr Size Telomere Chr Size Telomere
UQTA01 46,773,952 1 UQTA01-hap1 46,317,106 1 UQTA01-hap2 45,972,278 1
UQTA02 39,385,449 2 UQTA02-hap1 38,547,108 1 UQTA02-hap2 38,575,539 2
UQTA03 40,707,908 2 UQTA03-hap1 40,098,251 2 UQTA03-hap2 40,450,144 2
UQTA04 36,555,555 2 UQTA04-hap1 36,528,245 2 UQTA04-hap2 36,144,744 2
UQTA05 29,680,119 2 UQTA05-hap1 31,728,070 2 UQTA05-hap2 31,522,893 2
UQTA06 33,219,564 2 UQTA06-hap1 31,474,211 1 UQTA06-hap2 32,811,568 2
UQTA07 31,654,997 1 UQTA07-hap1 31,148,080 1 UQTA07-hap2 31,322,994 2
UQTA08 32,113,094 1 UQTA08-hap1 30,992,204 1 UQTA08-hap2 31,437,414 1
UQTA09 26,937,707 1 UQTA09-hap1 25,020,948 0 UQTA09-hap2 25,351,886 0
UQTA10 27,974,182 0 UQTA10-hap1 25,396,168 0 UQTA10-hap2 27,587,524 1
UQTA11 35,472,447 1 UQTA11-hap1 33,246,953 1 UQTA11-hap2 34,507,905 1
UQTA12 30,013,374 0 UQTA12-hap1 28,575,133 1 UQTA12-hap2 31,150,435 1
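The scaffold N50 and L50 reported in Table 1 can be recomputed directly from the pseudochromosome sizes in Table 2. The short sketch below does this for the primary assembly; the function name is ours, and the sizes are copied from the table above.

```python
def n50_l50(lengths):
    """Return (N50, L50): the scaffold length and count at which the
    cumulative sum of lengths, sorted longest first, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for index, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, index
    raise ValueError("lengths must be non-empty")


# Primary-assembly pseudochromosome sizes (bp) from Table 2.
primary_sizes = [
    46_773_952, 39_385_449, 40_707_908, 36_555_555, 29_680_119, 33_219_564,
    31_654_997, 32_113_094, 26_937_707, 27_974_182, 35_472_447, 30_013_374,
]

total = sum(primary_sizes)
n50, l50 = n50_l50(primary_sizes)
print(total, n50, l50)  # 410488348 33219564 6
```

The result agrees with the total assembly size, scaffold N50 and scaffold L50 listed for the primary/collapsed assembly in Table 1.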

The two haplotypes of Australian O. rufipogon, Hap1 and Hap2, were assembled with the same pipeline, with minor manual adjustments made using Hi-C contact maps in Juicebox [35]. For Hap1, the twelve longest scaffolds had a 99.3% BUSCO score and a total length of 399 Mb with an N50 of 31.7 Mb. Hap1 displayed telomeres on three pseudochromosomes at both ends and on seven at one end, while two lacked telomeres. For Hap2, the twelve longest scaffolds achieved a 99.6% BUSCO score and a total length of 406 Mb with an N50 of 32.8 Mb. Hap2 had telomeres on six pseudochromosomes at both ends and on five at one end, while one lacked telomeric sequences.

The orientation and chromosome numbers for these chromosome-scale pseudomolecules were aligned with the primary/collapsed assembly (Figure S4-S5). Similarly, scaffolding of O. meridionalis utilized the same pipelines, resulting in a primary scaffold assembly of 649 scaffolds totaling 443 Mb with a 99.0% complete BUSCO score and an N50 of 34.6 Mb (Table 3).

Table 3.

Statistics for the O. meridionalis haplotype resolved genome assembly

Primary/collapsed Hap1 Hap2
Total assembly size 410,680,253 379,917,735 403,802,234
Complete BUSCOs (%) 99.00% 98.80% 99.30%
Total scaffold number 12 12 12
scaffold N50 34,624,000 33,129,346 32,842,334
scaffold L50 6 5 6
Largest scaffold 45,241,736 42,644,651 44,660,978
GC content (%) 43.56 43.49 43.53

The twelve longest scaffolds, representing pseudochromosomes and totaling 410 Mb, achieved a 99.0% BUSCO score. These were also aligned and labelled as Chr1-Chr12 based on the O. sativa Nipponbare genome, with chromosome orientations determined accordingly (Figure S6). Among these pseudochromosomes, seven displayed telomeres at both ends, three at one end, and two lacked telomeric sequences (Table 4). The Hi-C contact maps of these 12 pseudochromosomes are shown in supplementary Figure S3 (c-d).

Table 4.

O. meridionalis haplotype resolved pseudochromosomes genome assembly size and telomeres

Primary assembly Hap1 Hap2
Chr Size (bp) Telomere Chr Size (bp) Telomere Chr Size (bp) Telomere
UQTB01 45,241,736 1 UQTB01-hap1 42,644,651 1 UQTB01-hap2 44,660,978 2
UQTB02 38,812,793 2 UQTB02-hap1 38,794,154 2 UQTB02-hap2 37,234,421 2
UQTB03 41,606,441 1 UQTB03-hap1 41,434,322 1 UQTB03-hap2 40,867,596 2
UQTB04 36,915,708 1 UQTB04-hap1 34,934,566 2 UQTB04-hap2 36,861,587 1
UQTB05 30,518,841 2 UQTB05-hap1 30,183,564 2 UQTB05-hap2 29,851,082 2
UQTB06 34,624,000 0 UQTB06-hap1 33,129,346 1 UQTB06-hap2 32,842,334 2
UQTB07 32,437,630 2 UQTB07-hap1 32,408,745 2 UQTB07-hap2 32,414,700 1
UQTB08 30,562,755 2 UQTB08-hap1 30,350,580 2 UQTB08-hap2 30,550,193 2
UQTB09 28,509,793 0 UQTB09-hap1 22,704,612 0 UQTB09-hap2 27,549,789 1
UQTB10 25,710,503 2 UQTB10-hap1 24,914,896 1 UQTB10-hap2 25,691,775 2
UQTB11 36,795,251 2 UQTB11-hap1 25,419,214 1 UQTB11-hap2 37,978,922 2
UQTB12 28,944,802 2 UQTB12-hap1 22,999,085 1 UQTB12-hap2 27,298,857 2

The haplotypes of O. meridionalis, Hap1 and Hap2, were assembled with minor adjustments based on their Hi-C contact maps [35]. Hap1’s twelve longest scaffolds had a 98.8% BUSCO score and totaled 379 Mb with an N50 of 33.1 Mb, displaying telomeres at both ends on five pseudochromosomes and at one end on six, while one lacked telomeres. Hap2’s twelve longest scaffolds achieved a 99.3% BUSCO score and totaled 403 Mb with an N50 of 32.8 Mb, with telomeres at both ends on nine pseudochromosomes and at one end on three, and none without telomeric sequences.

The orientation and chromosome numbers for these chromosome-scale pseudomolecules were aligned with the primary/collapsed assembly (Figure S7-S8).

Repeat identification and masking in wild Oryza genomes

In the primary/collapsed assembly of the Australian O. rufipogon, transposable elements exhibit a varied and significant presence, accounting for substantial portions of the genome across all 12 pseudochromosomes. Retroelements, which include 52,533 elements covering 79,242,881 base pairs (19.30% of the genome), dominate. Among these, LTR elements, particularly Gypsy LTR retroelements and Copia LTR retroelements, are the most prominent, making up 18.25% of the genome with Gypsy LTR retroelements covering 65,884,197 base pairs (16.05%) and Copia LTR retroelements 7,741,389 base pairs (1.89%). Other notable elements include LINEs and SINEs, contributing 3,962,426 base pairs (0.97%) and 351,280 base pairs (0.09%), respectively. DNA transposons also feature, covering 17,050,039 base pairs (4.15%), with hobo-Activator types notably covering 2,505,380 base pairs (0.61%) (Table S5, Fig. 1a).

Fig. 1.

Circular plots illustrating the distribution of genomic elements across the pseudochromosomes of Australian O. rufipogon (Panel a) and O. meridionalis (Panel b). Each ring represents a different category of genomic content: (A) Chromosome number and scale, (B) Retroelements, (C) DNA transposons, (D) Copia LTR retroelements, (E) Gypsy LTR retroelements, (F) LTR elements, (G) Gene density

The O. meridionalis genome shows a slightly different pattern. Retroelements in O. meridionalis span 68,505,654 base pairs (16.68% of the genome), with LTR elements again leading at 15.53% of the genome. Gypsy LTR retroelements are notably prevalent at 54,211,186 base pairs (13.20%), followed by Copia LTR retroelements at 9,003,681 base pairs (2.19%). Relative to Australian O. rufipogon, LINEs are slightly more abundant and SINEs less abundant, covering 4,558,486 base pairs (1.11%) and 166,194 base pairs (0.04%), respectively. DNA transposons are comparably significant, covering 17,042,415 base pairs (4.15%), with hobo-Activator types at 1,158,224 base pairs (0.28%) (Table S6, Fig. 1b). This comparative analysis highlights variability in the distribution of transposable elements between the two taxa, which might influence genomic architecture and evolution. Chromosomal differences are evident, with larger chromosomes typically hosting higher concentrations of these elements. These distinctions underscore the complex interplay of genomic structure and functional dynamics within and across the Oryza species.
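The genome percentages quoted for each repeat class follow from dividing the covered base pairs by the assembly size. The sketch below reproduces the retroelement figures, assuming the denominators are the pseudochromosome totals given in Tables 1 and 3; that choice of denominator is our assumption, although it reproduces the reported values.

```python
def percent_of_genome(covered_bp, genome_bp):
    """Percentage of the assembly covered by an element class, rounded to two decimals."""
    return round(100 * covered_bp / genome_bp, 2)


# Assumed denominators: pseudochromosome totals (bp) from Tables 1 and 3.
GENOME_BP = {
    "Australian O. rufipogon": 410_488_348,
    "O. meridionalis": 410_680_253,
}

# Base pairs covered by retroelements, as reported in the text.
RETROELEMENT_BP = {
    "Australian O. rufipogon": 79_242_881,
    "O. meridionalis": 68_505_654,
}

for taxon, covered in RETROELEMENT_BP.items():
    print(taxon, percent_of_genome(covered, GENOME_BP[taxon]))
```

This prints 19.3 and 16.68, matching the 19.30% and 16.68% reported for the two taxa.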

Gene Prediction and RNA-seq alignment

Gene prediction was conducted on the soft-masked genome with Braker3, using quality- and adapter-trimmed RNA-seq reads alongside Viridiplantae proteins. The RNA-seq alignment for the Australian O. rufipogon collapsed assembly demonstrated a high degree of concordance with the genomic data, achieving an alignment rate of 90.23%. This process identified 29,151 genes, corresponding to 32,711 coding sequences (CDS) and an equal number of proteins. The BUSCO completeness score for this annotation stood at 96.40%, reflecting a high level of completeness and annotation quality.

Comparative analysis of the haplotypes within the Australian O. rufipogon genome revealed slightly lower metrics. The “Australian O. rufipogon-Hap1” assembly showed an RNA-seq alignment of 90.14%, identifying 24,928 genes, 28,138 CDS, and proteins, with a BUSCO score of 94.90%. Meanwhile, “Australian O. rufipogon-Hap2” exhibited a slightly higher RNA-seq alignment rate of 90.58%, with 25,799 genes, 29,016 CDS, and proteins, alongside a BUSCO completeness of 96.50% (Table S7).

For O. meridionalis, the RNA-seq alignment rate of 76.2% was lower than that observed in Australian O. rufipogon. The alignment rate of ∼76% for RNA-seq reads to the O. meridionalis genome is influenced by genomic diversity, microbial contamination, and splicing events, which are common challenges in RNA-seq experiments. Given the evolutionary divergence of O. meridionalis, a higher alignment rate was not expected. These findings are consistent with our previous work [13]. This assembly identified 26,633 genes and the same number of coding sequences and proteins. Notably, the BUSCO completeness score for this assembly was high at 98.3%, indicating a highly complete and well-annotated genome.

Comparative insights were gained by analyzing the haplotypes within the O. meridionalis genome. “O. meridionalis-Hap1” achieved an RNA-seq alignment percentage of 84.9%, with 23,417 genes and 26,125 CDS and proteins identified, alongside a BUSCO score of 96.4%. Meanwhile, “O. meridionalis-Hap2” showed an alignment rate of 75.5%, but with a higher gene count of 27,399, corresponding to 30,569 CDS and proteins, and a BUSCO score of 97.9% (Table S8).

Coding sequences analysis of wild Oryza genomes

In the analysis of the Australian O. rufipogon collapsed genome, a total of 32,711 coding sequences (CDS) were identified. Among these, 32,391 CDS demonstrated significant similarity to known proteins via BLAST analysis, underscoring their predicted functional roles. Notably, 320 CDS lacked BLAST hits, suggesting these may represent novel genes unique to Australian O. rufipogon. These CDS were further assessed for coding potential using Arabidopsis thaliana and Oryza sativa models, which confirmed 243 sequences (75.9%) as coding under the Arabidopsis model, while the remaining sequences were predicted as non-coding (Table S9).

Similarly, the O. meridionalis collapsed genome revealed 29,604 CDS, with 29,220 showing significant matches in BLAST, indicating a high conservation of functional proteins. The coding potential assessment for the 384 CDS without BLAST hits resulted in 310 sequences (80.7%) being identified as coding according to the Arabidopsis model, suggesting that a substantial proportion of these are novel yet functional genes (Table S10, Fig. 2).

Fig. 2.

Combined bar and pie charts representing the coding sequences analysis of Australian O. rufipogon and O. meridionalis

The comparative functional annotation across both taxa highlights a substantial alignment of predicted genes with known protein databases, reflecting a strong conservation of essential biological functions. These findings also suggest a small but significant fraction of potentially novel genes, which could provide insights into the unique biological pathways or adaptations present in Australian O. rufipogon and O. meridionalis. The distribution of GO annotations and coding potential across both models provides a deeper understanding of the genomic complexity and evolutionary dynamics within these species.

Genomic divergence and structural variations in Oryza species

In this study, comprehensive genomic comparisons were conducted between the genomes of Australian O. rufipogon-like taxa and O. meridionalis, as well as between these wild rice genomes and the cultivated rice varieties Nipponbare (Japonica) and Indica. A notable 6 Mb inversion on chromosome 6 was identified in Australian O. rufipogon wild rice when aligned against the Japonica genome. This inversion, conspicuously absent in the comparative analysis with Indica, delineates a critical genomic divergence between these rice subspecies. A similar pattern was observed in O. meridionalis, where a 6 Mb inversion specific to Japonica was also identified on chromosome 6 (Fig. 3).

Fig. 3.

Comparative Genomic Analysis of Rice Genomes. Panel (a) and (b) depict syntenic relationships and structural variations among Nipponbare, Australian O. rufipogon, and Indica genomes, highlighting inversions (orange), translocations (yellow), and duplications (green). Panel (c) shows a detailed dot plot of chromosome 6 for Australian O. rufipogon, illustrating specific inversions and duplications. Panel (d) presents alignment tracks of chromosome 6, marking insertions, deletions, and inversions, alongside a synteny map that reveals broader structural variations

This inversion’s detection is attributed to the high resolution of advanced genome assemblies, which were not available in earlier studies. The 6 Mb inversion observed in our study aligns with previous findings on the role of inversions in rice genomics, as reported by Zhou et al. (2023) and Xie et al. (2021) [36, 37]. The inversion’s reliability was validated through careful examination of alignment and assembly quality, as well as comparative analyses with other high-quality genomes, underscoring its importance in Japonica rice evolution.

Further comparative genomic analysis at the genome assembly level across Australian O. rufipogon, O. meridionalis, and Nipponbare (UQ_NIP) has elucidated significant structural and sequence variations, underscoring the divergent evolutionary trajectories and genetic diversity among these rice genomes. For instance, the analysis between Australian O. rufipogon and O. meridionalis revealed 989 syntenic regions spanning 307,021,491 base pairs in Australian O. rufipogon and 296,977,252 in O. meridionalis, demonstrating substantial genomic conservation (Table S11, Fig. 4a).

Fig. 4.

Panel (a) showcases genome sequence assembly comparisons between Australian O. rufipogon and O. meridionalis across 12 chromosomes, highlighting synteny, inversions, translocations, and duplications. Panel (b) visualizes the annotation of synteny blocks between the two taxa, emphasizing genomic connectivity and shared regions across the chromosomes

By comparison, Australian O. rufipogon vs. Nipponbare (UQ_NIP) revealed 5,786 syntenic regions covering 207,196,097 base pairs in Australian O. rufipogon and 187,769,752 base pairs in Nipponbare (UQ_NIP), indicating substantial shared sequence alongside notable divergence (Table S12, Fig. 5a).

Fig. 5.

Panel (a) showcases genome sequence assembly comparisons between Australian O. rufipogon and Nipponbare across 12 chromosomes, highlighting synteny, inversions, translocations, and duplications. Panel (b) visualizes the annotation of synteny blocks between the two taxa, emphasizing genomic connectivity and shared regions across the chromosomes

O. meridionalis vs. Nipponbare (UQ_NIP) yielded the largest number of syntenic regions, 8,198, covering 213,880,714 base pairs in O. meridionalis and 190,456,022 base pairs in Nipponbare (UQ_NIP), indicating broad genomic concordance despite evolutionary divergence (Fig. 6a).
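Because the number of syntenic regions and the bases they cover vary together, the three comparisons are easier to read side by side as an average region length. The sketch below derives that figure from the counts reported above; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# (number of syntenic regions, bp in first genome, bp in second genome),
# copied from the three pairwise comparisons reported in the text.
SYNTENY = {
    "Australian O. rufipogon vs. O. meridionalis": (989, 307_021_491, 296_977_252),
    "Australian O. rufipogon vs. Nipponbare (UQ_NIP)": (5_786, 207_196_097, 187_769_752),
    "O. meridionalis vs. Nipponbare (UQ_NIP)": (8_198, 213_880_714, 190_456_022),
}


def mean_region_kb(regions: int, total_bp: int) -> float:
    """Average syntenic region length in kilobases."""
    if regions <= 0:
        raise ValueError("regions must be positive")
    return total_bp / regions / 1_000


summary = {
    name: (mean_region_kb(n, bp_a), mean_region_kb(n, bp_b))
    for name, (n, bp_a, bp_b) in SYNTENY.items()
}

for name, (kb_a, kb_b) in summary.items():
    print(f"{name}: {kb_a:.1f} kb / {kb_b:.1f} kb per region")
```

On these numbers the two Australian taxa share far fewer but much longer syntenic regions (roughly 310 kb on average) than either shares with Nipponbare (roughly 36 kb and 26 kb), so a higher region count here goes with more fragmented, not more extensive, synteny.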

Fig. 6.

Panel (a) showcases genome sequence assembly comparisons between O. meridionalis and Nipponbare across 12 chromosomes, highlighting synteny, inversions, translocations, and duplications. Panel (b) visualizes the annotation of synteny blocks between the two taxa, emphasizing genomic connectivity and shared regions across the chromosomes

Inversions and translocations across these comparisons further illustrate the dynamic structural rearrangement of these genomes. For example, there are 187 inversions in Australian O. rufipogon vs. Nipponbare (UQ_NIP) and 216 in O. meridionalis vs. Nipponbare (UQ_NIP), reflecting distinct patterns of structural evolution between the taxa. SNPs are prevalent across all taxa, with 1,840,501 SNPs in Australian O. rufipogon vs. Nipponbare (UQ_NIP) and 2,378,502 in O. meridionalis vs. Nipponbare (UQ_NIP), reflecting accumulated mutations that may have been shaped by genetic drift or selection. Insertions and deletions are also abundant, with 133,625 insertions in Australian O. rufipogon versus O. meridionalis and 135,379 in Australian O. rufipogon versus Nipponbare (UQ_NIP), indicating active genomic alteration. The analysis also delineated highly diverged regions and copy number variations, with Australian O. rufipogon exhibiting 219,185,318 base pairs of highly diverged regions compared to 206,418,256 in O. meridionalis, and 57,393,529 base pairs in the comparison with Nipponbare (UQ_NIP). These regions suggest rapid evolutionary change or functional diversification, which is crucial for deciphering the adaptive and speciation processes within these species. Overall, this genomic comparison describes the genetic landscapes and structural variations of Australian O. rufipogon, O. meridionalis, and Nipponbare (UQ_NIP). These insights are important for understanding the genetic underpinnings of phenotypic traits and the evolutionary resilience of these rice species, and they provide markers for breeding and conservation strategies.

At the annotation level, collinearity analysis was conducted using MCScanX to compare the genomes of Australian O. rufipogon, O. meridionalis, and Nipponbare (UQ_NIP) and to assess genomic conservation and evolutionary relationships. The analysis settings included a match score of 50, a match size of 5, a gap penalty of -1, an overlap window of 5, an e-value of 1e-05, and a maximum of 25 gaps. The comparison between Australian O. rufipogon and O. meridionalis showed the highest degree of conservation, with 68.2% of genes showing collinearity. Chromosome 5 displayed the largest and most numerous syntenic blocks, pointing to a key area of evolutionary significance (Fig. 4b). Chromosome 1, however, showed fewer but significantly larger blocks, which might reflect older evolutionary events that have remained largely unchanged. In the comparison between Australian O. rufipogon and Nipponbare (UQ_NIP), significant genomic conservation was observed, with 67.6% of genes analyzed exhibiting collinearity. The distribution of syntenic blocks across chromosomes revealed a high concentration of blocks on chromosomes 3 and 7, with chromosome 3 featuring larger blocks indicative of deeper evolutionary links. In contrast, chromosome 8 had smaller but more numerous blocks, suggesting frequent but more recent evolutionary rearrangements (Fig. 5b). For the comparison involving O. meridionalis and Nipponbare (UQ_NIP), 66.2% of the genes were collinear, with syntenic blocks predominantly found on chromosomes 2 and 6. Chromosome 2 exhibited a higher number of smaller blocks, while chromosome 6 contained fewer, larger blocks, suggesting strong evolutionary conservation in specific regions (Fig. 6b). These analyses highlight the extensive genomic conservation between the taxa, with variations in block size and frequency across chromosomes providing insights into both the function and evolutionary history of these genomic regions. The detailed comparison of syntenic blocks enhances our understanding of the evolutionary dynamics that have shaped these genomes.

In addition to collinearity, our analysis explored the types of gene duplications present in Australian O. rufipogon and O. meridionalis, revealing distinct patterns that may reflect different evolutionary pressures or historical adaptations. In Australian O. rufipogon, a total of 32,711 duplication events were identified, distributed as follows: 4,572 singletons (13.97%), 11,149 dispersed (34.04%), 1,566 proximal (4.78%), 9,553 tandem (29.17%), and 5,871 whole genome duplications (WGD) or segmental (17.94%) (Table S14). In comparison, O. meridionalis, with a total of 29,604 duplication events, showed a slightly different distribution: 15.0% singletons (4,445 events), 33.1% dispersed (9,808 events), 3.4% proximal (1,016 events), 26.4% tandem (7,827 events), and 22% whole genome or segmental duplications (6,508 events) (Table S14). These differences highlight distinct evolutionary trajectories. Australian O. rufipogon shows a higher prevalence of dispersed and tandem duplications, suggesting a dynamic genome with extensive rearrangements and duplicative events. Conversely, the higher number of WGD or segmental duplications in O. meridionalis indicates a history of large-scale duplication events, which might have played a crucial role in its evolution, possibly affecting its adaptability and speciation.
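The category shares quoted above follow directly from the reported counts. As a quick consistency check, the sketch below recomputes the totals and percentages for both taxa; the dictionary layout and the function name are illustrative assumptions, and only the counts come from the text.

```python
# Counts per duplication category, as reported in the text (Table S14).
DUPLICATION_COUNTS = {
    "Australian O. rufipogon": {
        "singleton": 4_572,
        "dispersed": 11_149,
        "proximal": 1_566,
        "tandem": 9_553,
        "WGD or segmental": 5_871,
    },
    "O. meridionalis": {
        "singleton": 4_445,
        "dispersed": 9_808,
        "proximal": 1_016,
        "tandem": 7_827,
        "WGD or segmental": 6_508,
    },
}


def category_shares(counts):
    """Return (total, {category: percentage of total})."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


shares = {taxon: category_shares(counts) for taxon, counts in DUPLICATION_COUNTS.items()}

for taxon, (total, pct) in shares.items():
    print(f"{taxon}: {total:,} events")
    for name, value in pct.items():
        print(f"  {name}: {value:.2f}%")
```

The recomputed totals match the 32,711 and 29,604 events given above; individual percentages may differ from the quoted values by a few hundredths of a percentage point because of rounding.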

Relationship of the genomes of the Australian species with domesticated rice

Rice domestication has traditionally been explained through two main models: the Single Origin Model and the Multiple Origin Model. The Single Origin Model posits that rice was domesticated once from a specific wild ancestor in a single geographic location, leading to high genetic similarity among domesticated varieties due to a common ancestry [38]. Conversely, the Multiple Origin Model suggests that rice was domesticated independently in multiple regions from different ancestral populations, resulting in greater genetic diversity and morphological variation among rice varieties [39]. These models provide frameworks for understanding the complex history of rice evolution and its adaptations to various environments.

In this study, we performed a detailed analysis of genes in Nipponbare rice shared with the Australian species. Our findings revealed that 555 genes in Nipponbare were not present in Australian O. rufipogon, of which 320 were identified in O. meridionalis (Table 5).

Table 5.

Summary of genes shared between Nipponbare (Oryza sativa ssp. japonica cv. Nipponbare), Australian O. rufipogon, and O. meridionalis

Level | Genome Name | No. of CDS | Mapping | Total Reads | Mapped Reads | Not Mapped Reads
Nipponbare (UQ_NIP) vs. Australian O. rufipogon
CDS | NIPPONBARE (UQ_NIP)-CDS | 31,540 | Query | 31,540 | 28,498 | 3,042
 | Australian O. rufipogon-CDS | 29,151 | Reference | 29,151 | 28,498 |
Genome | NIPPONBARE (UQ_NIP)-CDS | 3,042 | Query | 3,042 | 2,487 | 555
 | Australian O. rufipogon-Genome | | Reference | Genome sequence | |
Unique_Nipponbare (UQ_NIP) vs. O. meridionalis
CDS | NIPPONBARE (UQ_NIP)-CDS | 555 | Query | 555 | 205 | 350
 | O. meridionalis-CDS | 26,633 | Reference | 26,633 | 205 |
Genome | NIPPONBARE (UQ_NIP)-CDS | 350 | Query | 350 | 115 | 235
 | O. meridionalis-Genome | | Reference | Genome sequence | 115 |
Total identified genes contributing to 320: 205 (CDS) + 115 (Genome) = 320
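The counts in Table 5 form a chain of successive filters: the sequences left unmapped at one step become the query for the next. The sketch below retraces that arithmetic using only the numbers in the table; the step labels, variable names and function name are illustrative.

```python
def not_mapped(query: int, mapped: int) -> int:
    """Sequences left over after a mapping step."""
    if not 0 <= mapped <= query:
        raise ValueError("mapped must lie between 0 and query")
    return query - mapped


# (step description, query sequences, mapped sequences), taken from Table 5.
steps = [
    ("Nipponbare CDS vs. Australian O. rufipogon CDS", 31_540, 28_498),
    ("unmapped CDS vs. Australian O. rufipogon genome", 3_042, 2_487),
    ("remaining CDS vs. O. meridionalis CDS", 555, 205),
    ("remaining CDS vs. O. meridionalis genome", 350, 115),
]

remaining = [not_mapped(query, mapped) for _, query, mapped in steps]

# Each step's query is exactly what the previous step failed to map.
chain_is_consistent = all(
    steps[i + 1][1] == remaining[i] for i in range(len(steps) - 1)
)

# Genes absent from Australian O. rufipogon but found in O. meridionalis.
found_in_meridionalis = steps[2][2] + steps[3][2]

for (label, query, mapped), left in zip(steps, remaining):
    print(f"{label}: {query:,} queried, {mapped:,} mapped, {left:,} not mapped")
print("chain consistent:", chain_is_consistent)
print("found in O. meridionalis:", found_in_meridionalis)
```

The last step leaves 235 Nipponbare sequences that mapped to neither wild genome.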

Functional annotation of these 320 genes from O. meridionalis revealed substantial contributions to disease resistance and environmental stress tolerance. Notably, approximately 15% of these genes (around 48 genes) were directly associated with disease resistance mechanisms, such as pathogen recognition and signal transduction pathways that activate defence responses. This includes genes encoding proteins with kinase activity, which are essential for initiating downstream signalling in response to pathogenic attacks. Furthermore, about 10% of the genes (approximately 32 genes) enhance tolerance to abiotic stresses, including those related to water deprivation and temperature extremes. These genes function in various stress-responsive pathways, aiding in the plant’s ability to withstand adverse conditions. Additionally, a smaller subset of around 5% (about 16 genes) is involved in oxidative stress responses, which are crucial for mitigating the damaging effects of reactive oxygen species produced under stress conditions. These findings underscore the significant role of genes inherited from a common ancestor with O. meridionalis in enhancing the genetic diversity and adaptive traits of Nipponbare rice. These genes not only contribute to increased disease resistance but also bolster the plant’s resilience against environmental stresses, thereby supporting sustainable agricultural practices. For example, the genes common with O. meridionalis include significant loci such as the Pi-cd locus, which has been linked to improved disease resistance [9]. These traits are supported by quantitative trait locus (QTL) and transcriptomic analyses, indicating that genes found to be shared with O. meridionalis have played a crucial role in adapting Nipponbare to diverse environmental conditions and agricultural demands [10]. 
This suggests a complex picture of rice domestication, where contributions found in various wild relatives have combined to produce the phenotypic diversity and adaptability observed in modern cultivated varieties. This is supported by studies showing that wild rice populations, such as those in Australia, possess significant genetic resources for traits beneficial to rice cultivation and improvement [40–42]. These findings are consistent with the broader evidence of adaptive introgression in rice, as documented in various studies. Wild rice species like O. meridionalis and Australian O. rufipogon provide valuable genetic resources for exploring the genetics and breeding potential of rice wild relatives [43]. Moreover, the genetic diversity contributed by these wild species may be crucial for enhancing the resilience and adaptability of rice to changing environmental conditions and agricultural practices [44].

Discussion

Our comprehensive genomic analysis of Australian O. rufipogon and O. meridionalis has provided significant insights into the evolutionary dynamics and genetic contributions of wild rice populations to domesticated rice. The high-quality genome assemblies of these taxa, facilitated by HiFi and Hi-C sequencing, have provided a detailed view of their genomic architecture and evolutionary history. The analysis reveals that substantial numbers of genes found in O. meridionalis were shared with O. sativa Nipponbare, including 320 genes identified in O. meridionalis that were not present in the Australian O. rufipogon genome. These genes are predominantly involved in disease resistance and environmental stress tolerance, highlighting the adaptive benefits conferred by genetic introgression. The genes identified as unique to O. sativa Nipponbare and O. meridionalis, but absent from Australian O. rufipogon, exhibit functional enrichments related to stress tolerance, disease resistance, and oxidative stress pathways. These genes, which include loci associated with kinase activity and stress-responsive mechanisms, may play a role in enhancing the resilience of domesticated rice. For example, several of these genes encode proteins involved in pathogen recognition and downstream signal transduction pathways, which are critical for biotic stress responses. Additionally, genes linked to oxidative stress management provide further evidence of their adaptive significance in coping with environmental stresses. The functional annotations of these unique genes not only shed light on their evolutionary importance but also underscore their potential utility in modern rice breeding programs. The presence of genes contributing to abiotic and biotic stress tolerance highlights their importance in inferring the domestication history of rice and improving agricultural productivity. These findings align with prior studies, such as those by Fujino et al. (2019), which demonstrated the introgression of stress-tolerance traits from wild species into cultivated rice varieties [9]. Our results reaffirm the value of these wild species as reservoirs of beneficial alleles and emphasize the need for their conservation to ensure continued genetic improvement of rice under changing environmental conditions. Notably, the presence of genes associated with kinase activity and oxidative stress response underscores the role of introgression in enhancing the resilience of domesticated rice to biotic and abiotic stresses.

Our comparative genomic analysis between the wild rice taxa and Nipponbare has identified significant structural variations, including a notable 6 Mb inversion on chromosome 6. This inversion, specific to the Japonica subspecies, delineates critical genomic divergence that may have implications for the phenotypic and adaptive differences observed between Japonica and Indica rice. The extensive syntenic blocks and structural rearrangements observed across the genomes further elucidate the complex evolutionary trajectories of these species.

The present study provides significant insights into the genetic diversity and evolutionary history of Australian O. rufipogon and O. meridionalis. Previous studies [12, 4] have highlighted the significant genetic divergence between the Australian and Asian populations of O. rufipogon, suggesting a period of evolutionary isolation. The chloroplast genome sequence was used to compare Australian O. rufipogon and O. meridionalis with their Asian counterparts. The study identified 124–125 genetic variations distinguishing the Australian species from Asian O. rufipogon, and only 53 variations between the two Australian taxa (Australian O. rufipogon and O. meridionalis). This level of genetic divergence underscores a prolonged period of evolutionary isolation, and also supports the hypothesis of a Gondwanan origin for these species, followed by long-distance dispersal [12]. Phylogenetic analyses [12] revealed that the Australian taxa are genetically distinct from both Asian O. rufipogon and domesticated rice (O. sativa). This genetic distinctness was further corroborated by whole-genome sequencing [4], which showed that the Australian O. rufipogon was a sister to the clade that included the Asian and African A genome wild and domesticated species. Analysis of the phylogenetic relationships at the chromosome level revealed introgression of genes from the Australian O. rufipogon specifically into the Asian rice populations. This suggests that the divergence between these taxa has been shaped by both natural evolutionary processes and human-mediated selection over millennia. Our findings align with these earlier studies, highlighting the significance of Australian O. rufipogon and O. meridionalis as crucial genetic resources for rice improvement. 
The unique genetic attributes of these species, particularly their genetic distinctness and diversity, underscore their potential to contribute novel traits, such as disease resistance, environmental stress tolerance, and enhanced nutritional value, to cultivated rice varieties. Furthermore, the conservation implications of these findings cannot be overstated. As previously argued [4], the preservation of these wild rice populations, both in situ and ex situ, is imperative for maintaining their genetic diversity and ensuring future food security. The potential for genetic contamination from cultivated rice in Asia, which has been documented for O. rufipogon, poses a significant threat to these wild populations. Therefore, comprehensive conservation strategies must be developed and implemented to protect these valuable genetic resources. The integration of our findings with those of earlier studies [12, 4] provides a more comprehensive understanding of the genetic diversity and evolutionary history of Australian wild rice species. The distinct genetic profiles and high levels of diversity observed in these species highlight their importance as genetic resources for rice improvement.

The functional annotation and gene prediction efforts have demonstrated a high degree of conservation of essential biological functions across the two wild rice species. However, the identification of novel genes unique to these wild taxa suggests potential undiscovered pathways and adaptive mechanisms. The differential distribution of gene duplication types, with Australian O. rufipogon exhibiting a higher prevalence of dispersed and tandem duplications, indicates distinct evolutionary pressures shaping the genomic landscapes of these species.

Our analysis identified and characterized the number of duplications in Australian O. rufipogon and O. meridionalis, providing insights into the genomic landscape of these wild rice species. While this study did not determine the timing or species specificity of these duplication events, such analyses could reveal critical insights into the evolutionary dynamics of these genomes. For example, understanding the rate and timing of duplications could help elucidate whether these events have contributed to the adaptation and divergence of Australian Oryza species. Future research could apply approaches such as Ks (synonymous substitution rate) analysis or phylogenetic dating to estimate the age of duplications and identify species-specific events. These methods, coupled with functional annotation of duplicated genes, could provide a deeper understanding of how these duplications influence genome evolution, adaptation to environmental stresses, and divergence within the Oryza genus. Our findings lay the groundwork for such studies, highlighting the significance of duplications in shaping the genomic architecture and functional potential of wild rice species. Expanding this research to include dating and evolutionary analyses will enhance our understanding of the role of these duplications as drivers of genetic diversity and species differentiation.
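The Ks-based dating mentioned above usually rests on the relation T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between two duplicated copies and r is the synonymous substitution rate per site per year. The sketch below applies that relation. The default rate of 6.5e-9 is a value commonly used for grasses and is an assumption here, and the Ks value in the example is hypothetical; neither comes from this study.

```python
def duplication_age_years(ks: float, rate: float = 6.5e-9) -> float:
    """Estimate the age of a duplication as T = Ks / (2 * rate).

    ks   : synonymous substitutions per synonymous site between the two copies
    rate : synonymous substitutions per site per year (assumed value, see lead-in)
    """
    if ks < 0:
        raise ValueError("ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    # The factor 2 accounts for substitutions accumulating in both copies.
    return ks / (2 * rate)


# Hypothetical Ks value, chosen only to illustrate the calculation.
example_ks = 0.13
age = duplication_age_years(example_ks)
print(f"Ks = {example_ks}: about {age / 1e6:.1f} million years")
```

Estimates of this kind are sensitive to the chosen rate and to saturation at high Ks, so they are better read as relative ages across duplicate pairs than as precise dates.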

The genetic diversity and adaptive traits uncovered in this study underscore the importance of wild rice populations as reservoirs of beneficial alleles for rice improvement. Genes from O. meridionalis, particularly those enhancing disease resistance and stress tolerance, present valuable targets for breeding aimed at developing resilient rice varieties. Moreover, the detailed understanding of genomic structural variations and evolutionary dynamics provides a foundation for future research on the adaptive evolution and domestication of rice.

In summary, our study highlights the dynamic interplay between wild and domesticated rice gene pools, facilitated by gene flow and hybridization events. The insights gained from the genomic analysis of Northern Australian wild rice populations contribute to a deeper understanding of rice domestication and offer promising avenues for the improvement of cultivated rice. The genetic contributions of the genes found in these wild species to Nipponbare rice exemplify the ongoing importance of wild species in enhancing the genetic diversity and adaptive potential of modern crops.

Very few wild rice genomes have been sequenced to the level of accuracy and completeness employed in this study, which is necessary for the analysis of genome differences. Most have not used highly accurate sequencing and are not de novo assembled, complete or haplotype resolved [3, 41]. The production of more high-quality genomes from wild rice populations in Asia and Africa will allow these studies to be extended to provide a more complete understanding of rice evolution and domestication.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12864_2025_11246_MOESM1_ESM.pdf (1.1MB, pdf)

Supplementary Tables S1–S14 provide detailed data and analyses supporting the main findings of this study, including genomic features, assembly statistics, variant calling results, and comparative analyses across Australian Oryza rufipogon and Oryza meridionalis assemblies and haplotypes. Figure S1 Workflow of HiFi + Hi-C contig assembly and scaffolding pipelines. Figure S2 Alignment dotplot of Australian O. rufipogon scaffolds against O. sativa Nipponbare chromosomes. Figure S3 High-resolution Hi-C contact maps displaying the architecture of the wild rice genomes. Panels (a) and (b) present the contact maps for the 12 pseudochromosomes of Australian O. rufipogon with bin sizes of 1000 Kb and 500 Kb, respectively, indicating chromosomal interactions. Panels (c) and (d) depict similar maps for O. meridionalis, illustrating the consistency and variation in chromosomal contact points across both taxa. Figure S4 Alignment dotplot of Australian O. rufipogon collapsed assembly against Australian O. rufipogon-Hap1. Figure S5 Alignment dotplot of Australian O. rufipogon collapsed assembly against Australian O. rufipogon-Hap2. Figure S6 Alignment dotplot of O. meridionalis collapsed assembly against O. sativa Nipponbare chromosomes. Figure S7 Alignment dotplot of O. meridionalis collapsed assembly against O. meridionalis-Hap1. Figure S8 Alignment dotplot of O. meridionalis collapsed assembly against O. meridionalis-Hap2.

Author contributions

M.A. performed the experiments and analysed the data; A.F. and A.K. supervised and assisted in data analysis; P.O. helped in writing the manuscript and in data interpretation; R.H. was involved in the conception and supervision of the research. All authors approved the final version of the paper.

Funding

This research was supported by the ARC Centre of Excellence for Plant Success in Nature and Agriculture (Grant number CE200100015).

Data availability

The whole genome sequence data reported in this paper have been deposited in NCBI and the Genome Warehouse at the National Genomics Data Centre (NGDC), Beijing Institute of Genomics, Chinese Academy of Sciences. The data are associated with the following submissions: NCBI submission IDs SUB14650113 (Australian Oryza rufipogon Collapsed Assembly and Haplotypes) and SUB14659345 (Oryza meridionalis Collapsed Assembly and Haplotypes); NGDC Bio Projects PRJCA029102 (Australian Oryza rufipogon Collapsed Assembly and Haplotypes) and PRJCA029129 (Oryza meridionalis Collapsed Assembly and Haplotypes). These datasets are publicly accessible through the respective links provided by NCBI and NGDC.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Khush GS. Origin, dispersal, cultivation and variation of rice. Plant Mol Biol. 1997;35:25–34.
  • 2. Vaughan DA, Lu B-R, Tomooka N. The evolving story of rice evolution. Plant Sci. 2008;174:394–408. 10.1016/j.plantsci.2008.01.016
  • 3. Chen J, et al. Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat Commun. 2013;4:1595. 10.1038/ncomms2596
  • 4. Brozynska M, et al. Sequencing of Australian wild rice genomes reveals ancestral relationships with domesticated rice. Plant Biotechnol J. 2017;15:765–74. 10.1111/pbi.12674
  • 5. Fuller DQ, et al. Consilience of genetics and archaeobotany in the entangled history of rice. Archaeol Anthropol Sci. 2010;2:115–31. 10.1007/s12520-010-0035-y
  • 6. Jones M, et al. Food globalization in prehistory. World Archaeol. 2011;43:665–75. 10.1080/00438243.2011.624764
  • 7. Gross BL, Zhao Z. Archaeological and genetic insights into the origins of domesticated rice. Proc Natl Acad Sci. 2014;111:6190–7. 10.1073/pnas.1308942110
  • 8. Moner AM, Furtado A, Henry RJ. Two divergent chloroplast genome sequence clades captured in the domesticated rice gene pool may have significance for rice production. BMC Plant Biol. 2020;20:472. 10.1186/s12870-020-02689-6
  • 9. Fujino K, Hirayama Y, Obara M, Ikegaya T. Introgression of the chromosomal region with the Pi-cd locus from Oryza meridionalis into O. sativa L. during rice domestication. Theor Appl Genet. 2019;132:1981–90. 10.1007/s00122-019-03332-1
  • 10. Wairich A, et al. Chromosomal introgressions from Oryza meridionalis into domesticated rice Oryza sativa result in iron tolerance. J Exp Bot. 2021;72:2242–59. 10.1093/jxb/eraa461
  • 11. Henry RJ. Wild rice research: advancing plant science and food security. Mol Plant. 2022;15:563–5. 10.1016/j.molp.2021.12.006
  • 12. Brozynska M, et al. Chloroplast genome of novel rice germplasm identified in Northern Australia. Trop Plant Biol. 2014;7:111–20. 10.1007/s12042-014-9142-8
  • 13. Hasan S, Furtado A, Henry R. Reticulate evolution in AA-genome wild rice in Australia. Front Plant Sci. 2022;13. 10.3389/fpls.2022.767635
  • 14. Ichitani K, et al. New hybrid spikelet sterility gene found in interspecific cross between Oryza sativa and O. meridionalis. Plants. 2022;11.
  • 15. Furtado A. in Cereal Genomics: Methods and Protocols (eds Robert J. Henry & Agnelo Furtado) 1–5. Humana Press; 2014.
  • 16. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–5. 10.1038/s41592-020-01056-5
  • 17. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9. 10.1038/nmeth.1923
  • 18. Zhang H, et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat Commun. 2021;12:6566. 10.1038/s41467-021-26865-w
  • 19. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv: Genomics. 2013.
  • 20. Ghurye J, Pop M, Koren S, Bickhart D, Chin C-S. Scaffolding of long read assemblies using long range contact information. BMC Genomics. 2017;18:527. 10.1186/s12864-017-3879-z
  • 21. Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023;39:btac808. 10.1093/bioinformatics/btac808
  • 22. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2. 10.1093/bioinformatics/btv351
  • 23. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5. 10.1093/bioinformatics/btt086
  • 24. Cabanettes F, Klopp C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ. 2018;6:e4958. 10.7717/peerj.4958
  • 25. Vurture GW, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4. 10.1093/bioinformatics/btx153
  • 26. Manekar SC, Sathe SR. A benchmark study of k-mer counting methods for high-throughput sequencing. Gigascience. 2018;7. 10.1093/gigascience/giy125
  • 27. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci. 2020;117:9451–7. 10.1073/pnas.1921046117
  • 28. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protocols Bioinf. 2009;25. 10.1002/0471250953.bi0410s25
  • 29. Hasan S, Furtado A, Henry R. Gene expression in the developing seed of wild and domesticated rice. Int J Mol Sci. 2022;23. 10.3390/ijms232113351
  • 30. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15. 10.1038/s41587-019-0201-4
  • 31. Goel M, Sun H, Jiao W-B, Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019;20:277. 10.1186/s13059-019-1911-0
  • 32. Goel M, Schneeberger K. Plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 2022;38:2922–6. 10.1093/bioinformatics/btac196
  • 33. Tang H, et al. Synteny and collinearity in plant genomes. Science. 2008;320:486–8. 10.1126/science.1153917
  • 34. Abdullah M, Furtado A, Masouleh AK, Okemo P, Henry RJ. An improved haplotype resolved genome reveals more rice genes. Trop Plants. 2024;3. 10.48130/tp-0024-0007
  • 35. Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. 10.1016/j.cels.2015.07.012
  • 36. Zhou Y, et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun. 2023;14:1567.
  • 37. Xie X, et al. A chromosome-level genome assembly of the wild rice Oryza rufipogon facilitates tracing the origins of Asian cultivated rice. Sci China Life Sci. 2021;64:282–93.
  • 38. Sang T, Ge S. The puzzle of rice domestication. J Integr Plant Biol. 2007;49:760–8. 10.1111/j.1744-7909.2007.00510.x
  • 39. Choi JY, et al. The rice paradox: multiple origins but single domestication in Asian rice. Mol Biol Evol. 2017;34:969–79. 10.1093/molbev/msx049
  • 40. Hasan S, Furtado A, Henry R. Diversity of domestication loci in wild rice populations. Proceedings. 2019;36.
  • 41. Zhang F, et al. Genome-wide analysis of Dongxiang wild rice (Oryza rufipogon Griff.) to investigate lost/acquired genes during rice domestication. BMC Plant Biol. 2016;16:103. 10.1186/s12870-016-0788-2
  • 42. Vigueira CC, et al. Call of the wild rice: Oryza rufipogon shapes weedy rice evolution in Southeast Asia. Evol Appl. 2019;12:93–104. 10.1111/eva.12581
  • 43. Arbelaez JD, et al. Development and GBS-genotyping of introgression lines (ILs) using two wild species of rice, O. meridionalis and O. rufipogon, in a common recurrent parent, O. sativa cv. Curinga. Mol Breeding. 2015;35:81. 10.1007/s11032-015-0276-7
  • 44. Nabholz B, et al. Transcriptome population genomics reveals severe bottleneck and domestication cost in the African rice (Oryza glaberrima). Mol Ecol. 2014;23:2210–27. 10.1111/mec.12738

Associated Data


Supplementary Materials

12864_2025_11246_MOESM1_ESM.pdf (1.1MB, pdf)

Supplementary Tables S1–S14 provide detailed data and analyses supporting the main findings of this study, including genomic features, assembly statistics, variant calling results, and comparative analyses across the Australian Oryza rufipogon and Oryza meridionalis assemblies and haplotypes.
Figure S1. Workflow of the HiFi + Hi-C contig assembly and scaffolding pipelines.
Figure S2. Alignment dotplot of Australian O. rufipogon scaffolds against O. sativa Nipponbare chromosomes.
Figure S3. High-resolution Hi-C contact maps displaying the architecture of the wild rice genomes. Panels (a) and (b) present the contact maps for the 12 pseudochromosomes of Australian O. rufipogon with bin sizes of 1000 kb and 500 kb, respectively, indicating chromosomal interactions. Panels (c) and (d) depict similar maps for O. meridionalis, illustrating the consistency and variation in chromosomal contact points across both taxa.
Figure S4. Alignment dotplot of the Australian O. rufipogon collapsed assembly against Australian O. rufipogon Hap1.
Figure S5. Alignment dotplot of the Australian O. rufipogon collapsed assembly against Australian O. rufipogon Hap2.
Figure S6. Alignment dotplot of the O. meridionalis collapsed assembly against O. sativa Nipponbare chromosomes.
Figure S7. Alignment dotplot of the O. meridionalis collapsed assembly against O. meridionalis Hap1.
Figure S8. Alignment dotplot of the O. meridionalis collapsed assembly against O. meridionalis Hap2.

Data Availability Statement

The whole genome sequence data reported in this paper have been deposited in NCBI and in the Genome Warehouse at the National Genomics Data Centre (NGDC), Beijing Institute of Genomics, Chinese Academy of Sciences. The data are associated with the following submissions: NCBI submission IDs SUB14650113 (Australian Oryza rufipogon collapsed assembly and haplotypes) and SUB14659345 (Oryza meridionalis collapsed assembly and haplotypes); NGDC BioProjects PRJCA029102 (Australian Oryza rufipogon collapsed assembly and haplotypes) and PRJCA029129 (Oryza meridionalis collapsed assembly and haplotypes). These datasets are publicly accessible through NCBI and NGDC.


Articles from BMC Genomics are provided here courtesy of BMC
