Abstract
Y900 is one of the top hybrid rice (Oryza sativa) varieties, with its yield exceeding 15 t·hm−2. To dissect the mechanism of heterosis, we sequenced the male parent line R900 and female parent line Y58S using long-read and Hi-C technology. High-quality reference genomes of 396.41 Mb and 398.24 Mb were obtained for R900 and Y58S, respectively. Genome-wide variations between the parents were systematically identified, including 1,367,758 single-nucleotide polymorphisms, 299,149 insertions/deletions, and 4,757 structural variations. The level of variation between Y58S and R900 was the lowest among the comparisons of Y58S with other rice genomes. More than 75% of genes exhibited variation between the two parents. Compared with other two-line hybrids sharing the same female parent, the portion of Geng/japonica (GJ)-type genetic components from different male parents increased with yield increasing in their corresponding hybrids. Transcriptome analysis revealed that the partial dominance effect was the main genetic effect that constituted the heterosis of Y900. In the hybrid, both alleles from the two parents were expressed, and their expression patterns were dynamically regulated in different tissues. The cis-regulation was dominant for young panicle tissues, while trans-regulation was more common in leaf tissues. Overdominance was surprisingly prevalent in stems and more likely regulated by the trans-regulation mechanism. Additionally, R900 contained many excellent GJ haplotypes, such as NARROW LEAF1, Oryza sativa SQUAMOSA PROMOTER BINDING PROTEIN-LIKE13, and Grain number, plant height, and heading date8, making it a good complement to Y58S. The fine-tuned mechanism of heterosis involves genome-wide variation, GJ introgression, key functional genes, and dynamic gene/allele expression and regulation pattern changes in different tissues and growth stages.
The mechanism of super-hybrid rice heterosis involves genome-wide variations, Geng/Japonica introgression, key gene haplotypes, and strong spatiotemporal dynamics of allelic expression and regulation.
Introduction
Since the large-scale promotion and application of three-line hybrid rice (Oryza sativa) in 1976, the planting area of hybrid rice has accounted for approximately 50% of the total rice area each year, and the promotion of two-line super-hybrid rice has further increased rice yields (Cheng et al. 2007; Wu et al. 2016). Y Liangyou 1 (Y1), Y Liangyou 2 (Y2), and Y Liangyou 900 (Y900) are representative phase 2, 3, and 4 cultivars of high-yielding super-hybrid rice, respectively, and their yield potential reaches 12 t·hm−2, 13.5 t·hm−2, and 15 t·hm−2. In particular, Y900 has all the characteristic traits of super-high-yielding rice cultivars and is considered as the culmination of 40 yr of careful crossbreeding (Cheung 2014). Although the application of hybrid rice has achieved great success, there is yet no widely accepted explanation for the molecular mechanism of heterosis.
The molecular mechanism of heterosis has been studied in two main directions: variations in gene sequence and expression (Liu et al. 2022). High-quality genome sequences can help identify differences at the genomic level between hybrid parents. In recent years, with advancements in long-read sequencing technology, the genomes of many elite rice lines have been assembled, and the characteristics of their genome variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (Indels), structural variations (SVs), and copy number variations (CNVs), have been identified. Based on population genomic variation, a series of important heterosis-related sites, such as Heading date 3a, Tiller Angle Control1 (TAC1), and Days to Heading 8, have been identified through quantitative trait locus mapping or genome-wide association studies (Zhou et al. 2012; Huang et al. 2015, 2016; Li et al. 2016; Chen et al. 2020; Lin et al. 2020). Wei et al. (2021) constructed a quantitative trait nucleotide (QTN) map of rice and summarized the most comprehensive, key functional variations in quantitative trait genes, and their roles in heterosis. These studies have explained the selection, fixation, and integration of dominant alleles in the process of rice hybrid breeding and provided an important genetic basis for the analysis of the molecular mechanism of heterosis. However, there are few studies on the effects of these genomic variations, especially QTNs, on traits in the homozygous and heterozygous states and the effects of different combinations of QTNs that affect the same trait.
In general, compared with parental gene expression, hybrid gene expression can be divided into additive and nonadditive types (Liu et al. 2022). Most previous studies have focused on the importance of nonadditive effects in heterosis (Ni et al. 2009; Yang et al. 2021; Liu et al. 2022). In terms of gene function, hybrids can achieve a stronger growth advantage than the two parents by combining the advantages of the two parents in two biological pathways, cell division and photosynthesis, which are critical to plant growth and development (Liu et al. 2021). Gene expression is influenced by genomic cis-acting elements and trans-acting factors. Different epigenomic factors also affect gene expression (Ma et al. 2021). The combination of the genomic and epigenetic regulation of the hybrids compared to that in the parents may lead to transcription- and protein-level differences in different tissues between hybrids and parents. Allele-specific expression (ASE) in hybrid rice suggests that hybrids can selectively express beneficial alleles under specific conditions (Shao et al. 2019; Fu et al. 2022). Heterosis runs through almost the entire life cycle of rice, and its performance is usually tissue-dependent or temporally dynamic (Liu et al. 2022). Therefore, the exploration of the molecular mechanism of heterosis should consider spatiotemporal specificity.
There have been many studies on the mechanism of heterosis in hybrid rice in the areas of genome assembly, variation mining, heterotic site identification, and gene expression and regulation (Wei et al. 2009; Zhai et al. 2013; Huang et al. 2016; Zhang et al. 2016; Shao et al. 2019; Fu et al. 2022). However, the molecular mechanism of heterosis formation has not been comprehensively investigated in both dimensions of genomic variation and gene expression. In this study, we assembled the genomes of the two parents of Y900, and systematically identified variations at the whole-genome level between them. We also investigated the relationship between cis-/trans-regulation and heterosis through comparison of the differentially expressed genes (DEGs) in four tissues. Finally, the effects of several important functional genes were verified through F2 population or gene editing. This study provides insights for the systematic interpretation of the basis of heterosis formation in rice.
Results
Phenotype features of super-hybrid rice Y900 and related lines
As a top hybrid rice variety exhibiting superior heterosis, Y900 outperformed its two parents (over-both-parents heterosis, OBPH) in many agronomic traits, including total grain number per plant (GNPP), grain yield per plant (GYPP), panicle length (PL), grain length (GL), and plant height (PH). For instance, the GYPP for Y900 was 43.9 g, exceeding the male parent (28.7 g) by approximately 52.96% (Fig. 1, A and B; Supplemental Table S1). Also, the length and diameter of the first node under the panicle showed positive OBPH, providing good lodging resistance and transportation capability (Fig. 1, A and B). The other representative two-line hybrids Y1 and Y2 showed heterosis patterns similar to those of Y900 in most traits, i.e. OBPH in GNPP, PL, and PH and heterosis over male or female parent in the effective panicle number (EPN), grain number per panicle (GNPPa), thousand-grain weight (TGW), GYPP, heading date (HD), and flag leaf width (FLW). Consistent with the classical two-line hybrid rice LYP9 (Li et al. 2016), all three hybrids (Y1, Y2, Y900) took full advantage of over-male-parent heterosis (OMPH) in EPN and over-female-parent heterosis (OFPH) in GNPPa (Fig. 1C; Supplemental Table S1), which should be important for yield heterosis of the two-line hybrid rice. Although Y900 was inferior to its male parent in GNPPa, it exhibited the highest utilization rate of OMPH in EPN (73.91%) and OFPH in GNPP (16.17%), leading to the highest OBPH/OMPH in GYPP (the female parent had no yield data due to male sterility) compared with Y1 (33.4 g) and Y2 (38.9 g) (Fig. 1C).
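The heterosis rates quoted above can be reproduced with a short calculation. The sketch below assumes the conventional definitions, in which over-parent heterosis is the percentage by which the hybrid exceeds a given parent and over-mid-parent heterosis uses the mean of the two parents; the function names are illustrative and are not taken from the study.

```python
def over_parent_heterosis(f1: float, parent: float) -> float:
    """Percentage by which the hybrid (F1) value exceeds one parent's value."""
    return (f1 - parent) / parent * 100.0


def mid_parent_heterosis(f1: float, male: float, female: float) -> float:
    """Percentage by which the hybrid value exceeds the mean of both parents."""
    mid = (male + female) / 2.0
    return (f1 - mid) / mid * 100.0


# GYPP values reported in the text: Y900 = 43.9 g, male parent R900 = 28.7 g.
gypp_omph = over_parent_heterosis(43.9, 28.7)
print(f"OMPH in GYPP: {gypp_omph:.2f}%")
```

With the reported GYPP values, this gives the approximately 52.96% over-male-parent heterosis stated in the text.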
Figure 1.
Phenotypic heterosis in super-hybrid rice Y1, Y2, and Y900. A) Comparison of plant, panicle, leaf, and grain morphology of Y900 and its two parents, R900 and Y58S. B) Heterosis of Y900 OBPH in yield-related traits. C) Comparison of heterosis rates for 13 important agronomic traits in Y1, Y2, and Y900. OMPH, OFPH, and OMiPH represent over-male-parent heterosis, over-female-parent heterosis, and over-mid-parent heterosis, respectively. D) Comparison of yield factors for R9311, YH2, and R900. Ten plants of each variety were compared for traits including EPN, GNPPa, GNPP, TGW, GL, grain width (GW), HD, PL, PH, FLW, second leaf length (SLL), seed setting rate (SSR), GYPP, height of stem under panicle (HSP) and diameter of stem under panicle (DSP). Different letters above the SD error bars in (B) and (D) represent significant differences by ANOVA (P < 0.05).
As Y1, Y2, and Y900 share the same female parent (Y58S), differences among the three hybrids were caused by their different male parents. We compared R900 with R9311 and Yuanhui 2 (YH2), the male parents for Y900, Y1, and Y2, respectively. R900 was the most compact with shorter PH and larger FLW (Fig. 1A; Supplemental Table S1). Although R900 was slightly inferior in EPN, GNPP, and GYPP, the GNPPa of R900 was much higher than those of R9311 and YH2 (Fig. 1D). In addition, the HD of R900 was the longest, nearly 12 d longer than that of R9311 (Supplemental Table S1). These characteristics provide more time for the growth and development of hybrid rice Y900 and lay the foundation for lodging resistance and high fertilizer tolerance, which are critical to super-high yield. Furthermore, R900 possessed complementary traits of yield factors to those of Y58S (smaller GNPPa, earlier HD, and more EPN), resulting in strong heterosis with a higher yield for hybrid rice Y900 than that of Y1 and Y2 (Fig. 1, A and C; Supplemental Table S1).
Assembly and comparison of the high-quality genome references of the parent lines
To investigate the molecular mechanism of heterosis, the genomes of R900 and Y58S were assembled using ∼110 Gb and ∼101 Gb of long reads generated by PacBio Sequel II, respectively (Supplemental Fig. S1; Supplemental Table S2). After polishing with ∼85 Gb and ∼75 Gb of short reads (Supplemental Table S3), we obtained assembled sequences of ∼396 Mb and ∼398 Mb in size, with contig N50s of ∼16 Mb and ∼17 Mb, respectively. Ultimately, ∼98% (389.04 Mb and 389.54 Mb) of the sequences were anchored onto 12 chromosomes using ∼45 Gb and ∼46 Gb of Hi-C sequencing data (Supplemental Table S3), with 42 and 35 gaps, respectively (Fig. 2A; Supplemental Fig. S2; Table 1; Supplemental Table S4). Over 99% of the next-generation sequencing (NGS) reads could be properly aligned to both genomes (Supplemental Table S5), and the Benchmarking Universal Single-copy Orthologs (BUSCO) completeness scores were 98.70% and 98.60%, the highest among the known important rice reference genomes (Supplemental Fig. S3). In addition, the high long-terminal repeat assembly index (LAI) (>23) and the very strong collinearity of the two genomes with other genomes such as Nipponbare (NIP), R498, and R9311 indicate that the quality of both genomes is excellent (Table 1; Supplemental Fig. S4; Supplemental Table S6). After masking ∼54% of the repetitive sequences and removing transposable element (TE)-related genes, we predicted 40,417 and 41,583 protein-coding genes and 667 to 1,988 noncoding RNA genes in the R900 and Y58S genomes, respectively. A total of 89.81% and 87.42% of the protein sequences, respectively, were annotated in the functional database (Supplemental Tables S7–S10).
Figure 2.
Y900 biparental genomic features and variations. A) Genome circos map. I: distribution of gaps in Y58S and R900; SNP and Indel densities in Y58S (II) and R900 (III); IV: Regions of variations between the XI and GJ parents; V: Gene density in Y58S; VI: Density distribution ratio of the cloned genes of parental variations (2,790) to all variant genes (36,477); VII: Density distribution ratio of parental DEGs (13,490) to all expressed genes (36,393); VIII–X: Densities of genes with effects of PDO, DO, and ODO; XI–XIII: Densities of genes with cis, trans, and both cis and trans expression regulation patterns. Color shades represent the magnitude of the distribution density. B) Identification and Sanger sequencing verification of two Indels of Hd1. C) Presence/absence variations (PAVs) of the cloned genes in important rice reference genomes. D) CNV identification, Sanger sequencing, and expression by RT-qPCR in different tissues of OsWRKY89. Ten plants of each variety in four tissues, including stems (S), flag leaves (L), and 5 mm (P5) and 10 mm (P10) length young panicles, were compared, and different letters above the SD error bars represent significant differences by one-way ANOVA (P < 0.05).
Table 1.
Summary of the assembly and annotation features of R900 and Y58S
Feature | R900 | Y58S |
---|---|---|
Genome coverage (x) | 273.98 | 251.95 |
Assembly length (Mb) | 396.41 | 398.24 |
Chromosome anchoring rate (%) | 98.14 | 97.82 |
Number of contigs | 230 | 177 |
Contig N50 (Mb) | 15.95 | 16.59 |
Largest contig (Mb) | 34.76 | 33.28 |
Scaffold N50 (Mb) | 31.86 | 31.81 |
Number of gaps | 42 | 35 |
Genome BUSCO (%) | 98.70 | 98.60 |
LAI score | 23.73 | 23.90 |
Mapping rate (%) | 99.83 | 99.89 |
GC content (%) | 43.71 | 43.71 |
Repeat content (%) | 54.27 | 54.82 |
Number of non-TE genes | 40,417 | 41,583 |
Protein BUSCO (%) | 95.40 | 95.60 |
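For readers unfamiliar with the contiguity statistics in Table 1, the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of this standard definition follows; the contig lengths in the example are invented toy values, not data from the R900 or Y58S assemblies.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are accumulated from longest to shortest; the N50 is the
    length of the sequence at which the running total first reaches at
    least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (hypothetical lengths in Mb): total = 30, half = 15.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))
```

In the toy example, the two longest contigs (10 and 8) together reach 18, which passes half of the total, so the N50 is 8.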
A total of 1,367,758 SNPs and 299,149 Indels (149,604 insertions and 149,545 deletions) were identified between the R900 and Y58S genomes, and the two variant types showed similar density patterns across the genome (Fig. 2A). The number of variants between R900 and Y58S was the lowest among the comparisons of Y58S with other assembled rice genomes (Supplemental Figs. S5 and S6; Supplemental Table S11). At least 36,477 genes (76.49%) had sequence variation between R900 and Y58S, including 2,790 cloned genes (75.32%) from the funricegenes database (https://funricegenes.github.io/) (Supplemental Data S1). Interestingly, the distribution pattern of these varied cloned genes was not always consistent with the genome-wide distribution pattern of SNPs/Indels. An example can be found on the long arm of Chr6, where many cloned genes showed variation between the two parents, but the SNP/Indel density was relatively low (Fig. 2A). More importantly, among 342 QTNs of 253 genes (Wei et al. 2021), 59 QTNs (17.25%) of 47 genes (18.58%) differed between the two parents of Y900, including Hd1 and NARROW LEAF1 (NAL1) (Supplemental Data S2; Supplemental Figs. S7 and S8). For Hd1, there was a 43-bp deletion in the first exon and a 4-bp insertion (AAAG) in the second exon of the R900 allele relative to the Y58S allele. Both Indels result in frame-shift mutations that lead to a nonfunctional Hd1 protein and thus delay flowering in R900. We verified the variation between the two Hd1 alleles using Sanger sequencing (Fig. 2B).
Similarly, only 4,757 SVs were identified between the R900 and Y58S genomes (Supplemental Fig. S9; Supplemental Table S11). Although no megabase-level structural variations were found, some important functional genes were identified in presence/absence variations (PAVs), and more PAVs were located in the 2-kb upstream regions, introns, and exons of genes (Fig. 2C; Supplemental Figs. S10–S12). Ghd7 is completely absent in the sterile lines Y58S, ZS97, and TF, while it is present in many restorers such as R900, indicating that the complete deletion haplotype of Ghd7 plays an important role in the two-line hybrid rice selection system. Six hundred and eighty-seven CNVs (321 copy gains and 366 copy losses) were identified (Supplemental Table S11; Supplemental Data S3). One CNV was identified on the short arm of chromosome 11 (Y58S Chr11: 861,840 to 869,492); OsWRKY89 (OsY58Sg08701), located within this fragment, encodes a transcription factor that affects early plant growth and development (Wang et al. 2007). There were two copies separated by 3 nt in Y58S but only one copy in R900. We verified the CNV event by Sanger sequencing (Fig. 2D). Further analysis revealed two copies of this gene in Y58S and Lemont but only one copy in R900 and the sterile line PA64S. Because Lemont and PA64S are parents in the Y58S breeding lineage, the copies likely originated from Lemont. Gene expression analysis by RT-qPCR showed that OsWRKY89 was much more abundant in flag leaves (L) than in other tissues, and that its expression level in Y58S leaves was significantly higher than that in R900, with Y900 in between, indicating that both copies were functional in Y58S and showed an additive effect in the hybrid (Fig. 2D).
Geng/japonica introgression and functional genes/QTNs in Y900
After constructing Xian/Indica (XI)–Geng/japonica (GJ) genetic compositions and introgression bin maps, we found that some hotspot regions of GJ introgression largely coincided with the hotspot regions of sequence variation among their genomes, such as Chr5: 17.9 to 27.3 Mb and Chr7: 22.3 to 27.1 Mb in Y58S and Chr1: 29.2 to 34.4 Mb and Chr9: 17.7 to 21.2 Mb in R900 (Figs. 2A and 3A; Supplemental Fig. S5). The portions of XI components (XI-ind + XI-aus + XI-admix) within the Y58S, R900, YH2, and R9311 genomes were all above 60%, and the portions of GJ components (GJ-admix + GJ-tmp + GJ-trp) were ∼18.37%, ∼14.31%, ∼3.40%, and ∼2.11%, respectively (Fig. 3B). Notably, the total size of GJ introgression fragments was the largest in Y58S (∼68.7 Mb) compared with the other three genomes, presumably due to the presence of GJ lines Lemont and Paddy in its breeding lineage. For R9311, YH2, and R900, the sizes of GJ introgression fragments increased from ∼7.9 Mb to ∼12.7 Mb to ∼53.5 Mb, respectively, in line with the levels of hybrid heterosis utilization from phase 2 to phase 4 super-rice (Fig. 3B).
Figure 3.
Genetic composition and introgression of super-hybrid rice Y1, Y2, and Y900 parents. The relationships between the hybrids and their parents are as follows: Y1 (Y58S/R9311), Y2 (Y58S/YH2), and Y900 (Y58S/R900). A) Genetic composition distribution of Y58S, R900, YH2, and R9311. The short horizontal lines to the left of each chromosome show the locations of 3,704 cloned rice genes; from left to right, the bin maps show Y58S, R900, YH2, and R9311. GJ-tmp, GJ-trp, and GJ-admix represent temperate, tropical, and mixed japonica components, respectively; XI-aus, XI-ind, and XI-admix represent aus, indica, and mixed indica components, respectively; Admix represents indica–japonica mixed components; none represents unidentifiable. B) Proportions of genetic components for the four genomes. C) Top 10 categories of genetic component differentiation for cloned genes among the four genomes. The first row means that 127 genes showed the GJ-admix component in Y58S and the XI-ind component in all other three genomes.
Of the 3,704 cloned rice genes in the funricegenes database, 3,469 (∼94%) could be assigned to the bin maps, and Y58S (729) and R900 (626) had far more cloned genes falling into the GJ regions than YH2 (103) and R9311 (118) (Fig. 3A). Among the top 10 categories of genetic component differentiation for the cloned genes, the first six categories (437 genes in total) were those in which either Y58S or R900 showed GJ components while the other genomes showed XI components. There were also 38 genes showing GJ components in both the Y58S and R900 genomes but XI components in both R9311 and YH2 (Fig. 3C; Supplemental Data S4).
We also investigated the functional QTNs falling into the GJ regions (Wei et al. 2021). There were 51 QTNs for 38 genes located in the GJ region within the Y58S genome (Fig. 3A), among which only ALKALI digestion and DOPPELGANGER2 displayed XI genotypes. For R900, the numbers of QTNs and genes in GJ regions were 61 and 40 (with only four loci showing inconsistency with NIP alleles), much higher than those in YH2 (22 QTNs for 13 genes) and R9311 (17 QTNs for 8 genes) (Supplemental Data S2). In particular, only R900 had the QTNs for 11 genes, including OsSPL13, NOG1, D61, basic leucine zipper73 (bZIP73), TAC1, and DTH2, located in GJ regions (Fig. 3A). The high proportion of GJ introgression, the larger number of functional genes, and some critical QTNs of the GJ genotype within the R900 (and Y58S) genome laid a solid genomic foundation for the strong heterosis of Y900.
Transcriptome analysis of Y900 and its parents in four tissues
The transcriptomes of four tissues of Y900 and its parents (Y58S and R900), including stem (S), flag leaf (L), and 5 mm (P5) and 10 mm (P10) length young panicles, were analyzed, and the data showed good reproducibility (Supplemental Fig. S13). A total of 36,393 genes were expressed (Supplemental Data S5), of which 25% (9,061) had low expression (transcripts per million [TPM] ≤ 1). Approximately 70% (25,689) of the genes were expressed in all four tissues, and 1.5% (536) were tissue specific. We identified 2,139 variety-specific genes and found that the number of hybrid-specific genes expressed in flag leaves was the highest (196) (Supplemental Fig. S14; Supplemental Data S6). Analysis of DEGs revealed that approximately 37% (13,490) of the genes differed among the three varieties. The number of DEGs in the young panicle was substantially lower than in the stem and leaf, and the number of DEGs between parents in flag leaves was as high as 7,180 (53%) (Fig. 4A). In flag leaves, P5, and P10, the DEGs between the hybrid and parents were fewer than those between parents. However, in stems, we unexpectedly found that the DEGs between the hybrid and either parent were 40% more abundant than those between parents (2,475), suggesting that over-parent heterosis is more likely to occur in the stems of the hybrid.
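The low-expression cutoff above is expressed in TPM. As a reminder of how this unit is derived, the sketch below applies the standard TPM definition (length-normalized read counts rescaled to sum to one million) to invented toy numbers; it does not reproduce the quantification pipeline used in this study, and the function name is illustrative.

```python
def tpm(counts: list[float], lengths_kb: list[float]) -> list[float]:
    """Convert read counts to transcripts per million (TPM).

    Each count is first divided by the transcript length in kilobases,
    and the resulting rates are rescaled so that they sum to 1e6.
    """
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# Toy example: two transcripts with equal length-normalized coverage.
values = tpm([10, 20], [1.0, 2.0])
low_expression = [value <= 1 for value in values]  # cutoff used in the text
print(values, low_expression)
```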
Figure 4.
Multi-tissue transcriptome analysis of Y900 and its parents. A) DEGs of Y900 and its parents in four tissues, including stems (S), flag leaves (L), and 5 mm (P5) and 10 mm (P10) length young panicles. B) Division of the gene expression modes. There are 12, 5, and 3 modes in Mode I (P1–P12), Mode II (B2P: between the two parents, CHP: close to the higher parent, CLP: close-to-lower-parent, H2P: higher-than-2-parents, L2P: lower than both parents), and Mode III (PDO: partial dominant, DO: dominant, ODO: overdominant), respectively. C) KEGG pathway enrichment analysis of Mode II genes. The line labels Level 1 and Level 2 represent two KEGG levels, and color shades indicate the degree of enrichment (P adjusted < 0.05).
The proportion of DEGs shared in common by the three comparison groups was only 5.8% to 12.7% (Supplemental Fig. S15), indicating that the gene expression patterns in different tissues of the three varieties were quite different. By further analyzing the differences in the expression levels of these genes between the hybrid and its parents, we found that the expression levels of 1,214 genes in the hybrid were substantially higher or lower than those of the parents (overdominant [ODO] mode), the expression levels of 6,943 genes in the hybrid were similar to one parent but substantially higher than those in the other parent (dominance [DO] mode), and the expression levels of 8,185 genes in the hybrid were between the two parental levels (partial dominance [PDO] mode) (Fig. 4B; Supplemental Data S7). Thus, the contribution of ODO to heterosis may not be as large as that of the additive and dominance effects. This result differs from the previous view that the genetic pattern of hybrids is mainly ODO, probably due to the use of different methods and varieties (Chen et al. 2018; Fu et al. 2022). The distributions of the DEGs and genes with different expression modes were largely consistent with the distribution of SNPs and/or Indels (Fig. 2A). The gene expression patterns were substantially different in different tissues. Although there were many DEGs in the flag leaves, most of them were in the PDO and DO modes, and only 0.5% (57) were in the ODO mode. In the young panicles at the two different stages, the proportion of the ODO mode increased to 0.8% and 7%, respectively. The proportion of ODO in stems was surprisingly as high as 28%, even higher than the proportion of PDO. The high proportion of ODO genes could at least partially explain the OBPH phenotypes in the length and diameter of stems in Y900 described in the previous section (Fig. 1, A and B).
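The three expression modes described above can be illustrated with a simple decision rule. The sketch below is a simplified illustration and not the classification procedure used in this study: it assumes that two expression values "differ" when their absolute log2 fold change (with a pseudocount of 1) is at least 1, whereas the actual analysis relied on DEG calls. The threshold, the pseudocount, and the function names are all hypothetical.

```python
from math import log2


def differs(x: float, y: float, min_abs_log2fc: float = 1.0) -> bool:
    """Hypothetical stand-in for a DEG call: |log2 fold change| >= threshold."""
    return abs(log2((x + 1.0) / (y + 1.0))) >= min_abs_log2fc


def expression_mode(hybrid: float, parent1: float, parent2: float) -> str:
    """Assign ODO, DO, or PDO from hybrid and parental expression levels."""
    low, high = min(parent1, parent2), max(parent1, parent2)
    differs_from_low = differs(hybrid, low)
    differs_from_high = differs(hybrid, high)
    if differs_from_low and differs_from_high:
        # Outside the parental range: overdominant; inside: partial dominance.
        if hybrid > high or hybrid < low:
            return "ODO"
        return "PDO"
    if differs_from_low != differs_from_high:
        # Similar to one parent and different from the other: dominance.
        return "DO"
    return "unclassified"


print(expression_mode(100, 10, 20))   # above both parents
print(expression_mode(50, 10, 200))   # between the parents
print(expression_mode(190, 10, 200))  # close to the higher parent
```

Under these assumed thresholds, the three example calls fall into the ODO, PDO, and DO modes, respectively.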
Similar to the report that Arabidopsis (Arabidopsis thaliana) hybrids achieved growth heterosis by integrating their respective strengths in photosynthesis and cell division (Liu et al. 2021), Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis here revealed that many DEGs in stems and young panicles were enriched in photosynthesis-related pathways, mainly showing additive effects or dominance effects. In P10, the enriched photosynthesis genes showed a negative overdominance effect. In flag leaves, however, fewer DEGs were found to be enriched in photosynthesis pathways, but many were enriched in cell division or material metabolism pathways (Fig. 4C). Interestingly, genes with close-to-lower-parent (CLP)/dominant patterns were enriched in starch and sucrose metabolism across all four tissues, indicating that Y900 inherits the level of energy consumption in the whole plant from the lower parent. The only enriched functional category for higher-than-2-parents (H2P) genes in stems was plant hormone signal transduction. One functional gene in this category, Oryza sativa cytokinin response regulator2, is a good example; it affects PH through the regulation of plant hormones (Shi et al. 2020).
ASE in the hybrid Y900
To confirm that the alleles of DO and ODO gene expression in hybrids were mainly from either the male or female parent, 7,562 allele-specific genes with differences between the parental alleles were identified (SNPs were present in the transcript). Analysis of the expression in the four tissues revealed a total of 3,048 (40%) allele-specific expressed genes (ASEGs), and the ratio of genes that leaned toward the male parent to those toward the female parent was close to 1:1 (Fig. 5A). We identified several potentially imprinted genes with only one allele expressed in F1, where the genes in P5 were adjacent to each other and only the Y58S allele was expressed (Supplemental Table S12). Therefore, in most cases, the two alleles from the parents were expressed simultaneously in the hybrids, 40% of the genes in some tissues were biased toward one of the parents, and very few genes showed complete dominant or recessive expression.
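The statistical procedure used to call ASEGs is not described in this section. One common approach, shown below purely as an assumed illustration, is an exact two-sided binomial test of the allele-specific read counts against the 1:1 ratio expected when both alleles are expressed equally. The read counts, the significance level, and the function names are hypothetical, and a genome-wide analysis would additionally require multiple-testing correction.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k successes in n trials at p = 0.5.

    Because the null distribution is symmetric, the P value is twice the
    probability of the smaller tail, capped at 1.
    """
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * p_tail)


def allele_bias(male_reads: int, female_reads: int, alpha: float = 0.05) -> str:
    """Label a gene as male-biased, female-biased, or unbiased in the hybrid."""
    p_value = binomial_two_sided_p(male_reads, male_reads + female_reads)
    if p_value >= alpha:
        return "unbiased"
    return "male-biased" if male_reads > female_reads else "female-biased"


# Hypothetical allele-specific read counts (male-parent allele, female-parent allele).
print(allele_bias(90, 10))
print(allele_bias(52, 48))
```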
Figure 5.
ASE in Y900. A) Bias of ASEGs in different tissues, including stems (S), flag leaves (L), 5 mm (P5), and 10 mm (P10) length young panicles. B) Relationship between ASE in F1 and the expression in parents. The figures show the linear regression equations and corrected coefficients of determination R2 for each tissue. C) Relationships between the numbers of SNPs, Indels, and ASEGs. The lines in each box from bottom to top are the minimum, first quartile, median, mean (dashed line), third quartile, and maximum. Different letters above the bars represent significant differences by one-way ANOVA (Duncan's multiple range test, P < 0.05). The number in parentheses on the horizontal axis is the sample size.
Further study revealed that the expression of ASEGs in different tissues changed dynamically. Among the four tissues, the number of ASEGs was the highest in young panicle P10 (1,763), accounting for 27% of the ASEGs in this tissue and approximately 20% in the other three tissues. The fold difference in the allele expression in tissues of young panicles P5 and P10 was smaller than that in stems and flag leaves (Supplemental Fig. S16), a trend that was consistent with the differential expression of related genes in both parents, indicating that both copies from both parents could be fully expressed in the tissues of actively growing hybrids. With the differentiation of tissues, the bias of ASEGs also changed. Only 122 genes had stable bias across all tissues, and the ratio of male-parent bias R900 (61) to female-parent bias Y58S (61) was exactly 1:1 (Supplemental Fig. S17). There were 2,420 genes with dynamic bias, including 115 genes with opposite biases in two tissues and 2,305 genes with a certain bias in some tissues but no bias in other tissues (Supplemental Data S8), indicating that, similar to the three-line hybrid rice (Shao et al. 2019), there are far more dynamic ASEGs than stable ASEGs in two-line hybrid rice.
We compared the biases of ASEGs with the differences in gene expression between the parents and found that most ASEGs (74% to 89%) had a good correlation with the differences in gene expression between the parents (Fig. 5B). The correlation was the highest in P5 and the lowest in flag leaves. The data demonstrated that the differences between the parents were relatively well inherited in the hybrids, which is the basis for heterosis formation, and these genes may be subjected to relatively strong cis-regulation, especially in young tissues. In mature tissues, trans-regulation clearly played greater roles. We further compared the variations in the vicinity of alleles between the parents and found that ASEGs had significantly more variations than non-ASEGs, regardless of the type (SNPs or Indels) or position (promoter, coding or downstream) (Fig. 5C), differing from the case in the aquatic plant sacred lotus (Nelumbo nucifera), where SNP density in the promoter region of ASEGs was higher than that of non-ASEGs (Gao et al. 2021).
Regulation of gene expression pattern in the hybrid
To further investigate the regulation of the expression of 7,562 allele-specific genes, we compared the differences in the expression of these genes between parents (A) and the differences in the expression of alleles from the male and female parents in hybrids (B) and divided the expression regulation patterns of these genes into seven categories (I–VII) as previously described (Bao et al. 2019) (Fig. 6A; Supplemental Data S9). Overall, the expression of most alleles in young panicles did not differ between parents or between the two copies in hybrids (conserved, 51.48% to 61.74%) or was difficult to determine (ambiguous, 22.08% to 30.72%), and most of the remaining alleles were cis-regulated (cis only: 12.61% to 14.25%). However, in stems and flag leaves, the number of conserved genes was substantially reduced, and the number of cis-regulated genes was also reduced to approximately 8% to 9%, but the number of trans-regulated genes was substantially increased. Especially in flag leaves, the gene expression regulatory mechanism was very different. Approximately 29.74% of the genes were only trans-regulated, which was even higher than the proportion of the conserved genes. The expression levels of many expressed genes in flag leaves differed substantially between the parents, but the expression levels of the two copies in the hybrids were almost the same (Fig. 5B; Supplemental Fig. S18). The number of genes in categories III–V was relatively small, and these genes were mainly concentrated in stems and flag leaves (Fig. 6A). In this study, the proportion of trans-only regulation was substantially higher than that in a study of cotton (Gossypium hirsutum) (0.9% to 1.1%) (Bao et al. 2019), which may be due to the specific tissues used in cotton (ovules at 10 and 20 d post-anthesis).
Figure 6.
Relationship between cis-/trans-regulation patterns and modes of gene expression. A) Classification and identification of expression regulation patterns of alleles in F1 and parents. Differential gene expression between the parents reflects cis- and trans-acting effects (A), while differential expression of alleles from the two parents in F1 reflects the cis effect (B). The trans effect cannot be directly measured but can be calculated using A − B. Gene expression regulation can be classified into categories I–VII based on the comparisons of A and 0, B and 0, and A and B. The size and color of the circles represent the proportion of the regulation category in a particular tissue, including stems (S), flag leaves (L), and young panicles of 5 mm (P5) and 10 mm (P10) in length. B) Proportion of the contribution of cis-regulation to the expression differences in the parents. The x-axis represents absolute differences in gene expression between the parents, while the y-axis represents the proportion of differential expression caused by the cis effect relative to that caused by the total effect. Error lines indicate the 95% confidence intervals. The table in the upper left corner represents the sample size used for analysis. C) Relationship between modes of gene expression (rows) and cis-/trans-regulation patterns (columns). The residual size was calculated by a Pearson chi-square test of independence; positive residuals represent enrichment in the gene number, i.e. more genes than expected under an independent null model, and negative residuals represent depletion in the gene number, i.e. fewer genes than expected under an independent null model. Statistical significance was tested by Fisher's exact test, and *, **, and *** represent P values of less than 0.05, 0.01, and 0.001, respectively.
We also found that the gene regulation pattern changed dynamically in different tissues. In the four tissues, 3,421 genes were identified in seven categories. Except for the ambiguous category, only 201 genes had the constitutive regulatory pattern, which were mainly the cis-only and conserved categories (Supplemental Fig. S19). Most genes adopted different regulation patterns in different tissues, and the expression differences of trans-regulated genes in different tissues were more obvious, consistent with the dynamic changes in the gene expression regulation in different tissues in maize (Zea mays) (Zhou et al. 2019). The number of SNPs and Indels in genes regulated by cis-only was significantly higher than that in genes regulated by trans-only or no-divergence genes (including compensatory and conserved). No significant difference was found between cis and trans (including enhancing and compensating) in the upstream and downstream regions (Supplemental Fig. S20).
When the difference in allele expression between parents was not substantial (0- to 2-fold), the contribution of cis-regulation to gene expression in each tissue was 40% to 50%. When the difference was between 2- and 16-fold, the greater the difference in the allele expression in young panicles, the greater the role of cis-regulation, and the highest contribution was found in P5, up to 70% (a difference of 8- to 16-fold). In flag leaves, the contribution of cis-regulation gradually decreased, reaching its lowest value of approximately 20% at a difference of 8- to 16-fold (Fig. 6B), indicating that trans-regulation made a greater contribution than cis-regulation in mature tissues. Among the seven regulatory categories, categories I–IV had the greatest impact on the difference between parents, especially cis/trans (Supplemental Fig. S21), similar to the cases in cotton (Bao et al. 2019) and chili pepper (Capsicum frutescens) (Diaz-Valenzuela et al. 2020).
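The cis contribution plotted in Fig. 6B can be illustrated with a small calculation. The sketch below assumes a commonly used definition in which A is the log2 expression ratio between the parents, B is the log2 ratio between the two alleles in F1, the trans effect is A − B, and the cis share is |B| / (|B| + |A − B|). The exact formula used in this study is not restated here, and the function name and input values are hypothetical.

```python
import math

def cis_share(parent_r, parent_s, allele_r, allele_s):
    """Return (A, B, trans, cis share) from four expression values.

    A = log2(parent R / parent S): total (cis + trans) divergence.
    B = log2(F1 allele R / F1 allele S): cis divergence.
    trans = A - B.
    The cis share |B| / (|B| + |trans|) is an assumed definition.
    """
    a = math.log2(parent_r / parent_s)
    b = math.log2(allele_r / allele_s)
    trans = a - b
    total = abs(b) + abs(trans)
    share = abs(b) / total if total > 0 else float("nan")
    return a, b, trans, share

# Hypothetical gene: 8-fold difference between parents, 4-fold between alleles in F1.
a, b, trans, share = cis_share(80.0, 10.0, 40.0, 10.0)
print(f"A={a:.2f}  B={b:.2f}  trans={trans:.2f}  cis share={share:.2f}")
```

Under this definition, a gene whose allelic difference in F1 equals the parental difference has a cis share of 1, and a gene whose two alleles are expressed equally in F1 has a cis share of 0.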
The relationship between gene regulation and heterosis was studied, and the results indicated that when there was a significant difference in the expression of parental alleles (A ≠ 0), there was a significant correlation between the two (Fig. 6C). The proportion of PDO genes regulated in cis-only was the highest (27.85%), as expected. The proportion of DO genes regulated in cis-only (25.87%) was slightly higher than expected, while the proportion of corresponding ODO genes was extremely significantly lower than expected. In contrast, the proportion of DO genes regulated in trans-only (15.22%) was significantly lower than expected, but the proportion of ODO genes regulated in trans-only was extremely significantly higher than expected (1.32%), indicating that in rice heterosis, the mechanism of trans-only regulation plays a key role in the ODO mode (ODO genes), while the contribution of the regulatory mechanism of cis-only to the dominance mode (DO genes) is higher than expected.
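The enrichment and depletion values in Fig. 6C are Pearson residuals from a contingency table of expression modes against regulation categories. A minimal sketch of that calculation follows; the counts are hypothetical placeholders rather than data from this study, and the function name is illustrative.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected).

    Expected counts come from the independence model:
    row total * column total / grand total.
    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Hypothetical counts. Rows: PDO, DO, ODO. Columns: cis only, trans only.
counts = [[300, 100], [200, 150], [5, 30]]
for mode, row in zip(("PDO", "DO", "ODO"), pearson_residuals(counts)):
    print(mode, [round(r, 2) for r in row])
```

A positive residual marks a cell with more genes than expected under independence, and the sum of the squared residuals equals the Pearson chi-square statistic for the table.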
Verification of gene functions critical to heterosis
Among all 3,704 cloned functional genes in rice, 1,854 were DEGs between the parents and the hybrid and were divided into the expression patterns of Modes I–III (Supplemental Data S10); 33 of these genes had expression patterns identified in all four tissues (Supplemental Fig. S22). In addition, the ASE and gene regulation pattern (categories I–VII) were determined for 1,110 genes (Supplemental Data S11), and both the expression pattern and expression regulation category were clarified for 739 functional genes. The expression and regulatory patterns of these functional genes are catalogued in Supplemental Data S10 and S11, and we verified the functions of several key heterosis genes in the Y900 background.
Ghd8/RH8 (RICE HETEROSIS8) has been considered the main heterosis gene of two-line hybrid rice (Huang et al. 2015; Li et al. 2016) and plays multiple roles in regulating grain yield, PH, and HD (Yan et al. 2011). Sequence comparison showed that there were many differences in Ghd8 between the parents, including 2 Indels and 1 SNP in the coding sequence (CDS) region. Ghd8 from Y58S encodes a protein of only 125 amino acids (AAs), while Ghd8 from R900 encodes a protein of 296 AAs. The expression of this gene in the young panicle tissue in Y900 was slightly higher than that of the parents, and it had the highest expression in flag leaves. We obtained two double-mutation Ghd8 lines (KO-1 and KO-2) in the R900 background by designing guide RNAs targeting two sites using the CRISPR/Cas9 technique. The first mutation causes a premature stop codon and generates a truncated, nonfunctional Ghd8 protein. Both lines had an HD at least 7 to 10 d shorter and significantly lower GNPP and PH than wild-type R900 (Fig. 7, A–C).
Figure 7.
Genetic validation of Ghd8 and the effect of simulated combinations of important genes. A) Phenotypes of R900 wild type and two mutant types (KO-1 and KO-2). B) Comparison of gene editing sequences and amino acid changes of R900 wild type and two mutant types. C) Comparison of GNPPa, GYPP, PH, and HD (i.e. the days from sowing to seeing the ears) of the three male parents. Ten plants of each line were used for statistical analysis. D) Phenotypes of hybrids after mating R900 wild type and two mutant types (KO-1 and KO-2) with Y58S. E) Comparison of GNPPa, GYPP, PH, and HD of the three hybrids. Ten plants of each line were used for statistical analysis. F) Comparison of the nine different genotypes of Ghd8 and OsSPL13 in terms of HD and GYPP. “Ghd8 (RS) + OsSPL13 (RS)” represents haplotypes of Y900, and “Ghd8 (RS) + OsSPL13 (SS)” represents haplotypes of Y2 and Y1. The three genotypes, RR, RS, and SS, correspond to homozygous R900, heterozygous Y900, and homozygous Y58S, respectively. G) Comparison of the nine different genotypes of LAX1 and NAL1 in terms of GNPP and GYPP. The dark red, light red, and blue box represent haplotypes of Y900, Y2, and Y1, respectively. Different letters above the SD error bars in (C), (E), (F), and (G) represent significant differences between the groups (LSD) using ANOVA (P < 0.05). The numbers after the horizontal axis “n=” in (F) and (G) represent the sample sizes of different genotypes.
We made hybrids (KO-Y900) by crossing both knockout lines with Y58S and found no significant difference between KO-Y900 and Y900 in many agronomic traits, such as GNPP, PH, PL, and FLW. However, the growth periods were shortened for both KO-Y900 hybrids, and GYPP was 34.7 g and 34.3 g for Y58S/KO-1 and Y58S/KO-2, respectively, approximately 16% to 17% lower than that of Y900 (41.4 g) (Fig. 7, D–E). Thus, while the female parent Y58S compensated for some traits when crossed with male lines carrying nonfunctional Ghd8, HD and GYPP were difficult to compensate. These results demonstrated that Ghd8 from both parents played critical roles in yield heterosis.
An F2 population of 379 progenies was constructed from Y900 self-crossing for a genetic effect study. Ghd8 and Ghd7 showed the most substantial effect on the delay of HD (Supplemental Data S12). Surprisingly, the grain-type gene OsSPL13 also exhibited effects on HD similar to Ghd7, which was not reported in previous studies (Si et al. 2016). The effects of Ghd8 combining with OsSPL13 (Ghd8 + OsSPL13) were largely consistent with the effects of the Ghd8 + Ghd7 combination, with the former performing slightly better than the latter under double heterozygous (RS + RS) situation. For the Ghd8 + OsSPL13 combination, F2 individuals with RR + RR genotype (R900) and RS + RS genotype (Y900) had significantly longer HD (and higher GYPP) than RS + SS genotype (Y1 and Y2 genotype, based on QTN haplotype comparison), confirming the function of both genes in HD (Fig. 7F).
Using the same strategy, we also verified the effects of LAX PANICLE1 (LAX1), a known heterosis gene in two-line hybrid rice, and NAL1, a gene contributing to higher yield commonly in GJ but rarely in XI rice (Xiao et al. 1995; Huang et al. 2015). QTN-based haplotype analysis revealed the combined genotypes of LAX1 + NAL1 for Y1 as RS + SS, Y2 as SS + RS, and Y900 as RS + RS. F2 individuals with the RS + RS genotype (Y900) had the best GNPP and GYPP, consistent with the expectation. No significant difference was found between individuals with genotype SS + RS (Y2) and those with RS + SS (Y1) for the two traits (Fig. 7G). Interestingly, the SS + SS genotype (both loci carrying the allele of the maternal parent Y58S) ranked second among all nine combinations, and the average values of GNPP and GYPP for individuals with the SS + SS genotype reached 2130.4 and 33.4 g, respectively. Hybrids or inbred lines with the Y58S alleles of LAX1 and NAL1 would likely have good GNPP and GYPP. These results not only confirmed the gene function but also provided a breeding strategy for fine-tuning agronomic traits and heterosis by selecting gene and haplotype combinations.
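The nine two-locus combinations compared in Fig. 7, F and G, arise from three genotypes (RR, RS, and SS) at each of two loci. As a minimal sketch of how F2 individuals can be grouped and ranked by combination, the following example uses hypothetical trait values and an illustrative function name; it does not reproduce the ANOVA and LSD comparisons used in the figure.

```python
from collections import defaultdict
from statistics import mean

def two_locus_summary(records):
    """Group F2 individuals by their genotype at two loci.

    records: iterable of (genotype_locus1, genotype_locus2, trait_value),
    with genotypes coded as "RR", "RS", or "SS".
    Returns {(g1, g2): (sample size, mean trait value)}.
    """
    groups = defaultdict(list)
    for locus1, locus2, value in records:
        groups[(locus1, locus2)].append(value)
    return {combo: (len(values), mean(values)) for combo, values in groups.items()}

# Hypothetical GYPP values (g) for a handful of F2 plants.
f2_plants = [
    ("RS", "RS", 40.0), ("RS", "RS", 42.0),
    ("SS", "SS", 33.0), ("RS", "SS", 30.0), ("SS", "RS", 31.0),
]
summary = two_locus_summary(f2_plants)
# Rank combinations from highest to lowest mean trait value.
for combo, (n, avg) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(combo, f"n={n}", f"mean={avg:.1f}")
```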
Discussion
Using the two-line super-hybrid rice Y900 as the model, we explored the molecular mechanism of heterosis from two directions: (1) genome variation between male and female parents, and (2) differential gene expression among the hybrids and the two parents. With the high-quality genomes of Y58S and R900, we were able to catalogue all variations between the two parents and compare the expression level of maternal and paternal alleles in different tissues in the hybrid. Both alleles were expressed simultaneously in the hybrid for almost all genes, and the expression differences between them were largely consistent with the difference between the two parents but changed dynamically across different tissues, demonstrating the foundational roles of cis-regulation and additive effect (PDO mode) and the important roles of trans-regulation and nonadditive effect (DO and ODO modes) in heterosis.
When comparing the genome of Y58S with those of R900 and other published genomes including R9311, we found that the level of variation between the Y58S and R900 genomes was lower (and the genetic distance was smaller) than that between the Y58S and R9311 genomes, regardless of the variation type (Supplemental Table S11). The yield of Y900, however, was much higher than that of Y1 (Y58S/R9311). Indeed, the level of variation between Y58S and R900 was the lowest among all genome comparisons, but the yield of the hybrid was one of the highest in the world. This is somewhat contrary to the typical principle of hybrid breeding that a large genetic distance between parents brings higher heterosis in hybrids (Cheng 2021). It is clear that in addition to genetic distance, the proportion of GJ introgression plays an important role in rice heterosis. Compared with the high proportion (∼18.37%) of GJ components in the Y58S genome, the proportion was 14.31% in R900 and only ∼2.11% in R9311. The comparable level of GJ introgression between Y58S and R900 might reduce the genetic distance but likely provides better coordinated gene function in the hybrid than is the case for Y58S and R9311.
Another important factor in heterosis is the role of functional genes, some of which have a direct impact on phenotypes. We compared the known 304 QTNs (out of 342) between R900 and R9311. Among them, 267 QTNs were the same, and 37 QTNs (33 genes, ∼11%) differed. The specific traits corresponding to the divergent QTNs included Yield components (6), HD (5), Plant Architecture (5), Taste quality (1), Secondary metabolism (6), Biotic Stress (2), Abiotic Stress (5), and Others (7). Many phenotypic differences between R900 and R9311 could be explained by the QTNs. For instance, R900 had smaller tiller and flag leaf angles than R9311, consistent with the functional GJ alleles of D61 and TAC1 in R900 and XI alleles in R9311. The japonica-type NAL1 made the leaves wider and darker green in R900. For the grain phenotype, the GJ genotypes of Oryza sativa Length of Grain3, GIANT EMBRYO, and OsSPL13 were consistent with the round and short grain in R900, whereas the XI genotypes in R9311 were consistent with its typical long grain. The GJ alleles of Hd1 and Ghd7 delayed flowering substantially in R900 compared with R9311. Of course, some traits in R900 were not as good as those in R9311, such as the number of tillers and the yield per plant; for these loci, the female parent Y58S provided good complementation with XI haplotypes, making Y900 a superior hybrid.
The combination of the two parental genomes within F1 hybrids resulted in the prevalence of differential gene expression between hybrids and parents in various tissues (Wei et al. 2009; Shao et al. 2019; Fu et al. 2022). When comparing the gene expression of Y900 with its parents, we found that hundreds of genes were expressed only in the hybrids (Supplemental Fig. S14; Supplemental Data S5). This phenomenon most likely originated from the transcriptional regulators of one parent acting on the promoter of the other, and this cross-regulatory mechanism is an important component of heterosis. Indeed, the cross-regulatory mechanism should apply to almost all genes in hybrid cells: the differential affinity of a transcription factor from one parent for the cis-element of the other would lead to a different expression level in the hybrid compared with the parents. Hybrid-specific gene expression is likely an extreme case of the cross-regulatory mechanism. The differences in gene expression were found to be small in the early stages of tissue development, as the transcription activity was overall very high. When the tissues were established and became mature, trans-elements played more important roles, resulting in phenotypic differences between varieties.
As there are sequence variations between different alleles, the selection of the reference genome for RNA-seq analysis may affect the results of gene expression, and a closely related reference genome should be used rather than a distantly related one (Slabaugh et al. 2019). In this study, we used Y58S as the reference for RNA-seq analysis in Y58S, R900, and Y900, as the genome of Y58S had the least variation relative to R900 (compared with other reference genomes such as NIP) and had slightly better quality than that of R900. Indeed, compared with NIP as the reference, the overall and unique mapping rates for sequencing reads were 2% to 3% higher when using Y58S or R900 as the reference, and more expressed genes and DEGs were co-identified between Y58S and R900 than between NIP and any other genome (Supplemental Fig. S23 and Table S13). We further investigated whether the expression of the 6,128 genes within XI–GJ differential regions would be affected by using different reference genomes. As shown in Supplemental Fig. S24, the correlations of expression values across all genes and for most individual genes using different reference genomes were above 0.98, while some genes, such as bZIP73, had a somewhat lower but still good correlation (0.94). Overall, the selection of the Y58S genome (instead of NIP) as the reference for gene expression analysis in this study ensured the accuracy of the results, even for genes in regions with XI–GJ differences.
Many studies have sought to identify causal loci for heterosis in rice and have shown that introgression of genetic components between parents and accumulation of superior alleles in hybrids are important for the formation of heterosis (Huang et al. 2015, 2016; Lin et al. 2020). To breed even better hybrids than Y900, we should consider adding more beneficial alleles to either parent and consider the overall performance of the hybrid. Y900 was shown to be susceptible to rice blast (Pyricularia oryzae), bacterial blight (Xanthomonas oryzae), and brown planthopper (Nilaparvata lugens), and the introduction of major haplotypes of Xa23, Xa21, Bph15, Pi1, and Pi2 would likely greatly improve the disease and pest resistance of R900 and Y900 (Wang et al. 2016; He et al. 2022). Recently, Oryza sativa receptor for activated C kinase 1A (OsRACK1A) was reported to increase false smut (Ustilaginoidea virens) resistance without affecting rice yield (Li et al. 2022), which could be another target for improvement in Y900, as the super-large panicle and ultra-high grain density of Y900 make it highly prone to false smut under high temperature and humidity. Contamination of soil with the heavy metal cadmium is a problem in Southern China, and the alleles of R900 and Y58S are all high cadmium accumulation types at the loci of Oryza sativa Cadmium1 (OsCd1), Oryza sativa heavy-metal ATPase3 (OsHMA3), and Oryza sativa natural resistance-associated macrophage protein5 (OsNRAMP5). We recently found a rice germplasm (Luohong 3A/4A) with OsNRAMP5 deleted showing low cadmium accumulation (Lv et al. 2020) and are working on bringing the haplotype into the major rice parent lines. It is also important to improve the eating quality while maintaining the high yield potential. The introduction of long and narrow grain haplotypes of genes such as GRAIN LENGTH AND WEIGHT ON CHROMOSOME7 (GLW7) and Grain Width and Weight on chromosome5 (GW5) will improve the appearance quality, and the addition of the aroma gene fragrance (fgr) will add fragrance to the cooked rice. With the complete genome sequences and catalogues of genome variation and expression profile, the pyramiding of target haplotypes is now feasible and will become more efficient.
In conclusion, heterosis is a complex phenomenon involving many players. Genome-wide variation between the two parents, GJ introgression, accumulation of beneficial haplotypes for functional genes, and dynamic gene expression and regulation pattern changes in different spatiotemporal tissues all contributed to the overall performance of hybrids.
Materials and methods
Plant materials
Rice (O. sativa) materials included the two-line hybrid rice lines Y1, Y2, and Y900 and the corresponding restorer lines R9311, YH2, and R900 and their common female parent Y58S. The F2 population was obtained from self-crossing of F1, including Y900-F2 (379) and Y2-F2 (259). All the materials were planted, and traits were investigated in Changsha, China (112.93°E, 28.23°N). The leaves of R900 and Y58S were used for long-read, short-read, and Hi-C sequencing. Tissues such as 5 and 10 mm PL, flag leaf at heading, and stem under the panicle were used for RNA sequencing, with 36 samples for each tissue (three individual plants and three biological replicates). The primers used to verify the expression levels of heterosis-related genes are listed in Supplemental Table S14.
Cas9 series PMT knockout vectors and sgRNA expression vectors were kindly provided by Prof. Yao-Guang Liu of South China Agricultural University. We constructed Cas9-Ghd8-gRNA expression vectors according to the method they described (Ma et al. 2015). First, the Ghd8 gene of R900 was amplified. Based on the Ghd8 sequence, the target sites and the primers for the double-stranded adapters were designed. To increase mutation efficiency, we selected two target sites for this gene: the 19- to 20-bp sequences TCAGGGGAACAAGGCGTACTG and GGCACTTGCTGAGCCCCGGT, located upstream of NGG in the CDS of Ghd8. Restriction enzyme sites were added to these sequences, which served as the targets for the U3 and U6 promoters, respectively, and two pairs of primers were designed: Cas-Ghd8-T1F: GGCATCAGGGGAACAAGGCGTACTG and Cas-Ghd8-T1R: AAACCAGGCCTTGTTCCCTGA; Cas-Ghd8-T2F: GCCGGGCACTTGCTGAGCCGGT and Cas-Ghd8-T2R: AAACACCGGGCTCAGCAAGGCC. Subsequently, each candidate target sequence together with its NGG was compared with the target genome by BLAST to avoid high similarity with other genome sequences at the 3′ end of the target sequence and NGG. Using KOD-FX high-fidelity polymerase, PCR amplification, and agarose gel electrophoresis, the final vector was obtained by enzymatic ligation and verified by Escherichia coli transformation and colony PCR. After further verification by Sanger sequencing and restriction digestion, the vectors were sent to Wuhan Boyuan Biotechnology Co., Ltd, where plant genetic transformation was completed with R900 as the recipient parent. Hygromycin was used to detect whether the transformation was successful, and water and R900 genomic DNA were used as controls. In addition, a pair of specific primers (Ghd8-3F: GAGCATCACCACTACTCATCCCT; Ghd8-3R: CAATGTGCCAAGCCCAGATG) was designed based on the two target positions of Ghd8 to detect base mutations at the target positions, which finally yielded the transgenic lines KO-1 (the original number is LQGM7) and KO-2 (the original number is LQGM8). Their corresponding hybrid combinations are Y58S/KO-1 (the original number is LQGM19) and Y58S/KO-2 (the original number is LQGM20).
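The target-site search described above, i.e. finding protospacer sequences immediately upstream of an NGG motif, can be sketched as a simple sequence scan. The example below is a generic illustration on a made-up sequence and is not the design tool used in this study; the function names and the fixed 20-nt protospacer length are assumptions.

```python
import re

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_targets(seq, length=20):
    """List candidate sites on the given strand.

    Returns (start, protospacer, pam) for every `length`-nt stretch that is
    directly followed by an NGG motif. A lookahead is used so that
    overlapping candidates are all reported. Start positions are 0-based.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Made-up sequence for illustration only (not Ghd8).
toy = "C" + "A" * 20 + "AGG" + "TTT"
for start, protospacer, pam in find_targets(toy):
    print(start, protospacer, pam)
# The opposite strand can be scanned with the same function.
print(len(find_targets(reverse_complement(toy))))
```

Candidates found this way would still need the genome-wide similarity check described above before use.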
Genome assembly and annotation
Library construction and sequencing were performed on the MGISEQ-2000 platform (BGI, Shenzhen, China) for NGS short reads and on the PacBio Sequel II platform for long reads. Genome assembly was performed using Canu v1.8 (Koren et al. 2017), followed by polishing using Pilon (Walker et al. 2014). The 144.7 million and 148.8 million paired-end clean reads generated by Hi-C sequencing in R900 and Y58S, respectively (Supplemental Table S3), were clustered, sorted, and positioned in contigs using Lachesis. Based on the embryophyta_odb10 database, BUSCO (Simao et al. 2015) was used to evaluate the genome integrity. LAI was used as the standard for assessing the assembly of repetitive sequences (Ou et al. 2018). The NGS reads were aligned to the assembled genome using the BWA-MEM algorithm (Li and Durbin 2009). The collinearity alignment between rice genomes was performed using Minimap2 (-x asm5) (Li 2018).
Repeat sequences were masked using RepeatMasker (Zhi et al. 2006) and the TE library from the Repbase library (Edition-20181026) (Bao et al. 2015). Genes were predicted by integrating ab initio, homology-based, and mRNA evidence. Specifically, Augustus (Stanke et al. 2006) and GlimmerHMM (Majoros et al. 2004) were used to perform de novo gene prediction on genomes that had been masked for repetitive sequences; exonerate was used for homology alignment for all known plant proteins in the UniProt database and predicted proteins in rice (MSU7), maize (Z. mays) (B73 v5), and Arabidopsis (A. thaliana) (TAIR10); and BLAT (Kent 2002) and GMAP (Wu and Watanabe 2005) were used for all O. sativa mRNAs from the GenBank database and transcripts assembled by Trinity (Grabherr et al. 2011) for four tissues in this study. The above-predicted gene models were integrated using EVidenceModeler (Haas et al. 2008) and finally updated using PASA (Haas et al. 2003). tRNAs, rRNAs, snRNAs, and miRNAs were predicted using tRNAscan-SE (Schattner et al. 2005), Barrnap, and INFERNAL (Nawrocki and Eddy 2013). Functional annotation of protein sequences was performed using BLASTP alignment with a filtering condition of E value < 1e−5. TE-related genes were annotated in the same way as in Zhang et al. (2016). Fisher's exact test was used to detect the GO terms and KEGG pathways with significant gene enrichment.
Genomic variant identification and validation
Y58S was used as the reference genome, and MUMmer v4.0 (-maxmatch -c 90 -l 40) (Marçais et al. 2018) was used to align it with the R900, R9311, R498, NIP, HZ, TF, MH63RS1, and ZS97RS1 genomes. Delta-filter -m was used to filter the results, and show-snps (-Clr th) was used to identify SNPs and Indels. SnpEff (Cingolani et al. 2012) was used to annotate the variants. The density distribution of SNPs and Indels on the genome was calculated with a sliding window of 100 kb. Based on the MUMmer results, the SyRI pipeline (Goel et al. 2019) with default parameters was used to identify structural variants between genomes. The variants detected by SyRI include two types: genomic rearrangements and sequence variants. These variants were converted into three SV types, PAV, inversions, and translocations, according to the definition of SyRI results. CPL, DEL, DUP/INVDP (loss), HDR, NOTAL, and TDM in the Y58S genome were defined as Absence SVs (relative to Y58S). CPG, INS, DUP/INVDP (gain), HDR, NOTAL, and TDM in the query genome were defined as Presence SVs (relative to Y58S). INV was considered an inverted SV. Both TRANS and INVTR were considered translocated SVs.
For large structural variants, NGS reads were aligned to the Y58S and NIP genomes using BWA-MEM (Li and Durbin 2009), and the resulting BAM files were imported into IGV (Integrative Genomics Viewer) (Robinson et al. 2011), where read coverage at the variant loci was inspected to confirm the genotypes of different varieties. Rice varieties included Y58S, R900, R9311, R498, NIP, HZ, TF, MH63RS1, ZS97RS1, and YH2, with NGS data from previous studies (Lv et al. 2020). The XI–GJ genetic composition and introgression bin maps were constructed using the method described by Chen et al. (2020) with 3K-RG as the background and a 100-kb sliding window between Y58S, R900, R9311, YH2, and NIP using high-density SNPs. Details of 3,704 rice cloned genes (as of July 2021) and 342 QTNs were obtained from the funricegenes (https://funricegenes.github.io/) and RiceNavi databases (http://www.xhhuanglab.cn/tool/RiceNavi.html), respectively.
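The 100-kb window density of SNPs and Indels can be illustrated with a short sketch. The step size of the window is not stated above, so the example assumes consecutive, non-overlapping 100-kb windows; the function name and variant positions are hypothetical.

```python
from collections import Counter

def window_density(positions, chrom_length, window=100_000):
    """Count variants per window along one chromosome.

    positions: 1-based variant coordinates on the chromosome.
    Windows are consecutive and non-overlapping (an assumption);
    the last window may be shorter than `window`.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]

# Hypothetical variant positions on a 300-kb chromosome.
print(window_density([1, 100_000, 100_001, 250_000], 300_000))
```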
RNA-sequencing data analysis
In RNA-seq analysis, the use of reference genomes from different cultivars may introduce bias in gene expression, so it is recommended to choose a high-quality reference genome that is as closely related to the sample as possible (Slabaugh et al. 2019). The transcriptome samples in this study were all indica rice cultivars with close affinity to Y900, while Y58S was the common female parent of representative varieties of phase 2 to phase 4 super-rice and had a slightly higher assembly quality compared to R900 (Table 1). Therefore, after comparing the results of NIP and Y900 parental genomes, we selected Y58S as the reference genome for RNA-seq analysis (Supplemental Fig. S23). The “new Tuxedo” protocol (Pertea et al. 2016) was used for the analysis of assembly of transcripts, quantification of gene expression levels, etc., and DESeq2 (Love et al. 2014) was used for DEG analysis based on grouping information. DEGs were defined as those with a fold change > 2 and FDR < 0.05. TPM was calculated to represent gene expression; 0 < TPM ≤ 1 was defined as low expression. For specifically expressed genes, more than two-thirds of the samples must have TPM > 0 to be considered expressed.
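The expression thresholds stated above can be written out as a few small rules. The sketch below is illustrative only: in practice TPM values and DEG statistics come from the quantification and DESeq2 steps, and the helper names, the symmetric treatment of fold change (up- or downregulation), and the input values are assumptions.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and gene lengths (bp).

    Assumes at least one gene has a nonzero count.
    """
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]

def is_deg(log2_fold_change, fdr):
    """Fold change > 2 in either direction (|log2FC| > 1) and FDR < 0.05."""
    return abs(log2_fold_change) > 1 and fdr < 0.05

def is_low_expression(tpm_value):
    """Low expression: 0 < TPM <= 1."""
    return 0 < tpm_value <= 1

def is_expressed(tpm_values):
    """Expressed if more than two-thirds of the samples have TPM > 0."""
    detected = sum(1 for value in tpm_values if value > 0)
    return detected * 3 > len(tpm_values) * 2

# Hypothetical counts for two genes of equal length.
print([round(v) for v in tpm([10, 20], [1000, 1000])])
print(is_deg(1.5, 0.01), is_low_expression(0.4), is_expressed([1.2, 0.8, 0.0]))
```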
Based on the differential gene expression trends of R900, Y58S, and Y900 in individual tissues, the heterosis patterns were classified according to three different modes (Modes I–III) (Fig. 4B). Mode I assigned all possible gene expression patterns to 12 classes (Swanson-Wagner et al. 2006). In Mode II, the 12 patterns of F1 and parental gene expression were summarized into five patterns, namely, H2P, CHP, B2P, CLP, and L2P, representing higher than both parents, close to the higher parent, between the two parents, close to the lower parent, and lower than both parents, respectively (Wei et al. 2009). Mode III further classified the five patterns into PDO, DO, and ODO according to the definition of heterosis.
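The relationship between the five Mode II patterns and the three Mode III classes can be sketched as a decision rule. The example below is a simplification under stated assumptions: significance flags stand in for the DEG tests between F1 and each parent, the full 12-class scheme of Mode I is not reproduced, and the mapping of B2P to PDO, CHP/CLP to DO, and H2P/L2P to ODO follows the usual definitions of partial dominance, dominance, and overdominance rather than a rule quoted from this section.

```python
# Assumed mapping from Mode II patterns to Mode III classes.
MODE_III = {"H2P": "ODO", "L2P": "ODO", "CHP": "DO", "CLP": "DO", "B2P": "PDO"}

def mode_ii(f1, parent_a, parent_b, differs_from_a, differs_from_b):
    """Assign a Mode II pattern from expression values and DEG flags.

    differs_from_a / differs_from_b: True if F1 is differentially
    expressed relative to that parent. Returns None when no pattern fits.
    """
    if parent_a >= parent_b:
        high, low = parent_a, parent_b
        differs_high, differs_low = differs_from_a, differs_from_b
    else:
        high, low = parent_b, parent_a
        differs_high, differs_low = differs_from_b, differs_from_a
    if differs_high and differs_low:
        if f1 > high:
            return "H2P"  # higher than both parents
        if f1 < low:
            return "L2P"  # lower than both parents
        return "B2P"      # between the two parents
    if differs_low and not differs_high and f1 > low:
        return "CHP"      # close to the higher parent
    if differs_high and not differs_low and f1 < high:
        return "CLP"      # close to the lower parent
    return None

# Hypothetical gene: parents at 10 and 20, F1 at 15 and different from both.
pattern = mode_ii(15.0, 10.0, 20.0, True, True)
print(pattern, MODE_III[pattern])
```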
ASE analysis
First, the transcriptomic libraries of all samples were aligned to the R900 and Y58S genomes using HISAT2 (Kim et al. 2019); then, headers were added to the BAM files using Picard, and SNPs were identified using the Genome Analysis Toolkit pipeline (McKenna et al. 2010). MarkDuplicates was used to remove duplicates, SplitNCigarReads was used to remove reads in the intron region, HaplotypeCaller was used to detect variants, and VariantFiltration was used to filter variants. Furthermore, the ASEReadCounter program (-min-depth 8) was used to identify the allele origin of each sample separately and filter out the SNPs that did not meet the requirements (the number of reads from samples of the same species as the reference genome should be much smaller than the number of reads from samples of different species). To eliminate the preference for a single genome, we combined all SNPs between the two genomes with the SNPs identified using R900 and Y58S as the reference genomes. A total of six replicates of the three genotypes per tissue were used to finally identify ASEGs by comparing the difference in the number of reads of the two alleles in F1 using DESeq2 (adjusted P < 0.05) (Love et al. 2014).
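The step from SNP-level allele counts to gene-level allelic expression can be sketched as follows. This is a minimal illustration and not the pipeline itself: the statistical test in DESeq2 is not reproduced, the depth filter is applied here to the summed reads of the two alleles at each SNP (an assumption about how the threshold of 8 is used), and the record layout, function names, and counts are hypothetical.

```python
import math
from collections import defaultdict

def gene_allele_counts(snp_records, min_depth=8):
    """Sum allele-specific read counts per gene.

    snp_records: iterable of (gene_id, reads_R900_allele, reads_Y58S_allele),
    one record per SNP. SNPs with fewer than `min_depth` total reads are
    skipped. Note that a read covering several SNPs is counted more than once.
    """
    totals = defaultdict(lambda: [0, 0])
    for gene, r_reads, s_reads in snp_records:
        if r_reads + s_reads < min_depth:
            continue
        totals[gene][0] += r_reads
        totals[gene][1] += s_reads
    return {gene: tuple(pair) for gene, pair in totals.items()}

def log2_allelic_ratio(r_reads, s_reads, pseudocount=1):
    """log2(R900 allele / Y58S allele) with a pseudocount to avoid division by zero."""
    return math.log2((r_reads + pseudocount) / (s_reads + pseudocount))

# Hypothetical SNP-level counts in one F1 sample.
records = [("gene1", 10, 5), ("gene1", 3, 2), ("gene2", 4, 4)]
for gene, (r, s) in gene_allele_counts(records).items():
    print(gene, r, s, round(log2_allelic_ratio(r, s), 2))
```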
Analysis of cis- and trans-regulated genes
Referring to the methods of previous studies (Wittkopp et al. 2004; Bao et al. 2019), cis- and trans-regulated genes were identified by comparing the gene expression of the two parents and the two alleles in F1 (Fig. 6A).
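The comparison logic summarized in Fig. 6A (A versus 0, B versus 0, and A versus B) can be sketched as a decision rule. The example below assumes that A and B are log2 ratios and that each comparison has already been reduced to a significance flag by a statistical test; the category names follow the terms used in this article, the assignment of names to numerals I–VII is deliberately not asserted, and the function name and values are hypothetical.

```python
def classify_regulation(a, b, a_differs, b_differs, trans_differs):
    """Assign a regulation category to one gene.

    a: log2 ratio between the parents (cis + trans effect).
    b: log2 ratio between the two alleles in F1 (cis effect).
    a_differs: A is significantly different from 0.
    b_differs: B is significantly different from 0.
    trans_differs: A is significantly different from B (trans effect A - B).
    """
    if a_differs and b_differs and not trans_differs:
        return "cis only"
    if a_differs and not b_differs and trans_differs:
        return "trans only"
    if a_differs and b_differs and trans_differs:
        # cis (B) and trans (A - B) effects in the same direction reinforce each other.
        same_direction = b * (a - b) > 0
        return "cis + trans (enhancing)" if same_direction else "cis x trans (compensating)"
    if not a_differs and b_differs and trans_differs:
        return "compensatory"
    if not a_differs and not b_differs and not trans_differs:
        return "conserved"
    return "ambiguous"

# Hypothetical genes.
print(classify_regulation(2.0, 2.0, True, True, False))
print(classify_regulation(2.0, 0.0, True, False, True))
print(classify_regulation(0.0, 1.5, False, True, True))
```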
Accession numbers
All the raw data are archived at NCBI PRJNA825106 and NGDC PRJCA009056. The accession numbers of the major genes/proteins mentioned in this paper can be found in Supplemental Table S15. The Supplemental Data relevant to this study can be accessed from Figshare (DOI: 10.6084/m9.figshare.20140235).
Supplementary Material
Acknowledgments
We thank Qiusheng Xu (State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences) for their useful discussion and opinion on the article.
Contributor Information
Zhizhong Sun, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China.
Jianxiang Peng, Biobin Data Sciences Co., Ltd., Changsha 410221, China.
Qiming Lv, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China.
Jia Ding, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; College of Agronomy, Hunan Agricultural University, Changsha 410128, China.
Siyang Chen, College of Agronomy, Hunan Agricultural University, Changsha 410128, China.
Meijuan Duan, College of Agronomy, Hunan Agricultural University, Changsha 410128, China.
Qiang He, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Jun Wu, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Yan Tian, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Dong Yu, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Yanning Tan, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Xiabing Sheng, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Jin Chen, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Xuewu Sun, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Ling Liu, College of Agronomy, Hunan Agricultural University, Changsha 410128, China.
Rui Peng, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Hai Liu, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Tianshun Zhou, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China.
Na Xu, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Jianhang Lou, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Longping Yuan, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China.
Bingbing Wang, Biobin Data Sciences Co., Ltd., Changsha 410221, China.
Dingyang Yuan, State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China.
Author contributions
L.Y. proposed and led the research project. D.Y. and B.W. conceived the research project and designed the experiments. Z.S. managed the project. Z.S. and J.P. wrote the manuscript. D.Y., B.W., Z.S., J.P., and Q.L. participated in the article discussion and performed the bioinformatics work. Z.S., J.D., S.C., Q.H., J.W., Y.T., M.D., D.Y., YN.T., XB.S., J.C., XW.S., L.L., R.P., H.L., T.Z., J.L., and N.X. conducted the experiments.
Supplemental data
The following materials are available in the online version of this article.
Supplemental Data S1. SNP and Indel information of 2,790 cloned genes in R900 and Y58S.
Supplemental Data S2. Three-hundred and forty-two QTNs of 253 cloned genes in the rice genome.
Supplemental Data S3. CNV of R900 and Y58S and related genes.
Supplemental Data S4. Cloned genes with genetic differences between Y58S and the three restorer lines.
Supplemental Data S5. TPM of 36 samples from four tissues of three genotypes.
Supplemental Data S6. Specific expressed genes in four tissues of three genotypes.
Supplemental Data S7. List of gene expression patterns in four tissues.
Supplemental Data S8. ASEGs in four tissues.
Supplemental Data S9. List of genes for seven cis–trans regulation categories in four tissues.
Supplemental Data S10. Expression patterns of all cloned genes.
Supplemental Data S11. AES and cis–trans regulation of all cloned genes.
Supplemental Data S12. Phenotypic values of important agronomic traits in the F2 population.
Supplemental Table S1. Phenotypic comparison of Y1, Y2, Y900, and their parents.
Supplemental Table S2. Summary of PacBio sequencing data.
Supplemental Table S3. Summary of NGS clean data.
Supplemental Table S4. Gaps of two assemblies.
Supplemental Table S5. Summary of NGS clean reads mapping to assemblies.
Supplemental Table S6. Statistics of chromosome length (Mb) in Y58S, R900, and other rice reference genomes.
Supplemental Table S7. Summary of gene prediction of R900 and Y58S.
Supplemental Table S8. Statistics of repeat sequences in R900 and Y58S genomes.
Supplemental Table S9. Summary of noncoding RNA annotation of R900 and Y58S.
Supplemental Table S10. Summary of functional annotation of R900 and Y58S proteins.
Supplemental Table S11. Variation statistics of rice reference genome (with Y58S as reference).
Supplemental Table S12. Genes that express only one parental allele in F1.
Supplemental Table S13. Comparison of the number of DEGs in three reference genomes.
Supplemental Table S14. List of all primers used in this study.
Supplemental Table S15. The accession numbers of the major genes/proteins mentioned in this paper.
Supplemental Figure S1. Length distribution of PacBio long reads.
Supplemental Figure S2. Hi-C interactive heat map.
Supplemental Figure S3. BUSCO assessment of nine rice genomes.
Supplemental Figure S4. Collinearity of R900 with other reference genomes.
Supplemental Figure S5. SNP density distribution of eight rice genomes with Y58S.
Supplemental Figure S6. Annotation of SNPs and Indels between eight rice genomes and Y58S.
Supplemental Figure S7. Reads mapping of Hd1 variants by NGS.
Supplemental Figure S8. Reads mapping of NAL1 variants by NGS.
Supplemental Figure S9. Comparison of the lengths of syntenic region and structural variation between eight rice genomes and Y58S.
Supplemental Figure S10. Reads mapping of Ghd7 by NGS.
Supplemental Figure S11. Reads mapping of OsTPP7 by NGS.
Supplemental Figure S12. Reads mapping of PAV by NGS.
Supplemental Figure S13. RNA-seq data quality control.
Supplemental Figure S14. Variety-specific expression of genes in different tissues.
Supplemental Figure S15. Statistics on the number of variety DEGs in different tissues.
Supplemental Figure S16. Relationship between ASEGs and the fold change of biparental differential expression.
Supplemental Figure S17. Statistics on the number of genes biased towards R900 and Y58S expression in different tissues.
Supplemental Figure S18. The distribution of different cis- and trans-regulatory types of genes in four tissues.
Supplemental Figure S19. Tissue specificity and tissue conservation of cis- and trans-regulatory genes.
Supplemental Figure S20. The relationship between the number and location of SNPs for different regulatory categories of genes.
Supplemental Figure S21. Comparison of the extent to which different regulatory categories differ in the expression of parental genes.
Supplemental Figure S22. Examples of expression patterns of cloned genes.
Supplemental Figure S23. Comparison of RNA-seq analysis of different reference genomes.
Supplemental Figure S24. Effect of using different reference genomes on the Y900 biparental indica-japonica difference region.
Funding
This work was supported by the National Key Research and Development Project (2022YFD1200800), the Science and Technology Innovation Program of Hunan Province (2021NK1001, 2021RC4066, 2021NK1003, 2021NK1012, and 2021RC3113), the Major Science and Technology Program in Hainan Province (ZDKJ2021002), the National Natural Science Foundation of China (U21A20208 and 31801341), and the Natural Science Foundation of Hunan Province (2020JJ4456).
References
- Bao Y, Hu G, Grover CE, Conover J, Yuan D, Wendel JF. Unraveling cis and trans regulatory evolution during cotton domestication. Nat Commun. 2019:10(1):5399. 10.1038/s41467-019-13386-w
- Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015:6(1):11. 10.1186/s13100-015-0041-9
- Chen L, Bian J, Shi S, Yu J, Khanzada H, Wassan GM, Zhu C, Luo X, Tong S, Yang X, et al. Genetic analysis for the grain number heterosis of a super-hybrid rice WFYT025 combination using RNA-seq. Rice (N Y). 2018:11(1):37. 10.1186/s12284-018-0229-y
- Chen Z, Li X, Lu H, Gao Q, Du H, Peng H, Qin P, Liang C. Genomic atlases of introgression and differentiation reveal breeding footprints in Chinese cultivated rice. J Genet Genomics. 2020:47(10):637–649. 10.1016/j.jgg.2020.10.006
- Cheng S. One-hundred years’ development and prospect of rice breeding in China. China Rice (In Chinese). 2021:27(4):1–6. 10.3969/j.issn.1006-8082.2021.04.001
- Cheng SH, Zhuang JY, Fan YY, Du JH, Cao LY. Progress in research and development on hybrid rice: a super-domesticate in China. Ann Bot. 2007:100(5):959–966. 10.1093/aob/mcm121
- Cheung F. Yield: the search for the rice of the future. Nature. 2014:514(7524):S60–S61. 10.1038/514S60a
- Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012:6(2):80–92. 10.4161/fly.19695
- Diaz-Valenzuela E, Sawers RH, Cibrian-Jaramillo A. Cis- and trans-regulatory variations in the domestication of the chili pepper fruit. Mol Biol Evol. 2020:37(6):1593–1603. 10.1093/molbev/msaa027
- Fu J, Zhang Y, Yan T, Li Y, Jiang N, Zhou Y, Zhou Q, Qin P, Fu C, Lin H, et al. Transcriptome profiling of two super hybrid rice provides insights into the genetic basis of heterosis. BMC Plant Biol. 2022:22(1):314. 10.1186/s12870-022-03697-4
- Gao Z, Li H, Yang X, Yang P, Chen J, Shi T. Biased allelic expression in tissues of F1 hybrids between tropical and temperate lotus (Nelumbo nucifera). Plant Mol Biol. 2021:106(1–2):207–220. 10.1007/s11103-021-01138-8
- Goel M, Sun H, Jiao WB, Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019:20(1):277. 10.1186/s13059-019-1911-0
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011:29(7):644–652. 10.1038/nbt.1883
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003:31(19):5654–5666. 10.1093/nar/gkg770
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008:9(1):R7. 10.1186/gb-2008-9-1-r7
- He Z, Xin Y, Wang C, Yang H, Xu Z, Cheng J, Li Z, Ye C, Yin H, Xie Z, et al. Genomics-Assisted improvement of super high-yield hybrid rice variety “super 1000” for resistance to bacterial blight and blast diseases. Front Plant Sci. 2022:13:881244. 10.3389/fpls.2022.881244
- Huang X, Yang S, Gong J, Zhao Y, Feng Q, Gong H, Li W, Zhan Q, Cheng B, Xia J, et al. Genomic analysis of hybrid rice varieties reveals numerous superior alleles that contribute to heterosis. Nat Commun. 2015:6(1):6258. 10.1038/ncomms7258
- Huang X, Yang S, Gong J, Zhao Q, Feng Q, Zhan Q, Zhao Y, Li W, Cheng B, Xia J, et al. Genomic architecture of heterosis for yield traits in rice. Nature. 2016:537(7622):629–633. 10.1038/nature19760
- Kent WJ. BLAT – the BLAST-like alignment tool. Genome Res. 2002:12:656–664. 10.1101/gr.229202
- Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019:37(8):907–915. 10.1038/s41587-019-0201-4
- Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017:27(5):722–736. 10.1101/gr.215087.116
- Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018:34(18):3094–3100. 10.1093/bioinformatics/bty191
- Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009:25(14):1754–1760. 10.1093/bioinformatics/btp324
- Li GB, He JX, Wu JL, Wang H, Zhang X, Liu J, Hu XH, Zhu Y, Shen S, Bai YF, et al. Overproduction of OsRACK1A, an effector-targeted scaffold protein promoting OsRBOHB-mediated ROS production, confers rice floral resistance to false smut disease without yield penalty. Mol Plant. 2022:15(11):1790–1806. 10.1016/j.molp.2022.10.009
- Li D, Huang Z, Song S, Xin Y, Mao D, Lv Q, Zhou M, Tian D, Tang M, Wu Q, et al. Integrated analysis of phenome, genome, and transcriptome of hybrid rice uncovered multiple heterosis-related loci for yield increase. Proc Natl Acad Sci U S A. 2016:113(41):E6026–E6035. 10.1073/pnas.1610115113
- Lin Z, Qin P, Zhang X, Fu C, Deng H, Fu X, Huang Z, Jiang S, Li C, Tang X, et al. Divergent selection and genetic introgression shape the genome landscape of heterosis in hybrid rice. Proc Natl Acad Sci U S A. 2020:117(9):4623–4631. 10.1073/pnas.1919086117
- Liu W, He G, Deng XW. Biological pathway expression complementation contributes to biomass heterosis in Arabidopsis. Proc Natl Acad Sci U S A. 2021:118(16):e2023278118. 10.1073/pnas.2023278118
- Liu W, Zhang Y, He H, He G, Deng XW. From hybrid genomes to heterotic trait output: challenges and opportunities. Curr Opin Plant Biol. 2022:66:102193. 10.1016/j.pbi.2022.102193
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014:15(12):550. 10.1186/s13059-014-0550-8
- Lv Q, Li W, Sun Z, Ouyang N, Jing X, He Q, Wu J, Zheng J, Zheng J, Tang S, et al. Resequencing of 1,143 indica rice accessions reveals important genetic variations and different heterosis patterns. Nat Commun. 2020:11(1):4778. 10.1038/s41467-020-18608-0
- Ma X, Xing F, Jia Q, Zhang Q, Hu T, Wu B, Shao L, Zhao Y, Zhang Q, Zhou DX. Parental variation in CHG methylation is associated with allelic-specific expression in elite hybrid rice. Plant Physiol. 2021:186(2):1025–1041. 10.1093/plphys/kiab088
- Ma X, Zhang Q, Zhu Q, Liu W, Chen Y, Qiu R, Wang B, Yang Z, Li H, Lin Y, et al. A robust CRISPR/Cas9 system for convenient, high-efficiency multiplex genome editing in monocot and dicot plants. Mol Plant. 2015:8(8):1274–1284. 10.1016/j.molp.2015.04.007
- Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004:20(16):2878–2879. 10.1093/bioinformatics/bth315
- Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol. 2018:14(1):e1005944. 10.1371/journal.pcbi.1005944
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010:20(9):1297–1303. 10.1101/gr.107524.110
- Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013:29(22):2933–2935. 10.1093/bioinformatics/btt509
- Ni Z, Kim ED, Ha M, Lackey E, Liu J, Zhang Y, Sun Q, Chen ZJ. Altered circadian rhythms regulate growth vigour in hybrids and allopolyploids. Nature. 2009:457(7227):327–331. 10.1038/nature07523
- Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 2018:46(21):e126. 10.1093/nar/gky730
- Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and ballgown. Nat Protoc. 2016:11(9):1650–1667. 10.1038/nprot.2016.095
- Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat Biotechnol. 2011:29(1):24–26. 10.1038/nbt.1754
- Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005:33(Web Server):W686–W689. 10.1093/nar/gki366
- Shao L, Xing F, Xu C, Zhang Q, Che J, Wang X, Song J, Li X, Xiao J, Chen LL, et al. Patterns of genome-wide allele-specific expression in hybrid rice and the implications on the genetic basis of heterosis. Proc Natl Acad Sci U S A. 2019:116(12):5653–5658. 10.1073/pnas.1820513116
- Shi F, Wang M, An Y. Overexpression of a B-type cytokinin response regulator (OsORR2) reduces plant height in rice. Plant Signal Behav. 2020:15(8):1780405. 10.1080/15592324.2020.1780405
- Si L, Chen J, Huang X, Gong H, Luo J, Hou Q, Zhou T, Lu T, Zhu J, Shangguan Y, et al. OsSPL13 controls grain size in cultivated rice. Nat Genet. 2016:48(4):447–456. 10.1038/ng.3518
- Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015:31(19):3210–3212. 10.1093/bioinformatics/btv351
- Slabaugh E, Desai JS, Sartor RC, Lawas LMF, Jagadish SVK, Doherty CJ. Analysis of differential gene expression and alternative splicing is significantly influenced by choice of reference genome. RNA. 2019:25(6):669–684. 10.1261/rna.070227.118
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006:34(Web Server):W435–W439. 10.1093/nar/gkl200
- Swanson-Wagner R, Jia Y, DeCook R, Borsuk L, Nettleton D, Schnable P. All possible modes of gene action are observed in a global comparison of gene expression in a maize F1 hybrid and its inbred parents. Proc Natl Acad Sci U S A. 2006:103(18):6805–6810. 10.1073/pnas.0510430103
- Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014:9(11):e112963. 10.1371/journal.pone.0112963
- Wang H, Hao J, Chen X, Hao Z, Wang X, Lou Y, Peng Y, Guo Z. Overexpression of rice WRKY89 enhances ultraviolet B tolerance and disease resistance in rice plants. Plant Mol Biol. 2007:65(6):799–815. 10.1007/s11103-007-9244-x
- Wang H, Ye S, Mou T. Molecular breeding of rice restorer lines and hybrids for brown planthopper (BPH) resistance using the Bph14 and Bph15 genes. Rice (N Y). 2016:9(1):53. 10.1186/s12284-016-0126-1
- Wei X, Qiu J, Yong K, Fan J, Zhang Q, Hua H, Liu J, Wang Q, Olsen KM, Han B, et al. A quantitative genomics map of rice provides genetic insights and guides breeding. Nat Genet. 2021:53(2):243–253. 10.1038/s41588-020-00769-9
- Wei G, Tao Y, Liu G, Chen C, Luo R, Xia H, Gan Q, Zeng H, Lu Z, Han Y, et al. A transcriptomic analysis of superhybrid rice LYP9 and its parents. Proc Natl Acad Sci U S A. 2009:106(19):7695–7701. 10.1073/pnas.0902340106
- Wittkopp PJ, Haerum BK, Clark AG. Evolutionary changes in cis and trans gene regulation. Nature. 2004:430(6995):85–88. 10.1038/nature02698
- Wu J, Deng Q, Yuan D, Qi S. Progress of super hybrid rice research in China (in Chinese). Chin Sci Bull. 2016:61(35):3787–3796. 10.1360/N972016-01013
- Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005:21(9):1859–1875. 10.1093/bioinformatics/bti310
- Xiao J, Li J, Yuan L, Tanksley S. Dominance is the major genetic basis of heterosis in rice as revealed by QTL analysis using molecular markers. Genetics. 1995:140(2):745–754. 10.1093/genetics/140.2.745
- Yan WH, Wang P, Chen HX, Zhou HJ, Li QP, Wang CR, Ding ZH, Zhang YS, Yu SB, Xing YZ, et al. A major QTL, Ghd8, plays pleiotropic roles in regulating grain productivity, plant height, and heading date in rice. Mol Plant. 2011:4(2):319–330. 10.1093/mp/ssq070
- Yang L, Liu P, Wang X, Jia A, Ren D, Tang Y, Tang Y, Deng XW, He G. A central circadian oscillator confers defense heterosis in hybrids without growth vigor costs. Nat Commun. 2021:12(1):2317. 10.1038/s41467-021-22268-z
- Zhai R, Feng Y, Wang H, Zhan X, Shen X, Wu W, Zhang Y, Chen D, Dai G, Yang Z, et al. Transcriptome analysis of rice root heterosis by RNA-Seq. BMC Genom. 2013:14(1):19. 10.1186/1471-2164-14-19
- Zhang J, Chen LL, Xing F, Kudrna DA, Yao W, Copetti D, Mu T, Li W, Song JM, Xie W, et al. Extensive sequence divergence between the reference genomes of two elite indica rice varieties zhenshan 97 and minghui 63. Proc Natl Acad Sci U S A. 2016:113:E5163–E5171
- Zhi D, Raphael BJ, Price AL, Tang H, Pevzner PA. Identifying repeat domains in large genomes. Genome Biol. 2006:7(1):R7. 10.1186/gb-2006-7-1-r7
- Zhou G, Chen Y, Yao W, Zhang C, Xie W, Hua J, Xing Y, Xiao J, Zhang Q. Genetic composition of yield heterosis in an elite rice hybrid. Proc Natl Acad Sci U S A. 2012:109(39):15847–15852. 10.1073/pnas.1214141109
- Zhou P, Hirsch CN, Briggs SP, Springer NM. Dynamic patterns of gene expression additivity and regulatory variation throughout maize development. Mol Plant. 2019:12(3):410–425. 10.1016/j.molp.2018.12.015