Abstract
Following whole-genome duplication (WGD), duplicate gene pairs (homoeologs) can evolve varying degrees of expression divergence. However, the determinants influencing these relative expression level differences (RFPKM) between homoeologs remain elusive. In this study, we analyzed the RFPKM between homoeologs in 3 angiosperms, Nymphaea colorata, Nelumbo nucifera, and Acorus tatarinowii, all having undergone a single WGD since the origin of angiosperms. Our results show significant positive correlations in RFPKM of homoeologs among tissues within the same species, and among orthologs across these 3 species, indicating convergent expression balance/bias between homoeologous gene copies following independent WGDs. We linked RFPKM between homoeologs to gene attributes associated with dosage-balance constraints, such as protein–protein interactions, lethal-phenotype scores in Arabidopsis (Arabidopsis thaliana) orthologs, domain numbers, and expression breadth. Notably, homoeologs with lower RFPKM often had more interactions and higher lethal-phenotype scores, indicating selective pressures favoring balanced expression. Also, homoeologs with lower RFPKM were more likely to be retained after WGDs in angiosperms. Within Nelumbo, greater RFPKM between homoeologs correlated with increased cis- and trans-regulatory differentiation between species, highlighting the ongoing escalation of gene expression divergence. We further found that expression degeneration in 1 copy of homoeologs is inclined toward nonfunctionalization. Our research highlights the importance of balanced expression, shaped by dosage-balance constraints, in the evolutionary retention of homoeologs in plants.
Following whole-genome duplication, gene expression divergence of homoeologs is influenced by dosage-balance constraints, with similarly expressed homoeologs more likely to be evolutionarily retained.
IN A NUTSHELL.
Background: Whole-genome duplication (WGD) is a frequent occurrence in plants, providing raw genetic material for evolution. Homoeologs (duplicate genes from a WGD) often diverge in expression levels, although some maintain similar (balanced) expression levels between the two copies even after tens of millions of years. Dosage-balance constraints, which ensure the proper stoichiometry of protein complex subunits, play a crucial role in retaining both copies of homoeologs that encode protein complex subunits after a WGD.
Question: Do dosage balance constraints also play an important role in shaping similar expression levels for the homoeologous gene copies after ancient WGDs, as the stoichiometry of interacting proteins is maintained by ensuring the expression of the proper amounts of gene products?
Findings: By analyzing homoeologs in Nymphaea, Nelumbo, and Acorus, which each underwent a single WGD since the origin of angiosperms, we observed convergent expression balance/bias between homoeologous gene copies following independent WGD events. Homoeologs with balanced expression levels indeed exhibit characteristics indicative of stronger selective pressures related to dosage balance, and their putative orthologs are more likely to be retained after WGDs in other angiosperm lineages. Further, homoeologous genes with similar expression levels exhibit relatively less regulatory differentiation between species of Nelumbo, suggesting that dosage balance constraints also play a role in recent gene expression evolution. Additionally, the lower-expression copy of homoeologs is more prone to becoming nonfunctional.
Next steps: Understanding the expression divergence and fate of homoeologs that have undergone multiple WGDs is a more complex task that necessitates further analysis of evolutionarily close taxa with varying numbers of WGD events.
Introduction
Gene duplication provides extra genetic material for evolution to work on (Ohno 1970). Polyploidy, resulting from whole-genome duplication (WGD) where the entire set of genes is duplicated simultaneously, has been assumed to facilitate species diversification and survival under environmental turmoil (Lynch and Conery 2000; Lynch and Force 2000; Van de Peer et al. 2017, 2021; Fox et al. 2020; Roman-Palacios et al. 2020; Wu et al. 2020; Ebadi et al. 2023). Although the most likely fate of duplicated genes is gene loss (Lynch and Conery 2000), a substantial portion of the homoeologs (genes duplicated in a WGD event) remain present in extant genomes even after tens to hundreds of millions of years. The retention of both copies is often explained by subfunctionalization (Force et al. 1999) or neofunctionalization (Ohno 1970) of genes, by gene dosage effects (Conant and Wolfe 2008) or dosage-balance constraints (Birchler and Veitia 2007), or by other mechanisms (Van de Peer et al. 2017), such as the preferential retention of biologically meaningful clusters of interacting genes (i.e. genes encoding proteins acting in multiprotein complexes), including those with coexpression or preservation of epistatic interactions (Makino and McLysaght 2012). Of all theories of WGD-derived duplicate gene trajectories (deletion vs. retention), dosage sensitivity or dosage-balance constraint seems particularly important because it directly affects gene dispensability and duplicability (Tasdighian et al. 2017; Birchler and Yang 2022). The dosage-balance hypothesis asserts that selection acts on the expressed amount of gene product to maintain the stoichiometry of protein complex subunits, which is crucial for their proper functioning (Papp et al. 2003; Birchler and Yang 2022).
For polyploids, WGD often triggers gene loss and genomic reshuffling, yet many duplicates escape deletion due to such dosage-balance constraints to maintain proper stoichiometric balance, contributing greatly to the specific gene content of paleopolyploids or younger mesopolyploids (Geiser et al. 2016; Cheng et al. 2018; Kuzmin et al. 2020). Evidence in angiosperms also indicates that genes that function in “development” and “transcription regulation,” likely sensitive to dosage balance, are typically retained as duplicates post-WGD for millions of years while others eventually revert to single-copy status in most species (Maere et al. 2005; Li et al. 2016).
The fate of a redundant gene copy is indeed largely subject to dosage-balance constraints because it is the amount of gene product (protein) that is first “visible” to natural selection. Although both regulatory and protein-coding regions are duplicated through WGD, gene expression, a critical determinant of gene function for protein-coding genes, often diverges substantially between duplicate gene copies over time. A previous study in Arabidopsis (Arabidopsis thaliana) revealed that small-scale duplicates exhibit a more pronounced asymmetry in expression divergence between copies compared with duplicates derived from large-scale segmental duplications or WGDs (Casneuf et al. 2006). Yet, duplicates (homoeologs) from WGDs can also show substantial divergence in expression. This divergence often manifests itself as 1 copy dominating in expression (the other copy then shows “expression degeneration”) across different tissues, as frequently observed in various species, such as the common carp (Li et al. 2015) and different plant species (Ganko et al. 2007; Liang and Schnable 2018). However, there are instances where a fraction of homoeologs retain “balanced” expression levels, as evidenced in Arabidopsis (Coate et al. 2020). The range of expression level variation among different homoeologous pairs, from dominance to balanced expression, does not seem to be random but indicates a complex expression landscape, in which dosage-balance constraints might play an important role because the stoichiometry of interacting proteins is maintained through tightly regulated gene expression and synthesis of the proper amounts of proteins.
The intriguing patterns of gene expression level divergence between homoeologous copies after a WGD, and particularly what may constrain it, need further investigation. By focusing on plants that have undergone only a single WGD since the origin of angiosperms, such as Nelumbo nucifera (a sister lineage of most other eudicots), Acorus tatarinowii (a sister lineage of most other monocots), and Nymphaea colorata (a member of the ANA grade standing for Amborellales, Nymphaeales, and Austrobaileyales; a sister lineage of most other angiosperms including eudicots, monocots, magnoliids, and others) (Ming et al. 2013; Zhang et al. 2019; Shi and Chen 2020; Shi et al. 2020, 2022), we can eliminate confounding effects of recurring duplications (at least through WGD) within the same organism (Tiley et al. 2016). As this setting assures a uniform genesis for all WGD-derived duplicates per species, differences in expression between copies of homoeologs must be due to their differences in regulatory evolution rates. In this study, we aim to investigate the multiple factors associated with dosage-balance constraints that may hypothetically limit the expression divergence of the duplicate copies, such as protein–protein interactions (PPIs), the number of protein domains encoded in genes, protein length, expression breadth in various tissues, and so on. Our primary goal is to elucidate the transition from balanced to unbalanced (or not) gene expression between gene copies in plant species characterized by a single WGD. Further, given the availability of data including cis- and trans-regulatory changes between 2 Nelumbo species (Li et al. 2021a, 2021b; Zhang et al. 2022b; Gao et al. 2023), we aim to assess the dosage-balance constraint on the ongoing and escalating evolutionary differences between duplicate gene pairs with varying degrees of expression bias during the last ∼6.5 million years. Finally, we analyze the evolutionary, structural, and functional traits of high- vs. low-expression homoeologous copies to gain insights into how differences in expression levels between copies may manifest divergent evolutionary outcomes.
Results
Convergence of expression divergence of homoeologs
Chromosome-level genome assemblies of Ny. colorata (hereafter referred to as Nymphaea), N. nucifera (hereafter referred to as Nelumbo), and A. tatarinowii (hereafter referred to as Acorus), achieving high Benchmarking Universal Single-Copy Orthologs (BUSCO) scores of 94.40%, 94.60%, and 92.40%, respectively, have previously been published (Supplementary Table S1). These well-assembled and well-annotated genomes ensure high-quality datasets for our subsequent analyses. After eliminating redundant homoeologous gene pairs associated with tandem arrays (see Materials and Methods), we obtained 2,442, 5,018, and 3,631 homoeologous gene pairs through intraspecific synteny searches from the 31,475, 34,233, and 28,241 annotated genes in Nymphaea, Nelumbo, and Acorus, respectively (Supplementary Table S2). First, to uncover how different homoeologs in Nymphaea, Nelumbo, and Acorus may differ in expression, we summarized the relative frequency distribution of homoeologs according to their expression level difference (Fig. 1A). In total, 19, 9, and 5 tissues were used to measure expression level differences between homoeologous copies for Nelumbo, Nymphaea, and Acorus, respectively (Supplementary Table S3). To ensure comparability of gene expression divergence or expression decay of one of the homoeologs across species and across homoeologous gene pairs, we normalized the expression differences between duplicate copies as RFPKM for each tissue sample (see Materials and Methods and Fig. 1A). RFPKM values range from 0 to 1, i.e. from no difference (no divergence) in gene expression between both duplicates to complete silence of 1 copy (Fig. 1A). The RFPKM of homoeologs within the same species shows a high and significant correlation across different tissues, as indicated by Pearson correlation tests (all P-values <0.01), suggesting that the extent of gene expression divergence among homoeologs is highly consistent and stable across various plant tissues (Supplementary Fig. S1 and Table S4). Consequently, our subsequent analyses of homoeolog expression evolution primarily utilize the average RFPKM values across all tissues. Generally, in all 3 species studied, the distribution of homoeologs is skewed toward higher RFPKM values (Fig. 1A). This trend indicates that for most duplicate pairs, 1 copy undergoes considerable expression degeneration (referred to as biased expression) following WGD, while a subset maintains balanced expression levels between the copies.
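The normalization described above can be illustrated with a minimal Python sketch. The exact RFPKM formula is given in Materials and Methods and is not reproduced here; the form below, |a − b| / (a + b), is only an assumed example that satisfies the stated bounds (0 for identical expression, 1 when 1 copy is silent). The function names `rfpkm` and `mean_rfpkm` are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def rfpkm(fpkm_a: float, fpkm_b: float) -> Optional[float]:
    """Relative expression difference between 2 homoeologous copies.

    ASSUMPTION: uses |a - b| / (a + b); the published definition may differ.
    Returns None when neither copy is expressed (ratio undefined).
    """
    total = fpkm_a + fpkm_b
    if total == 0:
        return None
    return abs(fpkm_a - fpkm_b) / total


def mean_rfpkm(copy_a: Sequence[float], copy_b: Sequence[float]) -> Optional[float]:
    """Average the per-tissue values, skipping tissues where both copies are silent."""
    values = [rfpkm(a, b) for a, b in zip(copy_a, copy_b)]
    defined = [v for v in values if v is not None]
    return mean(defined) if defined else None


# Hypothetical FPKM values for 1 homoeologous pair in 3 tissues.
balanced_pair = mean_rfpkm([10.0, 20.0, 5.0], [10.0, 20.0, 5.0])  # copies identical
biased_pair = mean_rfpkm([10.0, 20.0, 5.0], [0.0, 0.0, 0.0])      # 1 copy silent
```

Under this assumed form, a pair with identical expression in every tissue averages to 0 and a pair with 1 silent copy averages to 1, matching the range described in the text.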
Figure 1.
Gene expression characteristics of homoeologs for N. nucifera, A. tatarinowii, and Ny. colorata. A) Distributions of the proportion of homoeologs according to relative expression difference between the 2 homoeologous copies (RFPKM) for Nelumbo, Acorus, and Nymphaea. WGD, whole-genome duplication; FPKM, fragments per kilobase of exon model per million mapped fragments. B) Hypothesis showing that higher expression balance of duplicate copies is associated with more PPIs and higher sensitivity to expression change. Horizontal arrows, evolutionary divergence; vertical arrows, the transitions from RNAs to proteins, and from proteins to phenotypes. C to E) Average RFPKM of putatively orthologous duplicates are significantly correlated between Nymphaea and Nelumbo (C), Nymphaea and Acorus (D), and Nelumbo and Acorus (E). F to K) Average RFPKM of WGD duplicates in Nelumbo are significantly negatively correlated with average protein length (F), number of Pfam domains per gene (G), number of PPIs of putative orthologs in Arabidopsis (H), and lethal-phenotype score of Arabidopsis putative ortholog groups (I), and significantly positively correlated with average nonsynonymous substitutions per site (dN) after WGD (J) and tissue specificity (τ index) of gene expression (K). r, correlation coefficient of Pearson correlation test; log, log-transformed values of gene features; shading band, the 95% (default) confidence level interval for predictions from a linear model. **P < 0.01.
Over tens of millions of years following a single WGD, it remains noteworthy that a fraction of homoeologs persist with highly balanced (highly similar) expression levels. This observation suggests the presence of a stringent selective pressure, likely driven by the critical dosage balance required for PPIs. Such selective constraints are essential to mitigate expression alterations and to ensure the production of the correct number of interacting proteins in multiprotein complexes within the cell, thus preventing dysfunctional or lethal phenotypes (Fig. 1B). Should such constraints on gene expression last for extended periods, we would anticipate that orthologous duplicates between species exhibit consistent patterns in expression balance or bias following independent WGDs in each species. In agreement with this hypothesis, we observed significant correlations in the average RFPKM of putatively orthologous duplicates between Nymphaea and Nelumbo (Fig. 1C), Nymphaea and Acorus (Fig. 1D), and Nelumbo and Acorus (Fig. 1E). These correlations underscore the influence of selective pressures in sculpting the convergent patterns of expression balance or unbalance among putative orthologs across these species in the context of their respective WGDs.
To investigate whether, and to what extent, dosage-balance constraints or negative selection influence the relative expression differences in homoeologs across 3 species, we explored the relationships between the average RFPKM of WGD duplicates and a range of gene characteristics, encompassing structural, functional, and molecular evolutionary aspects, with both linear and log-transformed regression analyses (Supplementary Table S5). Consistently, our findings reveal that although the Pearson correlation coefficient suggests a weak correlation between RFPKM and Pfam domain count (r < −0.1), the average RFPKM of duplicates in all 3 species exhibits a significant negative correlation with several gene attributes: protein length, number of exons, Pfam domain count per gene, and the number of PPIs of putative orthologs in Arabidopsis (Pearson correlation tests, all P-values <0.01; Fig. 1, F to H, Supplementary Figs. S2 to S4 and Table S6). After transferring the lethal-phenotype scores from Arabidopsis (Lloyd et al. 2015) to our 3 investigated species via putative ortholog assignment, our analysis revealed that the average RFPKM of duplicates across the 3 species displays a significantly negative correlation with the lethal-phenotype score of their putative orthologs in Arabidopsis, although the correlation coefficients are weak (r < −0.1) (Fig. 1I, Supplementary Figs. S3J and S4J; all P-values <0.01). This finding indicates a potential negative selection on these homoeologs with balanced expression, as evidenced by Pearson correlation tests (Fig. 1I, Supplementary Figs. S3J and S4J). In line with this, we observed that homoeologs exhibiting balanced expression typically demonstrate lower tissue specificity (τ) in gene expression, implying their involvement in multiple plant tissue types and a higher degree of essentiality (Fig. 1K, Supplementary Figs. S3D and S4C). 
This conclusion is further supported by molecular evolutionary data, where average RFPKM shows significant positive correlations with both synonymous (dS) and, notably, nonsynonymous (dN) substitution rates (Pearson correlation tests, all P-values <0.01; Fig. 1J, Supplementary Figs. S2A, S3A, B, and S4A, B). The stronger correlation with dN is consistent with a previous finding in Arabidopsis, indicating rapid protein evolution associated with expression change (Ganko et al. 2007). Yet, RFPKM is significantly positively correlated with the dN/dS ratio (ω) only in Nymphaea and Nelumbo, not in Acorus, possibly because of other selective pressures acting in this context (Supplementary Figs. S2B, S3C, and S4D). When we independently conducted correlation tests between the RFPKM of homoeologs for each individual plant tissue and a variety of gene characteristics, we observed correlation patterns similar to those found using the average RFPKM: while the P-values for individual tissues are slightly higher, the correlation coefficients (r) remain consistent between average RFPKM and those of individual tissues (Supplementary Table S5). This suggests that increasing the number of tissues sampled for average RFPKM can lead to lower P-values, but the overall trends in positive or negative correlations remain unchanged if we use different individual tissue samples. In addition, we compared correlation coefficients (r) of RFPKM and gene traits among Nymphaea, Nelumbo, and Acorus listed in Supplementary Table S6, and we found convergent trends in RFPKM–gene trait relationships among species based on significant Pearson correlations (Supplementary Fig. S5). Further, upon categorizing homoeologs into gene ontology (GO) slim categories, we observed that those linked to dosage-sensitive and complex systems, notably in categories like “regulation of gene expression, epigenetic” and “translation,” tend to have some of the lowest average RFPKM values. 
Conversely, homoeologs involved in “secondary metabolic process” and “response to biotic stimulus” exhibit some of the highest average RFPKM values (Fig. 2, Supplementary Figs. S6 and S7), in agreement with a previous study showing that stress-regulated genes evolve rapidly in expression (Zou et al. 2009). These functional annotations further corroborate the key role of dosage balance in constraining the expression level evolution between homoeologs.
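As an aside on the tissue specificity measure used above, the following Python sketch computes the τ index in its widely used form, τ = Σ(1 − x_i / x_max) / (n − 1), where x_i is the expression in tissue i and n is the number of tissues. Whether expression values were log-transformed or filtered before computing τ in this study is not stated in this section, so the raw-value input here is an assumption, and the function name `tau_index` is hypothetical.

```python
from typing import Optional, Sequence


def tau_index(expression: Sequence[float]) -> Optional[float]:
    """Tissue specificity index: 0 = uniformly expressed, 1 = single-tissue expression.

    ASSUMPTION: takes untransformed, non-negative expression values.
    Returns None for fewer than 2 tissues or a gene with no expression.
    """
    n = len(expression)
    if n < 2:
        return None
    x_max = max(expression)
    if x_max <= 0:
        return None
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles across 4 tissues.
broad = tau_index([5.0, 5.0, 5.0, 5.0])     # same level everywhere
specific = tau_index([0.0, 0.0, 8.0, 0.0])  # expressed in 1 tissue only
```

A broadly expressed gene yields τ near 0 and a tissue-restricted gene yields τ near 1, which is the direction of the relationship with RFPKM reported in Fig. 1K.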
Figure 2.
Violin plots illustrating that the relative expression difference (RFPKM) of Nelumbo homoeologs varies among GO slim categories. Bars in violins, median and quartiles; FPKM, fragments per kilobase of exon model per million mapped fragments. The sample sizes of the GO slim categories, from bottom to top in the figure, are as follows: 12, 10, 19, 50, 59, 115, 249, 139, 39, 361, 434, 207, 424, 366, 448, 259, 101, 27, 370, 600, 523, 178, 2526, 659, 720, 135, 711, 121, 61, 1323, 1814, 112, 402, 51, 116, 335, 513, 582, 78, 37, 413, 43, 276, 27, and 59.
Expression balance between homoeologs predicts copy number change of putative orthologs of angiosperms experiencing different rounds of WGDs
In light of previous studies suggesting that slow-evolving genes are less likely to be lost after WGD (Inoue et al. 2015), we hypothesize that duplicate pairs with greater expression balance (less expression divergence) are subject to strong dosage-balance constraints and, consequently, that their orthologs tend to have copy numbers that correlate with the number of experienced WGDs (Tasdighian et al. 2017). For instance, for 1 copy in Amborella trichopoda, Vitis vinifera should have 3 copies and Brassica rapa even 36 copies, given their respective number of WGD(s) (Fig. 3A, Supplementary Table S7). This expectation allows for the measurement of “relative dosage sensitivity” by Pearson correlation analyses. For example, while observed copy numbers in some gene families, like the leucine-rich repeat receptor-like kinase IGP4 (IMPAIRED IN GLYCAN PERCEPTION 4) functioning in plant immunity (OG0000026; Martín-Dacal et al. 2023), show a lack of significant correlation with the expected post-WGD copy numbers (r = −0.043, P = 0.8393; Fig. 3B), other genes such as BAK1 (BRI1-ASSOCIATED RECEPTOR KINASE) and its OGs (another group of leucine-rich repeat receptor-like kinases, OG0001486), essential for various cellular processes including brassinosteroid signaling (Li et al. 2002), displayed a pronounced sensitivity to dosage alterations, as evidenced by a strong positive linear regression with expected post-WGD copy numbers (r = 0.491, P = 0.0126; Fig. 3C). The relatively high dosage sensitivity of BAK1 is also reflected by studies of Arabidopsis in which single, double, and triple mutants of BAK1 paralogs (SERKs or SOMATIC EMBRYOGENESIS RECEPTOR-LIKE KINASES) exhibit progressively more severe reduction of hypocotyl (Gou et al. 2012) and root growth (Ou et al. 2022), emphasizing the critical role of gene dosage in plant developmental processes (Fig. 3D). 
Our analysis of genomic structure and gene expression patterns revealed that the microsyntenies surrounding BAK1 are conserved, exhibiting a consistent 1:1:2:2:2 distribution across Amborella, Aristolochia, Nymphaea, Nelumbo, and Acorus (Fig. 3E). Moreover, we noted that BAK1's putative orthologs show balanced expression in the tissues of Nelumbo and Acorus, while this balanced expression was not well preserved in Nymphaea, likely due to its older WGD age (Zhang et al. 2019; Fig. 3F).
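The “relative dosage sensitivity” measure described above can be sketched in Python as follows. The expected copy number of a fully retained gene is taken as the product of the multiplicities of the polyploidy events in a lineage (2 for a WGD, 3 for a triplication), and the sensitivity is the Pearson r between observed and expected copy numbers across species. The event lists below are illustrative assumptions chosen to reproduce the 1, 3, and 36 copies quoted in the text; the observed counts and the names `expected_copies` and `pearson_r` are hypothetical, and the full species set is in Supplementary Table S7.

```python
from math import prod, sqrt
from typing import Optional, Sequence


def expected_copies(multiplicities: Sequence[int]) -> int:
    """Expected copies per ancestral gene if every duplicate were retained."""
    return prod(multiplicities)  # empty list -> 1 (no polyploidy event)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Plain Pearson correlation coefficient; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# ASSUMED event multiplicities per lineage (illustration only).
events = {
    "Amborella trichopoda": [],        # no WGD since the origin of angiosperms
    "Vitis vinifera": [3],             # 1 triplication
    "Brassica rapa": [3, 2, 2, 3],     # 3 x 2 x 2 x 3 = 36
}
expected = [expected_copies(m) for m in events.values()]

# Hypothetical observed copy numbers for 1 ortholog group in the same species.
observed = [1, 2, 9]
r_copy = pearson_r(observed, expected)
```

An ortholog group whose observed copy numbers track the expected values yields a high positive r, whereas a family that repeatedly returns to low copy number yields r near 0.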
Figure 3.
The role of gene dosage in the gene expression patterns of homoeologs. A) The copy number of dosage-sensitive genes aligns with the number of WGD, whole-genome triplication (WGT), and whole-genome multiplication (WGM) events. Bar length, the relative copy number in relation to Amborella (a taxon without WGD since the origin of angiosperms). B) Copy number dynamics of putative ortholog OG0000026 (containing IGP4) in angiosperms: lack of significant correlation between observed and expected copy numbers post-WGD events. C) Copy number dynamics of putative ortholog OG0001486 (containing BAK1) in angiosperms: presence of significant correlation between observed and expected copy numbers post-WGD events. D) Hypocotyl growth in Arabidopsis under dark conditions correlates with silencing of more BAK1 inparalogs (Gou et al. 2012). E) Microsynteny patterns of BAK1: consistent 1:1:2:2:2 distribution in Amborella, Aristolochia, Nymphaea, Nelumbo, and Acorus. F) Tissue-specific expression patterns of BAK1 putative orthologs in Nymphaea, Nelumbo, and Acorus. FPKM, fragments per kilobase of exon model per million mapped fragments. G to I) Pearson correlation between relative expression differences (RFPKM) of duplicate pairs in Nymphaea (G), Nelumbo (H), and Acorus (I) and their dosage sensitivity (rcopy number), as reflected by copy number changes in angiosperms associated with expected post-WGD copy numbers. **P < 0.01; shading band, the 95% (default) confidence level interval for predictions from a linear model.
Our correlation analyses reveal a significant negative association between the relative expression differences of duplicate pairs, i.e. average RFPKM, and their dosage sensitivity in response to WGD events, proxied by rcopy number (observed vs. expected post-WGD copy numbers). This association was consistent across Nymphaea, Nelumbo, and Acorus, indicating that the finding is not specific to a single lineage (Fig. 3, G to I). Upon quantifying the propensity for gene loss (PGL) of each OG using the COUNT software, which performs evolutionary analysis of phylogenetic profiles with parsimony and likelihood (Csűrös 2010), we observed a notable positive correlation between average RFPKM and PGL values (Supplementary Fig. S8). This observation further corroborates our hypothesis that the expression balance of homoeologs is indicative of gene longevity following WGDs.
Homoeologs with greater expression difference show rapid regulatory change between Nelumbo species
To further our understanding of the expression divergence of WGD-derived duplicates, we considered the role of cis- and trans-regulatory variation between 2 Nelumbo species, N. nucifera and Nelumbo lutea, which diverged ∼6.5 million years ago. We hypothesized that these regulatory changes are more pronounced in homoeologs with high RFPKM values, given their weaker dosage-balance constraints. Through allele-specific expression analysis (see Materials and Methods; Fig. 4A; Gao et al. 2023), we quantified the regulatory changes and found positive correlations between the magnitude of cis- and trans-regulatory changes and RFPKM in both linear and log-transformed regressions (Pearson correlation tests, all P-values <0.01), in support of this hypothesis (Fig. 4, B to D, Supplementary Table S8). Thus, cis- and trans-regulatory changes distinguish the evolutionary paths of homoeologs with balanced expression from those with biased expression. This distinction is further highlighted by a lower frequency of premature stop codon mutations in homoeologs exhibiting balanced expression (RFPKM ranging from 0 to 0.6), compared with their biased expression counterparts (RFPKM exceeding 0.6) within Nelumbo populations (Supplementary Fig. S9).
Figure 4.
Relationship between cis- and trans-regulatory variation in homoeologs and relative expression differences in Nelumbo species. A) Model depicting quantification of cis- and trans-regulatory changes based on allele-specific expression between N. nucifera and N. lutea. B) Pearson correlation between relative expression difference (RFPKM) and cis-regulatory change magnitude (H8-S1). C) Pearson correlation between RFPKM and trans-regulatory change magnitude (H8-S1). D) Pearson correlation between RFPKM and combined cis- and trans-regulatory change magnitude (H8-S1). FPKM, fragments per kilobase of exon model per million mapped fragments; log, log-transformed values of regulatory change magnitude; shading band, the 95% (default) confidence level interval for predictions from a linear model. *P < 0.05; **P < 0.01.
Although, as mentioned above, we conducted correlation tests between RFPKM and various factors, such as gene features, relative dosage sensitivity, and regulatory divergence using published genome assemblies and annotations, it is unclear to what extent our results are impacted by incomplete genome assembly or annotation. To address this concern, we performed additional correlation tests. Specifically, we simulated scenarios with various degrees of incompleteness by randomly removing 2.5%, 5%, 10%, 20%, and 40% of the Nelumbo homoeologs from our dataset. The results reveal that these simulated deletions have minimal effect on the correlation coefficient (r) (Supplementary Table S9). As the percentage of deletions increases, the P-values show a slight upward trend, but they remain statistically significant even at the highest deletion level of 40% (Supplementary Table S9). Considering these results, along with the high BUSCO scores achieved for the 3 species studied, we are confident in the adequacy and reliability of our datasets to support our conclusions.
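The robustness check described above can be sketched in Python as follows: a fraction of homoeologous pairs is removed at random and the Pearson correlation is recomputed on the remainder. The data here are synthetic (a noisy negative relationship generated from a fixed seed), not values from Supplementary Table S9, and the names `pearson_r` and `subsample_r` are hypothetical.

```python
import random
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def subsample_r(xs, ys, fraction_removed: float, seed: int = 0) -> float:
    """Recompute r after randomly removing a fraction of the gene pairs."""
    rng = random.Random(seed)
    n_keep = round(len(xs) * (1 - fraction_removed))
    keep = rng.sample(range(len(xs)), n_keep)
    return pearson_r([xs[i] for i in keep], [ys[i] for i in keep])


# SYNTHETIC data: 500 pairs with a noisy negative trend (illustration only).
_rng = random.Random(1)
rfpkm_values = [_rng.random() for _ in range(500)]
trait_values = [-0.5 * x + _rng.gauss(0, 0.2) for x in rfpkm_values]

full_r = pearson_r(rfpkm_values, trait_values)
# Same removal levels as in the text: 2.5%, 5%, 10%, 20%, and 40%.
r_by_level = {f: subsample_r(rfpkm_values, trait_values, f)
              for f in (0.025, 0.05, 0.10, 0.20, 0.40)}
```

In such a simulation the coefficient typically stays close to the full-data value while the P-value rises as the sample shrinks, which is the pattern reported in the paragraph above.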
Divergent evolutionary paths between homoeologous copies with low and high expression
A further analysis was inspired by a study of Bsister genes in crucifers (Hoffmeier et al. 2018). Bsister genes, belonging to MIKC-type MADS-box genes encoding transcription factors, play a vital role in ovule and seed development (Hoffmeier et al. 2018). Hoffmeier et al. suggested that 1 ancient gene copy from the core eudicot γ triplication (Jiao et al. 2012; Shi and Chen 2020), known as a GOA-like gene, has often lost its function and been lost in different plant lineages owing to its relatively lower expression, whereas ABS-like genes, derived from the other ancient copy of the Bsister genes, are significantly conserved owing to their relatively higher expression. These GOA-like genes were therefore referred to as “a dead gene walking,” the hypothesis being that varying expression levels in these genes lead to different evolutionary outcomes (Fig. 5A). We tested this hypothesis by examining the evolutionary outcomes, including rates of nonsynonymous (dN) and synonymous (dS) substitutions, dN/dS ratios (ω), exon numbers, CDS lengths, protein lengths, Pfam domains, and tissue specificity of expression, in low- vs. high-expression homoeologous copies in Nelumbo (Fig. 5, B to D, Supplementary Fig. S10 and Table S10). Significant differences (paired t-tests, P < 0.01) supported our hypothesis that higher-expression copies tend to be more conserved in terms of sequence substitutions and gene structure and show broader expression breadth. The exceptions observed in the number of exons and Pfam domains could be attributed to the intrinsic structural and functional constraints of these genomic elements (Supplementary Fig. S10 and Table S10), which may be less susceptible to evolutionary changes driven by differential expression levels. The same trend in Nymphaea and Acorus suggests a broader applicability of this evolutionary pattern (Supplementary Figs. S11 and S12, and Table S10). 
Thus, homoeologous copies with relatively higher expression levels appear to be subject to stronger selective constraints, retaining essential functions and resisting mutations and nonfunctionalization.
Figure 5.
Homoeologs with high and low gene expression in immature stamen of Nelumbo. A) The “dead gene walking” model for the “low expression” copy of homoeologs. Arrows between circles, direction of evolutionary divergence. B to F) Comparison between “high-expression” and “low-expression” copies in protein length (B), nonsynonymous substitutions per site (dN) (C), tissue specificity (τ) (D), magnitude of cis-regulatory mutations (in H8-S1) (E), and magnitude of trans-regulatory mutations (in H8-S1) (F) (paired t-test, **P < 0.01). Solid line in violin plot, median value; dashed line in violin plot, quantile; n, sample size. G) Comparison between “high-expression” and “low-expression” copies in proportion of genes with premature stop codon mutations in Nelumbo populations (χ2 test, *P < 0.05).
Building on this hypothesis, we also considered cis- and trans-regulatory changes of homoeologs between the closely related species N. nucifera and N. lutea. Consistent with the Bsister gene narrative, our results revealed significant differences in the magnitude of cis- and trans-regulatory changes between higher and lower expression homoeologs (Fig. 5, E and F, Supplementary Fig. S10 and Table S11). The lower magnitudes of cis- (measured by |B|) and trans-regulatory mutations (measured by |A − B|; see Materials and Methods) in higher expression copies suggest that conservation and higher gene expression levels are intertwined. This is also particularly evident in the reduced incidence of premature stop codon mutations in these highly expressed copies, suggesting an ongoing selective process preventing gene copies with higher expression from nonfunctionalization (χ2 test, P < 0.01; Fig. 5G, Supplementary Fig. S13).
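The |B| and |A − B| measures mentioned above can be illustrated with a short Python sketch. The definitions of A and B are given in Materials and Methods and are not reproduced in this section; the sketch assumes the convention common in allele-specific expression studies, in which A is the log2 expression ratio between the 2 parental species and B is the log2 ratio between the 2 parental alleles within the hybrid. The function name `regulatory_magnitudes` and all input values are hypothetical.

```python
from math import log2
from typing import Tuple


def regulatory_magnitudes(parent1: float, parent2: float,
                          allele1: float, allele2: float) -> Tuple[float, float]:
    """Return (cis, trans) magnitudes as (|B|, |A - B|).

    ASSUMPTION: A = log2(parent1 / parent2) from the parental species and
    B = log2(allele1 / allele2) from allele-specific counts in the hybrid.
    All 4 expression values must be positive for the ratios to be defined.
    """
    if min(parent1, parent2, allele1, allele2) <= 0:
        raise ValueError("expression values must be positive")
    a = log2(parent1 / parent2)
    b = log2(allele1 / allele2)
    return abs(b), abs(a - b)


# Hypothetical gene whose parental difference persists fully in the hybrid:
# the divergence is attributed to cis effects only.
cis_only = regulatory_magnitudes(8.0, 2.0, 8.0, 2.0)

# Hypothetical gene whose parental difference is halved (in log2 terms) in the hybrid:
# the divergence is split between cis and trans effects.
mixed = regulatory_magnitudes(8.0, 2.0, 4.0, 2.0)
```

Under this convention, an allelic imbalance in the hybrid that matches the parental difference indicates cis divergence, and any parental difference not explained by the allelic ratio is assigned to trans divergence.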
Discussion
We considered 3 angiosperms, each having undergone a single, independent WGD since the origin of angiosperms, namely Nymphaea (a WGD of 117–98 MYA; Zhang et al. 2019), Nelumbo (a WGD about 65 MYA; Ming et al. 2013; Shi et al. 2020), and Acorus (a WGD about 41.7 MYA; Shi et al. 2022; Ma et al. 2023). Despite 1 or 2 older WGDs preceding the origin of angiosperms (Jiao et al. 2011; Ruprecht et al. 2017), our intraspecific synteny search using MCScanX revealed only 127 homoeologous pairs in Aristolochia and none in Amborella, 2 species without a lineage-specific WGD (Amborella Genome Project 2013; Qin et al. 2021). This implies that the number of homoeologs resulting from 1 or 2 older WGDs (be it in the ancestral angiosperm and/or ancestral seed plant lineage) will be very limited in Nymphaea, Nelumbo, and Acorus and will not affect the analysis described here. Differences in the number of homoeologs identified in each of the 3 species can be attributed to several factors, including the age differences between the 3 WGDs, the numbers of ancestral genes present before each WGD, and lineage-specific gene loss rates. Hence, the uniformity in the date of origin of the majority of homoeologs (they were identified through synteny analysis, which indicates large-scale duplication) enables us to confidently attribute any observed expression divergence between most homoeologous copies to the varying strengths of dosage-balance constraint, the focal point of our study. This approach avoids confounding factors associated with species that have undergone multiple WGDs, such as Arabidopsis, or with smaller scale gene duplicates like tandem duplicates, where the origins are more dispersed in time, rendering it difficult to discern whether expression divergence is driven by dosage-balance constraint or simply by the passage of time.
Careful analysis showed that gene expression evolution of homoeologs in Nymphaea, Nelumbo, and Acorus, whether balanced or unbalanced between copies, is not random. Instead, it seems determined by dosage-balance constraints. Expression divergence in duplicate genes is often observed between gene copies following gene duplication or WGD events. One archetype of this divergence is the scenario where 1 copy exhibits significantly higher expression than its counterpart. Such biased expression levels between copies are often linked to the “subgenome dominance” phenomenon and can be attributed to various factors, including mutations in regulatory regions, methylation, chromatin accessibility, and changes in transcription factor affinities, acting differently in different subgenomes (Li et al. 2005, 2022; Cheng et al. 2016; Zhao et al. 2017; Bird et al. 2021; Garcia-Lozano et al. 2021). However, as our study reveals, the post-WGD evolutionary trajectory of gene expression is not uniformly characterized by divergence. Remarkably, a subset of gene pairs maintains similar expression levels over extensive evolutionary timescales. This finding complements the widely accepted hypothesis that redundant gene copies are preserved primarily through subfunctionalization or neofunctionalization. In quantifying expression bias between copies using RFPKM, we observe a convergent pattern among putative orthologs across different species. This pattern suggests a selective pressure influencing expression evolution. It seems that convergence in expression levels may be driven by factors (gene features), such as protein lengths, the number of protein domains, and the number of PPIs. These gene features might indirectly relate to dosage-balance constraints.
A protein domain is a distinct, conserved region within a protein that can independently fold and function, with multi-domain proteins often being longer and potentially more prone to diverse PPIs (Zhang and Yang 2015). Additionally, higher lethal-phenotype scores in Arabidopsis putative orthologs for homoeologs with balanced expression further imply strong selection against expression variation. Indeed, homoeologs with balanced expression levels are likely subjected to more stringent stoichiometric constraints. As a result, these homoeologs with highly balanced expression levels are likely more susceptible to purifying selection. The lethal-phenotype score serves as an alternative metric to assess the constraint imposed on genes, complementing “protein structure”-related measures. In a study focused on Arabidopsis, each gene was assigned a lethal-phenotype score ranging from 0 to 1 (Lloyd et al. 2015). In this scoring system, higher values correspond to a greater probability of exhibiting lethal phenotypes when the gene is disrupted. This provides a quantifiable way to understand a gene's essentiality. Lower expression tissue specificity for homoeologs with balanced expression is also supported by the association between broader gene expression and higher essentiality that has been observed earlier (Liao et al. 2006). Furthermore, these genes are characterized by lower rates of nonsynonymous (amino acid) sequence substitutions and a broader expression range across tissues, also indicating strong purifying selection (Kondrashov et al. 2002; Cardoso-Moreira et al. 2016). Our observed increase in the number of exons for genes that show balanced expression points to a more complex gene structure, potentially enabling diverse functions (Zhang 2003; Van de Peer et al. 2009).
In eukaryotes, there is a positive correlation between the number of exons and protein domains (Liu and Grigoriev 2004), which means that a higher number of exons might lead to a higher number of PPIs; we therefore also incorporated the number of exons as an extra variable. This overall gene complexity may result in increased functional constraints and dosage balance, a hypothesis supported by previous studies (Carels and Bernardi 2000; Bowers et al. 2022). We realize that, in our study of RFPKM, there are differences in dataset extensiveness among species, with more data available for Nelumbo because its genome was already published in 2013, whereas the genomes of Nymphaea and Acorus were published in 2019 and 2022, respectively. However, congruent correlation tests and other statistical analyses of different tissues and species ensure that our conclusions are consistent despite differences in dataset size. Overall, our research thereby adds a crucial layer to our understanding of gene expression dynamics of duplicate genes in the context of evolutionary genomics (Pal et al. 2001; Conant and Wagner 2003; Jordan et al. 2005; Holland et al. 2017). We acknowledge that some features, like the average number of Pfam domains and the lethal-phenotype score of Arabidopsis OGs, show weaker correlations with RFPKM. However, we believe that this is reasonable given that sequence divergence and tissue expression specificity exhibit a much broader range of variance, allowing for precise characterization of genes. Conversely, the lethal-phenotype score and the number of Pfam domains are estimated more crudely, based on homologous gene transfer and annotations of known domains, and exhibit a limited range of variance. The lethal-phenotype score and the number of Pfam domains thus serve as auxiliary parameters, providing additional support for the levels of dosage-balance constraint on genes.
Such variations in correlations are also commonly observed in genome-wide association studies, where there are leading loci and “suboptimal” loci differentially associated with target traits. Overall, we believe that all gene features analyzed point to the crucial role of dosage-balance constraint in shaping expression divergence between homoeologous copies.
Several studies demonstrated that dosage-sensitive genes are slow in regulatory change after WGDs. For example, a study on diploid and polyploid Glycine species suggests that duplicates from different rounds of WGDs and annotated with “metabolic pathways” and GOs that are putative dosage sensitive exhibit reduced expression variance across the species compared with those putative dosage-insensitive genes. This indicates a tendency toward stabilized, less variable gene expression in response to WGDs for dosage-sensitive genes (Coate et al. 2016). Another study revealed that following ploidy changes in Arabidopsis, genes sensitive to dosage balance showed more coordinated transcriptional responses (similar expression alterations) than dosage-insensitive genes and less variable expression level among accessions, indicating that gene expression regulation and duplicate gene retention are influenced by selection for dosage balance rather than simple gene dosage increase (Song et al. 2020). Thus, our hypothesis centers upon the premise that dosage-sensitive genes, characterized by their slow expression change during species divergence as exemplified in Glycine and Arabidopsis, can maintain balanced expression levels between homoeologs even after extensive periods of paleopolyploidy, spanning tens of millions of years. Consequently, we posited and tested that the persistence of balanced gene expression in homoeologs serves as an indicator of high dosage sensitivity and gene longevity, a key aspect we have further examined in our current study. To conclude, the retention of duplicate genes with balanced expression serves as a mechanism to buffer against disruption of the functional integrity of the stoichiometry of interacting proteins.
Dosage-balance constraints affect more than just gene expression (Birchler and Veitia 2012). In plants, vertebrates, yeast, and other organisms, a common pattern emerges where the fate (retention or deletion) of duplicated genes post-WGD is influenced by factors such as dosage sensitivity (Birchler and Veitia 2021). In vertebrates, the selective constraints on coding sequences of nervous system genes greatly influence duplicate gene retention, particularly after WGD, due to the need for purifying selection against protein misfolding or misinteracting in nonrenewable neural tissues (Roux et al. 2017). In yeast, WGD gene retention is shaped by complex genetic interactions, revealing that duplicated genes often have entangled functions and their retention is influenced by factors like gene expression, protein interactions, and evolutionary age (Kuzmin et al. 2020). In angiosperms, gene duplicability across 37 surveyed species was shown to be remarkably similar, even in the context of independent WGDs, indicating a potential sensitivity of these genes to dosage balance (Li et al. 2016). The variation in WGD episodes among different angiosperm lineages (Van de Peer et al. 2017) implies that for genes sensitive to dosage changes, the number of copies in related species should align with those produced by their historical WGDs or whole-genome multiplications (WGMs). Indeed, reciprocally retained genes after WGDs or WGMs in angiosperm lineages exhibit dosage-balance sensitivity, based on functional annotations, characterized by stronger sequence divergence constraints and lower rates of functional and expression divergence compared with other putative dosage-insensitive genes (Tasdighian et al. 2017).
In alignment with these studies, we found that homoeologs with balanced expression typically not only show a copy number of putative orthologs (genes in different species that evolved from a common ancestral gene) consistent with the expected numbers post-WGD but also a lower propensity for gene loss during angiosperm radiation. For example, BAK1 and its related SERK homologs, essential leucine-rich repeat receptor-like kinases in plants, interact with BRI1 for brassinosteroid signaling essential for plant growth, G proteins for sugar signaling, and BTL2 in immune responses (Li et al. 2002; Liu et al. 2020). The correct gene dosage of these kinases is critical, as altering their numbers strongly impacts plant growth (Gou et al. 2012). Our study also shows the dosage sensitivity of SERKs in angiosperms, underscoring their importance in plant biology. Thus, our research indicates that maintaining expression balance of homoeologs is indicative of gene essentiality and persistence following any independent WGD events in plants. Interestingly, however, we found that SERKs are missing in some species like the carnivorous floating bladderwort (Utricularia gibba) and the desert-dwelling date palm (Phoenix dactylifera). We searched for putative orthologs of SERKs in the PLAZA 5.0 database (Van Bel et al. 2022) and found that SERKs are also absent in oil palm (Elaeis guineensis).
More intriguingly, in our study on the evolution of Nelumbo species within the last ∼6.5 million years (see http://www.timetree.org), we have uncovered a distinct evolutionary trajectory for homoeologs characterized by balanced vs. biased expression. Our observations reveal that homoeologs with balanced expression between N. nucifera and N. lutea exhibit fewer cis- and trans-regulatory mutations. Additionally, homoeologs showing balanced expression demonstrate a reduced incidence of premature stop codon mutations within populations, compared with their biased expression counterparts. This finding highlights the substantial impact of expression balance on the evolutionary dynamics of gene variants at the population level in Nelumbo species, suggesting ongoing escalation of duplicate divergence.
For duplicate genes, the lesser expressed copy often ends in nonfunctionalization due to relaxed selective pressures, leading to the accumulation of deleterious mutations and eventual loss of function (Innan and Kondrashov 2010). In the current study, we discovered that gene copies with higher expression levels exhibit broader tissue expression, longer proteins, and notably, a lower dN/dS ratio, indicating stronger purifying selection. This selection pressure is further corroborated by microevolutionary patterns observed in Nelumbo species divergence. Specifically, in the divergence between N. lutea and N. nucifera, we noted a reduced magnitude of both cis- and trans-regulatory mutations and a lower incidence of premature stop codon mutations in the higher expression gene copies. This suggests an ongoing process of gene loss in the copies with lower expression. Such findings of distinct patterns observed between higher and lower expression homoeologous copies provide a nuanced understanding of how expression levels relate to the occurrence of regulatory variation and the evolutionary fate of gene duplicates in plant genomes, mirroring the conservation and functional importance seen in the case study of Bsister gene homoeologs in crucifers (Hoffmeier et al. 2018). A recent study on Drosophila and human genomes suggests that “complete” duplicate genes, which maintain all exons and introns, are subject to dosage constraints due to protein stoichiometry, which is consistent with the greater protein length of highly expressed copies observed in our study (Zhang et al. 2022a). Also, our observation of distinctiveness between copies is in accordance with the general assumption that slower evolving genes are more conserved and often exhibit higher expression levels and greater functional importance (Pal et al. 2001; Conant and Wagner 2003; Jordan et al. 2005; Holland et al. 2017).
Notably, a typical example demonstrated that while ABS-like genes, a clade of Bsister genes, are highly conserved in crucifers and maintain their ancestral function in ovule and seed development, their closest paralogs from the core eudicot γ triplication, the GOA-like genes, are experiencing convergent downregulation or gene death in Brassicaceae (Hoffmeier et al. 2018). Thus, the trend toward nonfunctionalization of the lesser expressed gene copies suggests a predominant evolutionary strategy where plants retain only the necessary gene functions for survival, shedding redundant copies. This process substantially influences the genomic architecture and functional repertoire of plant species, as observed in multiple recent genomic studies (Carretero-Paulet and Van de Peer 2020; Zhong et al. 2022; Gout et al. 2023). It is important to acknowledge that while there is a noticeable trend of unequal fates for copies with lower expression, their eventual loss is not inevitable. In some cases, these copies may persist for extended periods if they acquire new functions or regulatory mechanisms (neofunctionalization) or partition ancestral functions or expression (subfunctionalization), as exemplified by investigations in Arabidopsis (Panchy et al. 2019; Coate et al. 2020; Jonas et al. 2022). Overall, such insights are indispensable for unraveling the complex evolutionary processes shaping plant genomes, furthering our understanding of plant biodiversity and adaptation strategies.
Materials and methods
Detection of homoeologs
To elucidate the expression level difference and dosage-balance constraints between duplicate gene pairs originating from WGDs, i.e. homoeologs, we curated a comprehensive dataset of WGD-derived duplicate gene pairs (Tiley et al. 2016). This dataset encompasses genes from water lily (Ny. colorata; Zhang et al. 2019), sacred lotus (N. nucifera; Ming et al. 2013; Shi et al. 2020), and grassy-leaved sweet flag (A. tatarinowii; Shi and Chen 2020; Shi et al. 2022). The identification of these gene pairs was conducted using MCScanX with parameter settings of “-s 6” (minimum of 6 anchor genes per block; Tang et al. 2008; Wang et al. 2012). Furthermore, we utilized the “detect_collinear_tandem_arrays” function in MCScanX to identify anchor gene pairs associated with tandem arrays. In cases where 2 or more anchor gene pairs were associated with the same tandem array, we only kept the anchor gene pair with the lowest e-value. To advance our understanding of the post-WGD evolutionary trajectory of these homoeologs, we established an approach where we compared, for each species, the duplicate pairs with an outgroup. This entailed the construction of syntenic blocks between N. nucifera and macadamia nut (Macadamia integrifolia; belonging to Proteales), A. tatarinowii and date palm (P. dactylifera; belonging to monocots), and Ny. colorata and white veined hardy Dutchman's pipe (Aristolochia fimbriata; belonging to ANITA grade). The identification of syntenic putative orthologs within these blocks was executed utilizing MCScanX (Tang et al. 2008; Wang et al. 2012). In instances where duplicates lacked a syntenic putative ortholog in the outgroup species, putative ortholog identification was achieved via a comprehensive analysis of potential protein sequences using OrthoFinder (v2.3.3) with default settings (Emms and Kelly 2019).
Following the integration of data from both OrthoFinder and MCScanX, we successfully compiled a dataset of high-confidence gene triplets, each consisting of duplicate gene pairs and their corresponding outgroup orthologs.
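The two bookkeeping steps described above (keeping one anchor pair per tandem array, then attaching an outgroup ortholog with synteny taking priority over OrthoFinder) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tuple layout `(gene_a, gene_b, e_value)`, and the dictionaries mapping genes to tandem-array identifiers or outgroup orthologs are hypothetical and do not correspond to MCScanX or OrthoFinder output formats.

```python
from typing import Dict, List, Tuple

AnchorPair = Tuple[str, str, float]  # (gene_a, gene_b, e_value); hypothetical layout


def keep_best_pair_per_tandem_array(
    anchor_pairs: List[AnchorPair], array_of: Dict[str, str]
) -> List[AnchorPair]:
    """Keep the lowest e-value anchor pair among pairs tied to the same tandem array.

    Assumption: genes absent from `array_of` are treated as their own singleton array.
    """
    best: Dict[frozenset, AnchorPair] = {}
    for gene_a, gene_b, e_value in anchor_pairs:
        key = frozenset((array_of.get(gene_a, gene_a), array_of.get(gene_b, gene_b)))
        if key not in best or e_value < best[key][2]:
            best[key] = (gene_a, gene_b, e_value)
    return sorted(best.values())


def build_triplets(
    pairs: List[Tuple[str, str]],
    syntenic_ortholog: Dict[str, str],
    orthogroup_ortholog: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Attach an outgroup gene to each pair, preferring syntenic evidence."""
    triplets = []
    for copy1, copy2 in pairs:
        outgroup = (
            syntenic_ortholog.get(copy1)
            or syntenic_ortholog.get(copy2)
            or orthogroup_ortholog.get(copy1)
            or orthogroup_ortholog.get(copy2)
        )
        if outgroup is not None:  # pairs without any outgroup ortholog are dropped
            triplets.append((copy1, copy2, outgroup))
    return triplets


filtered = keep_best_pair_per_tandem_array(
    [("g1", "h1", 1e-50), ("g2", "h1", 1e-80), ("g5", "h5", 1e-20)],
    array_of={"g1": "T1", "g2": "T1"},
)
triplets = build_triplets(
    [("g2", "h1"), ("g5", "h5"), ("g7", "h7")],
    syntenic_ortholog={"g2": "o1"},
    orthogroup_ortholog={"h5": "o9"},
)
```

In this toy input, `g1` and `g2` sit in the same hypothetical tandem array, so only the pair with the lower e-value survives, and the pair `g7`/`h7` is discarded because neither source provides an outgroup gene.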
Quantification and tests of relative expression difference of duplicate pairs from a WGD
We established gene expression profiles for multiple tissues in Nymphaea (Zhang et al. 2019), Nelumbo (Li et al. 2021a, 2021b; Zhang et al. 2022b; Gao et al. 2023), and Acorus (Shi et al. 2022; Ma et al. 2023), utilizing RNA-seq datasets in the cited manuscripts (Supplementary Table S3). For each species, we processed the RNA-seq reads by mapping them to their respective reference genomes using Hisat2 (v2.1.0; Pertea et al. 2016). The resulting SAM files were then sorted, converted to BAM format, indexed, and further underwent PCR duplicate marking by using Samtools (v0.1.19) and Picard (version 2.0.1). We quantified gene expression levels, denoted as FPKMs, employing StringTie (v1.3.5) with default parameter settings (Pertea et al. 2016). For each pair of homoeologs, we determined the relative expression differences between the 2 copies. This was accomplished by calculating the absolute normalized difference in expression for each tissue type, a metric we refer to as RFPKM. This approach follows the principles outlined in previous studies (Conant and Wagner 2003; Cusack and Wolfe 2006):
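As a minimal sketch of this metric, the code below assumes that the "absolute normalized difference" is the absolute FPKM difference between the 2 copies divided by their summed FPKM, which bounds the value between 0 (perfectly balanced) and 1 (only 1 copy expressed). That normalization, the function names, and the choice to skip tissues in which neither copy is expressed are assumptions made for illustration, not a restatement of the authors' exact formula.

```python
from typing import Optional, Sequence


def relative_expression_difference(fpkm_a: float, fpkm_b: float) -> Optional[float]:
    """Return |a - b| / (a + b) for one tissue, or None if both copies are silent."""
    total = fpkm_a + fpkm_b
    if total <= 0:
        return None  # undefined when neither copy is expressed (assumed handling)
    return abs(fpkm_a - fpkm_b) / total


def mean_rfpkm(copy_a: Sequence[float], copy_b: Sequence[float]) -> Optional[float]:
    """Average the per-tissue values over tissues where the metric is defined."""
    values = [relative_expression_difference(a, b) for a, b in zip(copy_a, copy_b)]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# Three tissues: balanced, biased, and unexpressed in both copies.
average = mean_rfpkm([10.0, 30.0, 0.0], [10.0, 10.0, 0.0])
```

Here the first tissue contributes 0, the second contributes 0.5, and the third is skipped, so the pair-level average is 0.25.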
To assess the consistency of RFPKM values of duplicate genes across different tissues, we conducted Pearson correlation tests (Supplementary Table S4). These tests compared tissue-specific RFPKM values within the same species using R (v3.5.1; https://www.r-project.org/). For each duplicate pair, we averaged the RFPKM values across all tissues for subsequent analysis. Additionally, we explored the relationship between the average RFPKM of homoeologs and various gene characteristics via regression analyses using R (v3.5.1). We compared linear and log-transformed regressions based on their AIC values in R (v3.5.1) to determine the best fit for gene characteristics that may not exhibit a linear relationship with RFPKM. We chose log regression when it yielded a lower AIC. These characteristics include tissue specificity (τ) of gene expression, protein and CDS lengths, number of exons, number of Pfam domains, number of PPIs (Yilmaz et al. 2022), and lethal-phenotype scores (Lloyd et al. 2015) derived from Arabidopsis (Ar. thaliana) putative orthologs. We also considered nonsynonymous divergence (dN), synonymous divergence (dS), and the dN/dS ratio (ω). The measurements for protein and CDS lengths, as well as the number of exons, were extracted directly from each species’ genome annotation file. The Pfam domain count per gene was based on annotations via emapper-2.1.12 (Cantalapiedra et al. 2021). The τ index, a benchmark of gene expression tissue-specificity metrics, for each gene was calculated using log-transformed FPKM values from different tissues (Kryuchkova-Mostacci and Robinson-Rechavi 2016; Shi et al. 2020; Gao et al. 2023). The computations of dN, dS, and ω were performed using the codeml program within the PAML4 package, following a triplet tree topology of “((copy1, copy2), outgroup)” (Yang 2007). To assess the possible effects of incomplete genome assembly or annotation on our regression analyses, we performed extra correlation tests using Nelumbo. 
These tests involved simulating different levels of incompleteness by randomly removing 2.5%, 5%, 10%, 20%, and 40% of Nelumbo's total homoeologs from our dataset using the “sample()” function in R (v3.5.1). To investigate the variation of average RFPKM values among different GO slim categories, we categorized different homoeologs into TAIR GO slim categories (TAIR_GO_slim_categories.txt from https://www.arabidopsis.org). This categorization was based on the GO annotations of genes obtained via emapper-2.1.12 (Cantalapiedra et al. 2021). We then visualized the distribution of average RFPKM values across these GO slim categories using violin plots, created with GraphPad Prism 9.0.
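Two of the computations described in this section, the τ index and the AIC-based choice between a linear and a log-transformed regression, can be illustrated with a short sketch. The analyses themselves were run in R; the Python below is only an illustration. It assumes that τ is computed on log2(FPKM + 1) values, that the log transformation is applied to the gene characteristic (so both models share the same response and their AIC values are comparable), and that AIC is computed from the Gaussian least-squares form n·ln(RSS/n) + 2k. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def tau_index(fpkms: Sequence[float]) -> Optional[float]:
    """Tissue specificity: 0 = uniformly expressed, 1 = restricted to one tissue."""
    if len(fpkms) < 2:
        return None
    logged = [math.log2(v + 1.0) for v in fpkms]  # assumed pseudocount of 1
    peak = max(logged)
    if peak == 0.0:
        return None  # gene not expressed in any tissue
    return sum(1.0 - v / peak for v in logged) / (len(logged) - 1)


def ols_aic(x: Sequence[float], y: Sequence[float]) -> float:
    """AIC of a simple least-squares line y ~ x (k = 3: slope, intercept, variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return n * math.log(rss / n) + 2 * 3


def preferred_fit(feature: Sequence[float], mean_rfpkm: Sequence[float]) -> str:
    """Return 'log' when regressing on ln(feature) gives the lower AIC."""
    if min(feature) <= 0:
        raise ValueError("log-transformed fit requires strictly positive values")
    aic_linear = ols_aic(feature, mean_rfpkm)
    aic_log = ols_aic([math.log(v) for v in feature], mean_rfpkm)
    return "log" if aic_log < aic_linear else "linear"


# Toy data: RFPKM declines with the logarithm of a feature, plus small noise.
feature = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
noise = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
rfpkm_log_like = [1.0 - 0.15 * math.log2(v) + e for v, e in zip(feature, noise)]
rfpkm_linear_like = [0.9 - 0.02 * v + e for v, e in zip(feature, noise)]
```

Features that can be zero (for example, the number of PPIs) would need a different handling, such as a pseudocount, before a log-transformed fit; that choice is left open here.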
Assessing dosage sensitivity by copy number change
Putatively orthologous groups (OGs) from 25 representative angiosperm species, including Nymphaea, Nelumbo, and Acorus, were identified using OrthoFinder (v2.3.3) with default settings (Emms and Kelly 2019; see Supplementary Table S7). The expected gene copy number in these lineages, accounting for their respective historical WGDs or WGMs, was determined based on existing literature (Supplementary Table S7). We quantified the relative dosage sensitivity of a putative OG, referred to as rcopy number, as the Pearson correlation coefficient between the observed and expected copy numbers of genes following WGDs or WGMs (see Supplementary Table S7). To further validate the relative dosage sensitivity of different OGs, we calculated the Krylov–Wolf–Rogozin–Koonin PGL for each OG by using the COUNT software (Csűrös 2010). Additionally, we explored the relationship between the average RFPKM of homoeologs and their PGL by calculating the Pearson correlation between these 2 parameters. The calculations and analyses were performed using R (v3.5.1).
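The rcopy number statistic can be illustrated with a small sketch: for 1 OG, the observed copy number in each species is correlated with the copy number expected from that lineage's WGD or WGM history. The analysis itself was run in R; the Python function below, the 5-species toy vectors, and the decision to return no value when either vector is constant are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Expected copies per species relative to a single-copy baseline (toy values):
# no WGD -> 1, one WGD -> 2, two WGDs -> 4, a WGD plus a triplication -> 6.
expected = [1, 2, 2, 4, 6]

tracks_wgd_history = [1, 2, 2, 4, 6]   # copies retained as expected
ignores_wgd_history = [3, 1, 2, 1, 1]  # copy number unrelated to WGD history

r_sensitive = pearson_r(tracks_wgd_history, expected)
r_insensitive = pearson_r(ignores_wgd_history, expected)
```

An OG whose copy number tracks the expected values yields a coefficient near 1, consistent with reciprocal retention, whereas the second toy OG yields a negative coefficient.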
Regulatory changes and premature stop codon mutations associated with different homoeologs
We sought to determine if there is a positive correlation between the RFPKM of homoeologs and the magnitude of cis- and trans-regulatory variations between N. nucifera and N. lutea with Pearson correlation tests in R (v3.5.1). To do this, we compared 3 key values: |A|, |B|, and |A − B|. Here, “A” denotes the parental expression difference, “B” represents the cis-regulatory difference, and “|A − B|” indicates the trans-regulatory difference. These definitions and values were based on our previous research focused on the divergence in petal color between N. nucifera and N. lutea across 4 developmental stages (Gao et al. 2023). Furthermore, we investigated whether different homoeologs with varying RFPKM values exhibit distinct frequencies of premature stop codon mutations. This was carried out by analyzing SNP annotations in Nelumbo populations. These annotations were obtained using SnpEff (Version 4.3) and were extracted from our prior studies (Huang et al. 2018; Li et al. 2021b). Further, for each species, we categorized homoeologs within each tissue into 2 groups based on their expression levels: low- and high-expression copies. We then compared the evolutionary trajectories between these 2 groups through either paired t-test or χ2 test via GraphPad Prism 9.0. This comparison encompassed a range of gene characteristics, including nonsynonymous (dN) and synonymous (dS) substitutions, dN/dS ratios (ω), the number of exons, CDS lengths, protein lengths, and Pfam domains. Additionally, we examined the magnitude of cis- and trans-regulatory variations, as well as the frequency of premature stop codon mutations among these groups.
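The quantities used in this comparison can be sketched as follows. The code assumes that A is a log2 expression ratio between the 2 parental species and B is the corresponding log2 allelic ratio measured in their hybrid, which is a common way to separate cis from trans effects but is an assumption about the values taken from Gao et al. (2023). The function names, the dropping of tied pairs, and the hand-written paired t statistic (reported without a P value) are illustrative only; the published tests were run in GraphPad Prism.

```python
import math
from typing import Dict, List, Sequence, Tuple


def regulatory_magnitudes(a: float, b: float) -> Tuple[float, float]:
    """Return (|B|, |A - B|): assumed cis and trans magnitudes for one gene."""
    return abs(b), abs(a - b)


def order_by_expression(
    pairs: Sequence[Tuple[str, str]], fpkm: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Reorder each homoeolog pair as (high copy, low copy); ties are dropped."""
    ordered = []
    for g1, g2 in pairs:
        if fpkm[g1] > fpkm[g2]:
            ordered.append((g1, g2))
        elif fpkm[g2] > fpkm[g1]:
            ordered.append((g2, g1))
    return ordered


def paired_t_statistic(high: Sequence[float], low: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for high-minus-low differences."""
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


cis, trans = regulatory_magnitudes(a=1.5, b=0.5)
ordered = order_by_expression([("a1", "a2"), ("b1", "b2")],
                              {"a1": 2.0, "a2": 9.0, "b1": 4.0, "b2": 4.0})
# Toy cis magnitudes for 3 pairs: high-expression copies vs. low-expression copies.
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 7.0])
```

A negative t statistic in this toy example corresponds to smaller regulatory magnitudes in the high-expression copies, the direction reported for Nelumbo.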
Acknowledgments
The authors thank Prof. Jia Li and Prof. Yang Ou from Lanzhou University for their discussion on BAK1 and other LRR-RLKs.
Contributor Information
Tao Shi, Aquatic Plant Research Center, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China.
Zhiyan Gao, Aquatic Plant Research Center, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China.
Jinming Chen, Aquatic Plant Research Center, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China.
Yves Van de Peer, Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium; Centre for Plant Systems Biology, VIB, Ghent 9052, Belgium; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa; College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, China.
Author contributions
T.S. and Y.V.d.P. designed the research; T.S. and Z.G. performed the research and analyzed the data; T.S. and Z.G. wrote the paper; Y.V.d.P. and J.C. revised the paper.
Supplementary data
The following materials are available in the online version of this article.
Supplementary Figure S1. Pearson correlations of relative expression differences (RFPKM) of WGD duplicates between different tissues.
Supplementary Figure S2. Pearson correlations between average relative expression differences (RFPKM) of WGD duplicates and gene features in Nelumbo.
Supplementary Figure S3. Pearson correlations between average relative expression differences (RFPKM) of WGD duplicates and gene features in Nymphaea.
Supplementary Figure S4. Pearson correlations between average relative expression differences (RFPKM) of WGD duplicates and gene features in Acorus.
Supplementary Figure S5. Pearson correlations of the correlation coefficients (r) of average relative expression difference (RFPKM) and gene features among 3 species.
Supplementary Figure S6. Violin plot showing how the relative expression difference (RFPKM) for Nymphaea homoeologs varies among duplicate genes belonging to different GO slim categories.
Supplementary Figure S7. Violin plot showing how the relative expression difference (RFPKM) of Acorus homoeologs varies among duplicate genes belonging to different GO slim categories.
Supplementary Figure S8. Pearson correlations between average relative expression differences (RFPKM) of WGD duplicates and propensity of gene loss.
Supplementary Figure S9. Distributions of the proportion of genes with premature stop codon mutations in Nelumbo populations according to relative expression difference between the 2 copies (RFPKM).
Supplementary Figure S10. Differences between homoeologs with higher and lower gene expression in immature stamen of Nelumbo.
Supplementary Figure S11. Differences between homoeologs with higher and lower gene expression in juvenile leaf of Nymphaea.
Supplementary Figure S12. Differences between homoeologs with higher and lower gene expression in young leaf of Acorus.
Supplementary Figure S13. Differences in premature stop codon mutations in Nelumbo populations between homoeologs with higher expression and lower expression in 19 different tissues of Nelumbo.
Supplementary Table S1. The genome assembly and annotation information of Nymphaea, Nelumbo, and Acorus.
Supplementary Table S2. Summary of homoeolog pairs with putative orthologs (OGs) and OGs retaining WGD duplicates between Nymphaea, Nelumbo, and Acorus.
Supplementary Table S3. The RNA-seq information of different tissues from Nelumbo, Nymphaea, and Acorus.
Supplementary Table S4. The Pearson correlations of RFPKM between tissues in each species.
Supplementary Table S5. The Pearson correlations between RFPKM in different tissues and gene features of the 3 species.
Supplementary Table S6. Pearson correlations between relative expression difference of homoeologs (average RFPKM) and their gene features.
Supplementary Table S7. The expected copy number of putatively orthologous genes in 25 species after WGDs or WGMs, compared with Amborella (a species without WGD during angiosperm diversification).
Supplementary Table S8. The Pearson correlations between RFPKM and magnitude of cis- and trans-regulatory change between N. nucifera and N. lutea.
Supplementary Table S9. Comparison of the Pearson correlations between RFPKM in different tissues and gene features of the 3 species among no deletion and random deletion of 2.5%, 5%, 10%, 20%, and 40% Nelumbo homoeologs.
Supplementary Table S10. Differences in gene features between homoeologous copies with higher expression and lower expression in different tissues from 3 species via paired t-tests.
Supplementary Table S11. Differences in cis- and trans-regulatory change magnitude between homoeologs with higher expression and lower expression in 19 different tissues of Nelumbo via paired t-tests.
Funding
T.S. acknowledges support by a grant from the National Natural Science Foundation of China (no. 32170240). Z.G. acknowledges support by a grant from the Natural Science Foundation of Hubei Province of China (no. 2024AFB314). Y.V.d.P. acknowledges support by the European Research Council under the European Union's Horizon 2020 Research and Innovation program (no. 833522) and from Ghent University (Methusalem funding, BOF.MET.2021.0005.01).
Data availability
Data about Nelumbo, Acorus, and Nymphaea analyzed in this study are public and cited in the manuscript.
References
- Amborella Genome Project . The Amborella genome and the evolution of flowering plants. Science. 2013:342(6165):1241089. 10.1126/science.1241089 [DOI] [PubMed] [Google Scholar]
- Birchler JA, Veitia RA. The gene balance hypothesis: from classical genetics to modern genomics. Plant Cell. 2007:19(2):395–402. 10.1105/tpc.106.049338 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Birchler JA, Veitia RA. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc Natl Acad Sci U S A. 2012:109(37):14746–14753. 10.1073/pnas.1207726109 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Birchler JA, Veitia RA. One hundred years of gene balance: how stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenet Genome Res. 2021:161(10–11):529–550. 10.1159/000519592 [DOI] [PubMed] [Google Scholar]
- Birchler JA, Yang H. The multiple fates of gene duplications: deletion, hypofunctionalization, subfunctionalization, neofunctionalization, dosage balance constraints, and neutral variation. Plant Cell. 2022:34(7):2466–2474. 10.1093/plcell/koac076 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bird KA, Niederhuth CE, Ou S, Gehan M, Pires JC, Xiong Z, VanBuren R, Edger PP. Replaying the evolutionary tape to investigate subgenome dominance in allopolyploid Brassica napus. New Phytol. 2021:230(1):354–371. 10.1111/nph.17137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bowers JE, Tang H, Burke JM, Paterson AH. GC content of plant genes is linked to past gene duplications. PLoS One. 2022:17(1):e0261748. 10.1371/journal.pone.0261748 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J, Tamura K. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021:38(12):5825–5829. 10.1093/molbev/msab293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cardoso-Moreira M, Arguello JR, Gottipati S, Harshman LG, Grenier JK, Clark AG. Evidence for the fixation of gene duplications by positive selection in Drosophila. Genome Res. 2016:26(6):787–798. 10.1101/gr.199323.115 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carels N, Bernardi G. Two classes of genes in plants. Genetics. 2000:154(4):1819–1825. 10.1093/genetics/154.4.1819 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carretero-Paulet L, Van de Peer Y. The evolutionary conundrum of whole-genome duplication. Am J Bot. 2020:107(8):1101–1105. 10.1002/ajb2.1520 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y. Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana. Genome Biol. 2006:7(2):R13. 10.1186/gb-2006-7-2-r13 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng F, Sun R, Hou X, Zheng H, Zhang F, Zhang Y, Liu B, Liang J, Zhuang M, Liu Y, et al. Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nat Genet. 2016:48(10):1218–1224. 10.1038/ng.3634 [DOI] [PubMed] [Google Scholar]
- Cheng F, Wu J, Cai X, Liang J, Freeling M, Wang X. Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants. 2018:4(5):258–268. 10.1038/s41477-018-0136-7 [DOI] [PubMed] [Google Scholar]
- Coate JE, Farmer AD, Schiefelbein JW, Doyle JJ. Expression partitioning of duplicate genes at single cell resolution in Arabidopsis roots. Front Genet. 2020:11:596150. 10.3389/fgene.2020.596150 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Coate JE, Song MJ, Bombarely A, Doyle JJ. Expression-level support for gene dosage sensitivity in three Glycine subgenus Glycine polyploids and their diploid progenitors. New Phytol. 2016:212(4):1083–1093. 10.1111/nph.14090 [DOI] [PubMed] [Google Scholar]
- Conant GC, Wagner A. Asymmetric sequence divergence of duplicate genes. Genome Res. 2003:13(9):2052–2058. 10.1101/gr.1252603 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Conant GC, Wolfe KH. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008:9(12):938–950. 10.1038/nrg2482 [DOI] [PubMed] [Google Scholar]
- Csűös M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics. 2010:26(15):1910–1912. 10.1093/bioinformatics/btq315 [DOI] [PubMed] [Google Scholar]
- Cusack BP, Wolfe KH. Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates. Mol Biol Evol. 2006:24(3):679–686. 10.1093/molbev/msl199 [DOI] [PubMed] [Google Scholar]
- Ebadi M, Bafort Q, Mizrachi E, Audenaert P, Simoens P, Van Montagu M, Bonte D, Van de Peer Y. The duplication of genomes and genetic networks and its potential for evolutionary adaptation and survival during environmental turmoil. Proc Natl Acad Sci U S A. 2023:120(41):e2307289120. 10.1073/pnas.2307289120 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019:20(1):238. 10.1186/s13059-019-1832-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Force A, Lynch M, Pickett FB, Amores A, Yan Y-l, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999:151(4):1531–1545. 10.1093/genetics/151.4.1531 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fox DT, Soltis DE, Soltis PS, Ashman TL, Van de Peer Y. Polyploidy: a biological force from cells to ecosystems. Trends Cell Biol. 2020:30(9):688–694. 10.1016/j.tcb.2020.06.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganko EW, Meyers BC, Vision TJ. Divergence in expression between duplicated genes in Arabidopsis. Mol Biol Evol. 2007:24(10):2298–2309. 10.1093/molbev/msm158 [DOI] [PubMed] [Google Scholar]
- Gao Z, Yang X, Chen J, Rausher MD, Shi T. Expression inheritance and constraints on cis- and trans-regulatory mutations underlying lotus color variation. Plant Physiol. 2023:191(3):1662–1683. 10.1093/plphys/kiac522 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garcia-Lozano M, Natarajan P, Levi A, Katam R, Lopez-Ortiz C, Nimmakayala P, Reddy UK. Altered chromatin conformation and transcriptional regulation in watermelon following genome doubling. Plant J. 2021:106(3):588–600. 10.1111/tpj.15256 [DOI] [PubMed] [Google Scholar]
- Geiser C, Mandakova T, Arrigo N, Lysak MA, Parisod C. Repeated whole-genome duplication, karyotype reshuffling, and biased retention of stress-responding genes in buckler mustard. Plant Cell. 2016:28(1):17–27. 10.1105/tpc.15.00791 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gou XP, Yin HJ, He K, Du JB, Yi J, Xu SB, Lin HH, Clouse SD, Li J. Genetic evidence for an indispensable role of somatic embryogenesis receptor kinases in brassinosteroid signaling. PLoS Genet. 2012:8(1):e1002452. 10.1371/journal.pgen.1002452 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gout J-FHao Y, Johri P, Arnaiz O, Doak TG, Bhullar S, Couloux A, Guérin F, Malinsky S, Potekhin A, et al. R. Dynamics of gene loss following ancient whole-genome duplication in the CrypticParameciumComplex. Mol Biol Evol. 2023:40(5):msad107. 10.1093/molbev/msad107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoffmeier A, Gramzow L, Bhide AS, Kottenhagen N, Greifenstein A, Schubert O, Mummenhoff K, Becker A, Theißen G, Innan H. A dead gene walking: convergent degeneration of a clade of MADS-Box genes in Crucifers. Mol Biol Evol. 2018:35(11):2618–2638. 10.1093/molbev/msy142 [DOI] [PubMed] [Google Scholar]
- Holland PW, Marletaz F, Maeso I, Dunwell TL, Paps J. New genes from old: asymmetric divergence of gene duplicates and the evolution of development. Philos Trans R Soc B Biol Sci. 2017:372(1713):20150480. 10.1098/rstb.2015.0480 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang L, Yang M, Li L, Li H, Yang D, Shi T, Yang P. Whole genome re-sequencing reveals evolutionary patterns of sacred lotus (Nelumbo nucifera). J Int Plant Biol. 2018:60(1):2–15. 10.1111/jipb.12606 [DOI] [PubMed] [Google Scholar]
- Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010:11(2):97–108. 10.1038/nrg2689 [DOI] [PubMed] [Google Scholar]
- Inoue J, Sato Y, Sinclair R, Tsukamoto K, Nishida M. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling. Proc Natl Acad Sci U S A. 2015:112(48):14918–14923. 10.1073/pnas.1507669112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers JE, McKain MR, McNeal J, Rolf M, Ruzicka DR, Wafula E, Wickett NJ, et al. A genome triplication associated with early diversification of the core eudicots. Genome Biol. 2012:13(1):R3. 10.1186/gb-2012-13-1-r3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al. Ancestral polyploidy in seed plants and angiosperms. Nature. 2011:473(7345):97–100. 10.1038/nature09916 [DOI] [PubMed] [Google Scholar]
- Jonas F, Gera T, More R, Barkai N. Evolution of binding preferences among whole-genome duplicated transcription factors. eLife. 2022:11:e73225. 10.7554/eLife.73225 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jordan IK, Marino-Ramirez L, Koonin EV. Evolutionary significance of gene expression divergence. Gene. 2005:345(1):119–126. 10.1016/j.gene.2004.11.034 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. Selection in the evolution of gene duplications. Genome Biol. 2002:3(2):RESEARCH0008. 10.1186/gb-2002-3-2-research0008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kryuchkova-Mostacci N, Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 2016:18:205–214. 10.1093/bib/bbw008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kuzmin E, VanderSluis B, Nguyen Ba AN, Wang W, Koch EN, Usaj M, Khmelinskii A, Usaj MM, van Leeuwen J, Kraus O, et al. Exploring whole-genome duplicate gene retention with complex genetic interaction analysis. Science. 2020:368(6498):eaaz5667. 10.1126/science.aaz5667 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li Z, Defoort J, Tasdighian S, Maere S, Van de Peer Y, De Smet R. Gene duplicability of core genes is highly consistent across all angiosperms. Plant Cell. 2016:28(2):326–344. 10.1105/tpc.15.00877 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J-T, Hou G-Y, Kong X-F, Li C-Y, Zeng J-M, Li H-D, Xiao G-B, Li X-M, Sun X-W. The fate of recent duplicated genes following a fourth-round whole genome duplication in a tetraploid fish, common carp (Cyprinus carpio). Sci Rep. 2015:5(1):8199. 10.1038/srep08199 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li Z, Li M, Wang J. Asymmetric subgenomic chromatin architecture impacts on gene expression in resynthesized and natural allopolyploid Brassica napus. Commun Biol. 2022:5(1):762. 10.1038/s42003-022-03729-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J, Wen J, Lease KA, Doke JT, Tax FE, Walker JC. BAK1, an Arabidopsis LRR receptor-like protein kinase, interacts with BRI1 and modulates brassinosteroid signaling. Cell. 2002:110(2):213–222. 10.1016/S0092-8674(02)00812-7 [DOI] [PubMed] [Google Scholar]
- Li W-H, Yang J, Gu X. Expression divergence between duplicate genes. Trends Genet. 2005:21(11):602–607. 10.1016/j.tig.2005.08.006 [DOI] [PubMed] [Google Scholar]
- Li H, Yang X, Wang Q, Chen J, Shi T. Distinct methylome patterns contribute to ecotypic differentiation in the growth of the storage organ of a flowering plant (sacred lotus). Mol Ecol. 2021a:30(12):2831–2845. 10.1111/mec.15933 [DOI] [PubMed] [Google Scholar]
- Li H, Yang X, Zhang Y, Gao Z, Liang Y, Chen J, Shi T. Nelumbo genome database, an integrative resource for gene expression and variants of Nelumbo nucifera. Sci Data. 2021b:8(1):38. 10.1038/s41597-021-00828-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liang Z, Schnable JC. Functional divergence between subgenomes and gene pairs after whole genome duplications. Mol Plant. 2018:11(3):388–397. 10.1016/j.molp.2017.12.010 [DOI] [PubMed] [Google Scholar]
- Liao BY, Scott NM, Zhang J. Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol Biol Evol. 2006:23(11):2072–2080. 10.1093/molbev/msl076 [DOI] [PubMed] [Google Scholar]
- Liu M, Grigoriev A. Protein domains correlate strongly with exons in multiple eukaryotic genomes–evidence of exon shuffling? Trends Genet. 2004:20(9):399–403. 10.1016/j.tig.2004.06.013 [DOI] [PubMed] [Google Scholar]
- Liu J, Li J, Shan L. SERKs. Curr Biol. 2020:30(7):R293–R294. 10.1016/j.cub.2020.01.043 [DOI] [PubMed] [Google Scholar]
- Lloyd JP, Seddon AE, Moghe GD, Simenc MC, Shiu S-H. Characteristics of plant essentialgenes allow for within- and between-species prediction of lethal mutant phenotypes. Plant Cell. 2015:27(8):2133–2147. 10.1105/tpc.15.00051 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000:290(5494):1151–1155. 10.1126/science.290.5494.1151 [DOI] [PubMed] [Google Scholar]
- Lynch M, Force AG. The origin of interspecific genomic incompatibility via gene duplication. Am Nat. 2000:156(6):590–605. 10.1086/316992 [DOI] [PubMed] [Google Scholar]
- Ma L, Liu K-W, Li Z, Hsiao Y-Y, Qi Y, Fu T, Tang G-D, Zhang D, Sun W-H, Liu D-K, et al. Diploid and tetraploid genomes of Acorus and the evolution of monocots. Nat Commun. 2023:14(1):3661. 10.1038/s41467-023-38829-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A. 2005:102(15):5454–5459. 10.1073/pnas.0501102102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Makino T, McLysaght A. Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res. 2012:22(12):2427–2435. 10.1101/gr.131953.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martín-Dacal M, Fernández-Calvo P, Jiménez-Sandoval P, López G, Garrido-Arandía M, Rebaque D, del Hierro I, Berlanga DJ, Torres MÁ, Kumar V, et al. Arabidopsis immune responses triggered by cellulose- and mixed-linked glucan-derived oligosaccharides require a group of leucine-rich repeat malectinreceptor kinases. Plant J. 2023:113(4):833–850. 10.1111/tpj.16088 [DOI] [PubMed] [Google Scholar]
- Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li LT, Zhang Q, Kim MJ, Schatz MC, Campbell M, et al. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn). Genome Biol. 2013:14(5):R41. 10.1186/gb-2013-14-5-r41 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ohno S. Evolution by Gene Duplication. Springer-Verlag, New York; 1970. [Google Scholar]
- Ou Y, Tao B, Wu Y, Cai Z, Li H, Li M, He K, Gou X, Li J. Essential roles of SERKs in the ROOT MERISTEM GROWTH FACTOR-mediated signaling pathway. Plant Physiol. 2022:189(1):165–177. 10.1093/plphys/kiac036 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pal C, Papp B, Hurst LD. Does the recombination rate affect the efficiency of purifying selection? The yeast genome provides a partial answer. Mol Biol Evol. 2001:18(12):2323–2326. 10.1093/oxfordjournals.molbev.a003779 [DOI] [PubMed] [Google Scholar]
- Panchy NL, Azodi CB, Winship EF, O’Malley RC, Shiu S-H. Expression and regulatory asymmetry of retained Arabidopsis thaliana transcription factor genes derived from whole genome duplication. BMC Evol Biol. 2019:19(1):77. 10.1186/s12862-019-1398-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Papp B, Pál C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003:424(6945):194–197. 10.1038/nature01771 [DOI] [PubMed] [Google Scholar]
- Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-Seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016:11(9):1650–1667. 10.1038/nprot.2016.095 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qin L, Hu Y, Wang J, Wang X, Zhao R, Shan H, Li K, Xu P, Wu H, Yan X, et al. Insights into angiosperm evolution, floral development and chemical biosynthesis from the Aristolochia fimbriata genome. Nat Plants. 2021:7(9):1239–1253. 10.1038/s41477-021-00990-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roman-Palacios C, Molina-Henao YF, Barker MS. Polyploids increase overall diversity despite higher turnover than diploids in the Brassicaceae. Proc R Soc B Biol Sci. 2020:287(1934):20200962. 10.1098/rspb.2020.0962 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roux J, Liu J, Robinson-Rechavi M. Selective constraints on coding sequences of nervous system genes are a major determinant of duplicate gene retention in vertebrates. Mol BiolEvol. 2017:34(11):2773–2791. 10.1093/molbev/msx199 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ruprecht C, Lohaus R, Vanneste K, Mutwil M, Nikoloski Z, Van de Peer Y, Persson S. Revisiting ancestral polyploidy in plants. Sci Adv. 2017:3(7):e1603195. 10.1126/sciadv.1603195 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shi T, Chen J. A reappraisal of the phylogenetic placement of the Aquilegia whole-genome duplication. Genome Biol. 2020:21(1):295. 10.1186/s13059-020-02212-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shi T, Huneau C, Zhang Y, Li Y, Chen J, Salse J, Wang Q. The slow-evolving Acorus tatarinowii genome sheds light on ancestral monocot evolution. Nat Plants. 2022:8(7):764–777. 10.1038/s41477-022-01187-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shi T, Rahmani RS, Gugger PF, Wang M, Li H, Zhang Y, Li Z, Wang Q, Van de Peer Y, Marchal K, et al. Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants. Mol Biol Evol. 2020:37(8):2394–2413. 10.1093/molbev/msaa105 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Song MJ, Potter BI, Doyle JJ, Coate JE. Gene balance predicts transcriptional responses immediately following ploidy change in Arabidopsis thaliana. Plant Cell. 2020:32(5):1434–1448. 10.1105/tpc.19.00832 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. Synteny and collinearity in plant genomes. Science. 2008:320(5875):486–488. 10.1126/science.1153917 [DOI] [PubMed] [Google Scholar]
- Tasdighian S, Van Bel M, Li Z, Van de Peer Y, Carretero-Paulet L, Maere S. Reciprocally retained genes in the angiosperm lineage show the hallmarks of dosage balance sensitivity. Plant Cell. 2017:29(11):2766–2785. 10.1105/tpc.17.00313 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tiley GP, Ane C, Burleigh JG. Evaluating and characterizing ancient whole-genome duplications in plants with gene count data. Genome Biol Evol. 2016:8(4):1023–1037. 10.1093/gbe/evw058 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van Bel M, Silvestri F, Weitz EM, Kreft L, Botzki A, Coppens F, Vandepoele K. PLAZA 5.0: extending the scope and power of comparative and functional genomics in plants. Nucl Acids Res. 2022:50(D1):D1468–D1474. 10.1093/nar/gkab1024 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van de Peer Y, Ashman TL, Soltis PS, Soltis DE. Polyploidy: an evolutionary and ecological force in stressful times. Plant Cell. 2021:33(1):11–26. 10.1093/plcell/koaa015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van de Peer Y, Maere S, Meyer A. The evolutionary significance of ancient genome duplications. Nat Rev Genet. 2009:10(10):725–732. 10.1038/nrg2600 [DOI] [PubMed] [Google Scholar]
- Van de Peer Y, Mizrachi E, Marchal K. The evolutionary significance of polyploidy. Nat Rev Genet. 2017:18(7):411–424. 10.1038/nrg.2017.26 [DOI] [PubMed] [Google Scholar]
- Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucl Acids Res. 2012:40(7):e49. 10.1093/nar/gkr1293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu S, Han B, Jiao Y. Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol Plant. 2020:13(1):59–71. 10.1016/j.molp.2019.10.012 [DOI] [PubMed] [Google Scholar]
- Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007:24(8):1586–1591. 10.1093/molbev/msm088 [DOI] [PubMed] [Google Scholar]
- Yilmaz M, Paulic M, Seidel T. Interactome of Arabidopsis thaliana. Plants. 2022:11(3):350. 10.3390/plants11030350 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang J. Evolution by gene duplication: an update. Trends Ecol Evol. 2003:18(6):292–298. 10.1016/S0169-5347(03)00033-8 [DOI] [Google Scholar]
- Zhang L, Chen F, Zhang X, Li Z, Zhao Y, Lohaus R, Chang X, Dong W, Ho SYW, Liu X, et al. The water lily genome and the early evolution of flowering plants. Nature. 2019:577(7788):79–84. 10.1038/s41586-019-1852-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang D, Leng L, Chen C, Huang J, Zhang Y, Yuan H, Ma C, Chen H, Zhang YE. Dosage sensitivity and exon shuffling shape the landscape of polymorphic duplicates in Drosophila and humans. Nat Ecol Evol. 2022a:6(3):273–287. 10.1038/s41559-021-01614-w [DOI] [PubMed] [Google Scholar]
- Zhang J, Yang J-R. Determinants of the rate of protein sequence evolution. Nat Rev Genet. 2015:16(7):409–420. 10.1038/nrg3950 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Y, Yang X, Van de Peer Y, Chen J, Marchal K, Shi T. Evolution of isoform-level gene expression patterns across tissues during lotus species divergence. Plant J. 2022b:112(3):830–846. 10.1111/tpj.15984 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao M, Zhang B, Lisch D, Ma J. Patterns and consequences of subgenome differentiation provide insights into the nature of paleopolyploidy in plants. Plant Cell. 2017:29(12):2974–2994. 10.1105/tpc.17.00595 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhong Y, Liu Y, Wu W, Chen J, Sun C, Liu H, Shu J, Ebihara A, Yan Y, Zhou R, et al. Genomic insights into genetic diploidization in the homosporous fern Adiantum nelumboides. Genome Biol Evol. 2022:14(8):evac127. 10.1093/gbe/evac127 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zou C, Lehti-Shiu MD, Thomashow M, Shiu SH. Evolution of stress-regulated gene expression in duplicate genes of Arabidopsis thaliana. PLoS Genet. 2009:5(7):e1000581. 10.1371/journal.pgen.1000581 [DOI] [PMC free article] [PubMed] [Google Scholar]