Abstract
Gene and genome duplication are recurring processes in flowering plants, and elucidating the mechanisms by which duplicated genes are lost or deployed is a key component of understanding plant evolution. Using gene ontologies (GO) or protein family (PFAM) domains, distinct patterns of duplicate retention and loss have been identified depending on gene functional properties and duplication mechanism, but little is known about how gene networks encoding interacting proteins (protein complexes or signaling cascades) evolve in response to duplication. We examined patterns of duplicate retention within four major gene networks involved in photosynthesis (the Calvin cycle, photosystem I, photosystem II and the light harvesting complex) across three species and four whole genome duplications, as well as small-scale duplications, and showed that photosystem gene family evolution is governed largely by dosage sensitivity.1 In contrast, Calvin cycle gene families are not dosage sensitive, but exhibit a greater capacity for functional differentiation. Here we review these findings, highlight how this study, by analyzing defined gene networks, is complementary to global studies using functional annotations such as GO and PFAM, and elaborate on one example of functional differentiation in a Calvin cycle gene family, transketolase.
Key words: gene duplication, whole genome duplication, dosage sensitivity, balance hypothesis
Gene duplication is an ongoing process in eukaryotic genomes,2,3 providing raw material for the evolution of novel traits and phenotypic complexity.4–6 Polyploidy (whole genome duplication; WGD) has occurred in most, if not all, flowering plant lineages,7,8 and has thus played a central role in adding new genes to plant genomes.
Recent studies have shown that the likelihood of duplicate gene retention differs by the functional properties of the genes as well as by the mechanism of duplication (WGD vs. non-polyploid [NP] duplication).9–13 In particular, “connected genes,”14 or genes whose products function in multi-subunit complexes (e.g., ribosomal proteins, transcription factors) or signaling cascades (e.g., kinases) are preferentially retained in duplicate following WGD, and preferentially lost following NP duplication. Though several studies have verified these patterns at the level of generalized functional properties (e.g., gene ontologies [GO] or protein family [PFAM] domains), little is known about duplicate retention patterns in the context of specific gene networks.
Selective Forces Driving Duplicate Retention and Loss Differ Among Photosynthetic Gene Networks
We characterized patterns of gene duplicate retention in the context of four major functional groups of photosynthesis: the Calvin cycle (CC), photosystem I (PSI), photosystem II (PSII) and the light harvesting complex (LHC).1 The legume genera Glycine and Medicago shared a WGD approximately 54 million years ago (MYA).15 Glycine subsequently experienced a second WGD within the last 13 MY.16 Arabidopsis has experienced at least two WGDs independent of the legume duplications.17,18 We utilized the polyploid histories of soybean (G. max), barrel medic (M. truncatula) and Arabidopsis (A. thaliana) to examine photosynthetic gene family evolution in response to these two sets of nested WGDs, as well as numerous NP duplications.
Across all three species, PSII exhibited significantly higher retention of genes duplicated by WGD than the genomewide average, and minimal contribution of NP duplication to gene family expansion.1 Conversely, the CC exhibited significantly lower retention of WGD duplicates than PSII, and a higher contribution of NP duplicates to gene family expansion.1
This reciprocal pattern of retention/loss in PSII is consistent with the balance hypothesis,14 which posits that protein complexes frequently require a specific stoichiometric balance between individual components to function optimally and are dosage sensitive. Unbalanced duplications (NP duplications affecting some, but not all subunits) are deleterious because they alter this stoichiometry, and will tend to be eliminated by purifying selection. In contrast, balanced duplications such as WGD, which duplicate all subunits, will preserve the stoichiometry of the complex. Consequently, purifying selection to maintain stoichiometry will tend to preserve polyploid duplicates.
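The contrast between balanced and unbalanced duplication can be sketched as a toy calculation. This is an illustration only, not a model from the study: the three-subunit complex, the subunit names, and the simplifying assumption that gene product abundance scales with gene copy number are all hypothetical.

```python
from fractions import Fraction


def relative_stoichiometry(copies):
    """Return each subunit's share of the total gene copies in the complex."""
    total = sum(copies.values())
    return {name: Fraction(n, total) for name, n in copies.items()}


def whole_genome_duplication(copies):
    """Balanced duplication: every subunit gene is doubled."""
    return {name: 2 * n for name, n in copies.items()}


def np_duplication(copies, subunit):
    """Unbalanced (non-polyploid) duplication: one subunit gains one copy."""
    duplicated = dict(copies)
    duplicated[subunit] += 1
    return duplicated


# Hypothetical complex with one gene copy per subunit (assumed, not from the study).
ancestral = {"A": 1, "B": 1, "C": 1}
after_wgd = whole_genome_duplication(ancestral)
after_np = np_duplication(ancestral, "A")

# WGD leaves the subunit ratios unchanged; the NP duplication shifts them.
wgd_balanced = relative_stoichiometry(after_wgd) == relative_stoichiometry(ancestral)
np_balanced = relative_stoichiometry(after_np) == relative_stoichiometry(ancestral)
print("WGD preserves stoichiometry:", wgd_balanced)
print("NP duplication preserves stoichiometry:", np_balanced)
```

Under these assumptions only the unbalanced duplication changes the subunit ratios, which is the situation the balance hypothesis predicts purifying selection will act against.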
In contrast, enzymes are generally not dosage sensitive,19 and our analysis suggested that this is the case with CC enzymes as well. Over the long term, WGD duplicates were not preferentially retained in the CC relative to the whole genome, and unbalanced (NP) duplications were tolerated to a much greater degree in the CC than in PSII (e.g., in Arabidopsis 32% of gene family expansion in the CC was due to NP duplications, compared to 0% for PSII).1
Is Dosage Sensitivity Circumvented over Time by Changes in Expression?
Despite consistent patterns at the level of photosynthetic functional groups, there is a notable absence of pattern in retention of homeologs (gene duplicates resulting from WGD) at the individual gene family level.1 For example, we observed negligible correlation in percentage retention across the nine PSII gene families (r ≤ 0.24) when comparing any of the WGD events in soybean and Arabidopsis. This lack of pattern would seem to argue against the conclusion that PSII is dosage sensitive. If it were, one might expect all balanced duplications to be retained, producing clear patterns in individual gene families across species and nested WGDs. Similarly, although previous studies have demonstrated a propensity for “connected” genes to be retained following polyploidy,14 many such genes were lost. What might explain these exceptions to the rule? We speculate that in some cases dosage balance requirements are circumvented by mutations (e.g., in cis-regulatory sequences) that change expression levels.20 This would break the linkage between gene dosage and gene product abundance and relax selective constraints on gene copy number. Consequently, dosage sensitivity could drive high overall retention of balanced duplicates in dosage-sensitive complexes, while random mutational processes would allow different gene families to fractionate in different species or following different polyploidy events.
GO/PFAM Annotation vs. Targeted Studies of Defined Gene Networks
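The kind of comparison behind the reported correlation (r ≤ 0.24) can be sketched as a Pearson correlation between per-family retention percentages for two WGD events. The function below is a generic implementation, and the nine retention values are invented placeholders, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical percentage retention for nine gene families after two WGD events.
# These numbers are placeholders for illustration only.
retention_event_1 = [100, 50, 0, 75, 100, 50, 25, 100, 0]
retention_event_2 = [50, 100, 50, 0, 100, 25, 75, 50, 100]

r = pearson_r(retention_event_1, retention_event_2)
print(f"r = {r:.2f}")
```

A value of r near zero would indicate that the families retained after one event are not predictably the ones retained after the other.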
Much of the previous evidence in support of the balance hypothesis has come from genome-wide studies of polyploids utilizing GO or PFAM annotations.5,9–12,21 Both GO and PFAM provide functional annotation for a large fraction of the proteome, and consequently have provided a useful framework in which to characterize global patterns of duplicate retention and loss.
However, such studies provide only a broad overall picture of dosage sensitivity, and little detail about specific biochemical pathways and/or protein complexes. In most cases, enzymes in a biochemical pathway or subunits of a multi-subunit protein complex do not have common protein domains. For example, with the exception of fructose bisphosphatase (FBPase) and sedoheptulose-1,7-bisphosphatase (SBPase), no two CC enzymes share a PFAM domain. The GO framework includes terms defining higher-order relationships in the “Biological Process” (BP) and “Cellular Component” (CellComp) categories, but studies utilizing GO have typically only considered generic, high-level parent terms. For example, Maere et al.11 used the GO Slim ontology, which collapses the terms “Photosystem I” (GO: 0009522) and “Photosystem II” (GO: 0009523) into the parent term “Chloroplast.” Thus, such studies provide little resolution for dissecting differences among specific gene networks.
Perhaps most importantly, due to the ongoing nature of GO annotation, assignment of genes to many GO terms remains incomplete. In Arabidopsis only five of 11 CC enzymes are assigned to the GO term for “Carbon fixation” (GO: 0015977), and only six of the nine nuclear-encoded PSII subunits are represented in the GO term for “Photosystem II” (GO: 0009523). The genes in the “Photosystem II” GO term give a very different picture of duplicate retention for PSII than our PSII gene set, which includes all nine subunits. We showed that 57.1% of PSII genes retain homeologs from the most recent (α) WGD, which is significantly higher than the corresponding estimate (28.5%; Thomas et al. 2006) for the whole genome (χ² = 4.314, 1 df, p = 0.038; Yates' correction).1 In contrast, only 41.7% of the genes assigned to the “Photosystem II” GO term have retained α-homeologs, a value that is not significantly different from the whole genome (χ² = 0.953, 1 df, p = 0.329), obscuring the evidence for dosage sensitivity in PSII.
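A chi-square comparison with Yates' continuity correction can be sketched as follows. Several things here are assumptions: the text reports only percentages, so the count of 8 retained out of 14 genes is one combination consistent with 57.1%, and treating the genome-wide value (28.5%) as a fixed expected proportion in a one-degree-of-freedom goodness-of-fit test may differ from the exact test design used in the original analysis.

```python
import math


def yates_chi_square(observed_retained, n_genes, expected_fraction):
    """Goodness-of-fit chi-square (1 df) with Yates' continuity correction.

    Compares retained/lost counts against the counts expected if genes were
    retained at `expected_fraction`. Returns (statistic, p_value).
    """
    observed = (observed_retained, n_genes - observed_retained)
    expected = (n_genes * expected_fraction, n_genes * (1 - expected_fraction))
    statistic = sum(
        max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in zip(observed, expected)
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumed counts (8 of 14 is about 57.1%); genome-wide proportion from the text.
statistic, p_value = yates_chi_square(8, 14, 0.285)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

The correction subtracts 0.5 from each absolute deviation before squaring, which makes the test more conservative for small gene sets like the ones compared here.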
Thus, although analyses of gene family evolution based on PFAM or GO have provided strong general support for the balance hypothesis, information about specific protein networks or complexes is lacking. Because the molecular basis for dosage sensitivity remains poorly understood, detailed studies of specific gene networks, such as the functional groups of photosynthesis, will complement more global analyses, giving a more detailed picture of what, exactly, makes one complex dosage sensitive and another not.
Dosage Sensitivity vs. Functional Differentiation as Mechanisms of Duplicate Gene Retention
The role of selection differs fundamentally among the different models of duplicate gene retention. Under the balance hypothesis, balanced duplications in a dosage sensitive network will be preserved via purifying selection to maintain proper network stoichiometry.6,14 Other models of duplicate retention (e.g., subfunctionalization, neofunctionalization, escape from adaptive conflict),6 invoke either positive selection or relaxation of purifying selection, resulting in functional differentiation. Thus, the extent to which homologs exhibit evidence for functional divergence provides further insight into the mechanisms driving retention.
We showed that gene duplicates in PSII and the CC differ in degree of functional differentiation.1 For most PSII homeologs, we found no evidence for positive selection or divergence in expression profiles. In contrast, the majority of CC homeologs exhibit evidence for positive selection and/or expression divergence. The fate of most gene duplicates is pseudogenization and/or loss within a few million years.3 The fact that many PSII homeologs have persisted for tens of millions of years in the absence of any obvious functional differentiation further supports the idea that these duplicates have been retained via purifying selection to preserve dosage balance.20 The majority of CC duplicates, in contrast, appear to have been retained as a result of functional differentiation (e.g., sub- or neofunctionalization).
A pair of Arabidopsis transketolase (TKL) homeologs derived from the α polyploidy event provides a striking example of functional divergence in the CC. Expression of the two homeologs is oppositely regulated (r = −0.67) (Fig. 1). TKL is a dual-function enzyme acting in the oxidative pentose phosphate pathway (OPPP) as well as the CC.22 One homeolog (AT3G60750) is co-expressed with rubisco small subunit (RbcS), SBPase and phosphoribulokinase (PRK), gene families that function only in the Calvin cycle, whereas the other homeolog (AT2G45290) is oppositely regulated relative to these genes (Fig. 1), suggesting that AT2G45290 has lost CC function. Both TKL homeologs are also positively co-regulated with a complete set of OPPP genes (Table 1), suggesting that they both function in the OPPP. However, each homeolog is co-expressed with a distinct set of OPPP paralogues, suggesting that Arabidopsis has two functionally differentiated OPPP pathways, and that the TKL homeologs have partitioned their activities between the two.
Table 1.
OPPP gene family | Gene | r with TKL homeolog AT3G60750 | r with TKL homeolog AT2G45290
G6PDH | AT5G35790 | 0.72 | −0.79
G6PDH | AT5G40760 | −0.52 | 0.76
G6PDH | AT1G24280 | −0.28 | 0.57
G6PDH | AT5G13110 | −0.30 | 0.54
G6PDH | AT3G27300 | −0.15 | 0.43
G6PDH | AT1G09420 | −0.28 | 0.11
6PGDH | AT1G17650 | 0.72 | −0.74
6PGDH | AT1G71180 | 0.22 | −0.27
6PGDH | AT1G64190 | 0.04 | 0.16
6PGDH | AT3G02360 | −0.51 | 0.85
6PGDH | AT5G41670 | −0.11 | 0.51
6PGDH | AT1G71170 | −0.39 | 0.37
6PGL | AT3G49360 | 0.39 | −0.51
6PGL | AT5G24420 | 0.31 | −0.33
6PGL | AT5G24410 | −0.33 | 0.40
PRI | AT3G04790 | 0.84 | −0.70
PRI | AT5G44520 | 0.70 | −0.77
PRI | AT2G01290 | 0.68 | −0.65
PRI | AT1G71100 | −0.11 | 0.24
RPE | AT5G61410 | 0.68 | −0.76
RPE | AT1G63290 | −0.42 | 0.58
RPE | AT3G01850 | −0.27 | 0.11
TA | AT1G12230 | 0.45 | −0.44
TA | AT5G13420 | −0.46 | 0.79
Pearson correlation coefficients (r) for pairwise comparisons of expression profiles between Arabidopsis transketolase (TKL) homeologs and genes encoding other oxidative pentose phosphate pathway (OPPP) enzymes. Correlation coefficients were derived from all microarray experiments in AffyWatch release 2.0, obtained using CressExpress.23 G6PDH, glucose-6-phosphate dehydrogenase; 6PGDH, 6-phosphogluconate dehydrogenase; 6PGL, 6-phosphogluconolactonase; PRI, ribose 5-phosphate isomerase; RPE, ribulose-5-phosphate 3-epimerase; TA, transaldolase.
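The partitioning of OPPP paralogues between the two TKL homeologs can be checked directly from the coefficients in Table 1. The sketch below transcribes the table and groups genes by the homeolog with which they are positively correlated. Treating any r > 0 as "co-expressed" is an assumption made here for illustration; the table itself does not define a threshold.

```python
# (family, gene, r with AT3G60750, r with AT2G45290), transcribed from Table 1.
TABLE_1 = [
    ("G6PDH", "AT5G35790", 0.72, -0.79), ("G6PDH", "AT5G40760", -0.52, 0.76),
    ("G6PDH", "AT1G24280", -0.28, 0.57), ("G6PDH", "AT5G13110", -0.30, 0.54),
    ("G6PDH", "AT3G27300", -0.15, 0.43), ("G6PDH", "AT1G09420", -0.28, 0.11),
    ("6PGDH", "AT1G17650", 0.72, -0.74), ("6PGDH", "AT1G71180", 0.22, -0.27),
    ("6PGDH", "AT1G64190", 0.04, 0.16), ("6PGDH", "AT3G02360", -0.51, 0.85),
    ("6PGDH", "AT5G41670", -0.11, 0.51), ("6PGDH", "AT1G71170", -0.39, 0.37),
    ("6PGL", "AT3G49360", 0.39, -0.51), ("6PGL", "AT5G24420", 0.31, -0.33),
    ("6PGL", "AT5G24410", -0.33, 0.40),
    ("PRI", "AT3G04790", 0.84, -0.70), ("PRI", "AT5G44520", 0.70, -0.77),
    ("PRI", "AT2G01290", 0.68, -0.65), ("PRI", "AT1G71100", -0.11, 0.24),
    ("RPE", "AT5G61410", 0.68, -0.76), ("RPE", "AT1G63290", -0.42, 0.58),
    ("RPE", "AT3G01850", -0.27, 0.11),
    ("TA", "AT1G12230", 0.45, -0.44), ("TA", "AT5G13420", -0.46, 0.79),
]


def coexpressed_by_family(rows, column, threshold=0.0):
    """Group genes whose r in the given column exceeds the (assumed) threshold."""
    families = {}
    for row in rows:
        family, gene = row[0], row[1]
        if row[column] > threshold:
            families.setdefault(family, []).append(gene)
    return families


all_families = {row[0] for row in TABLE_1}
with_at3g60750 = coexpressed_by_family(TABLE_1, column=2)
with_at2g45290 = coexpressed_by_family(TABLE_1, column=3)

# Genes positively correlated with both homeologs.
shared = sorted(
    {g for genes in with_at3g60750.values() for g in genes}
    & {g for genes in with_at2g45290.values() for g in genes}
)

print("Families covered by AT3G60750 set:", set(with_at3g60750) == all_families)
print("Families covered by AT2G45290 set:", set(with_at2g45290) == all_families)
print("Genes shared between the two sets:", shared)
```

With this threshold, each homeolog's set contains at least one member of all six OPPP gene families, and the two sets are almost entirely distinct, in line with the interpretation given in the text.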
In conclusion, detailed examination of four photosynthetic gene networks revealed that gene duplicates encoding photosystem subunits are functionally constrained, and that dosage sensitivity has played an important role in their evolution. In contrast, gene duplicates in the CC do not exhibit evidence for dosage sensitivity, but have greater capacity for functional divergence, as exemplified by the TKL homeologs from the α WGD event in Arabidopsis. Global analyses using GO or PFAM frameworks have outlined the mechanisms that govern gene family evolution. Detailed analyses of defined gene networks, such as the present one, are filling in the details.
Acknowledgments
This work was supported by National Science Foundation grant nos. IOS-0744306 and DEB-0709965.
Abbreviations
- BP: biological process
- CC: Calvin cycle
- CellComp: cellular component
- FBPase: fructose bisphosphatase
- GO: gene ontology
- LHC: light harvesting complex
- NP: non-polyploid
- OPPP: oxidative pentose phosphate pathway
- PFAM: protein family
- PRK: phosphoribulokinase
- PSI: photosystem I
- PSII: photosystem II
- RbcS: rubisco small subunit
- SBPase: sedoheptulose-1,7-bisphosphatase
- TKL: transketolase
- WGD: whole genome duplication
References
- 1. Coate JE, Schlueter JA, Whaley AM, Doyle JJ. Comparative evolution of photosynthetic genes in response to polyploid and nonpolyploid duplication. Plant Physiol. 2011;155:2081–2095. doi: 10.1104/pp.110.169599.
- 2. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. doi: 10.1126/science.290.5494.1151.
- 3. Lynch M, Conery JS. The evolutionary demography of duplicate genes. J Struct Funct Genom. 2003;3:35–44.
- 4. Ohno S. Evolution by Gene Duplication. Springer: 1970.
- 5. Freeling M, Thomas BC. Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 2006;16:805–814. doi: 10.1101/gr.3681406.
- 6. Freeling M. Bias in plant gene content following different sorts of duplication: Tandem, whole-genome, segmental or by transposition. Annu Rev Plant Biol. 2009;60:433–453. doi: 10.1146/annurev.arplant.043008.092122.
- 7. Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008;18:1944–1954. doi: 10.1101/gr.080978.108.
- 8. Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, et al. Polyploidy and angiosperm diversification. Am J Bot. 2009;96:336–348. doi: 10.3732/ajb.0800079.
- 9. Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–1691. doi: 10.1105/tpc.021410.
- 10. Seoighe C, Gehring C. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 2004;20:461–464. doi: 10.1016/j.tig.2004.07.008.
- 11. Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, et al. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA. 2005;102:5454–5459. doi: 10.1073/pnas.0501102102.
- 12. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, Feltus FA, Estill JC. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 2006;22:597–602. doi: 10.1016/j.tig.2006.09.003.
- 13. Thomas BC, Pedersen B, Freeling M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 2006;16:934–946. doi: 10.1101/gr.4708406.
- 14. Papp B, Pal C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003;424:194–197. doi: 10.1038/nature01771.
- 15. Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ. Placing paleopolyploidy in relation to taxon divergence: A phylogenetic analysis in legumes using 39 gene families. Syst Biol. 2005;54:441–454. doi: 10.1080/10635150590945359.
- 16. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–183. doi: 10.1038/nature08670.
- 17. Blanc G, Hokamp K, Wolfe KH. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003;13:137–144. doi: 10.1101/gr.751803.
- 18. Bowers JE, Chapman BA, Rong JK, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–438. doi: 10.1038/nature01521.
- 19. Kondrashov FA, Koonin EV. A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet. 2004;20:287–290. doi: 10.1016/j.tig.2004.05.001.
- 20. Birchler JA, Veitia RA. The gene balance hypothesis: Implications for gene regulation, quantitative traits and evolution. New Phytol. 2010;186:54–62. doi: 10.1111/j.1469-8137.2009.03087.x.
- 21. Barker MS, Kane NC, Matvienko M, Kozik A, Michelmore W, Knapp SJ, et al. Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. Mol Biol Evol. 2008;25:2445–2455. doi: 10.1093/molbev/msn187.
- 22. Tobin AK, Bowsher CG. Nitrogen and carbon metabolism in plastids: Evolution, integration and coordination with reactions in the cytosol. Adv Bot Res. 2005;42:113–165.
- 23. Srinivasasainagendra V, Page GP, Mehta T, Coulibaly I, Loraine AE. CressExpress: A tool for large-scale mining of expression data from Arabidopsis. Plant Physiol. 2008;147:1004–1016. doi: 10.1104/pp.107.115535.