Abstract
The occurrence of polyploidy in land plant evolution has led to an acceleration of genome modifications relative to other crown eukaryotes and is correlated with key innovations in plant evolution. Extensive genome resources provide a basis for relating genomic changes to the origins of novel morphological and physiological features of plants. We infer ancestral gene contents for key nodes of the plant family tree. Pervasive polyploidy in angiosperms appears likely to be the major factor generating novel angiosperm genes and expanding some gene families. However, most gene families lose most duplicated copies in a quasi-neutral process, and a few families are actively selected for single-copy status. One of the great challenges of evolutionary genomics is to link genome modifications to speciation, diversification and the morphological and/or physiological innovations that collectively compose biodiversity. The rapid accumulation and ongoing investigation of genomic data may greatly improve the resolution at which evolutionary approaches can help identify specific genes responsible for particular innovations. The resulting, more ‘particulate’ understanding of plant evolution may elevate to a new level fundamental knowledge of botanical diversity, including economically important traits in the crop plants that sustain humanity.
Keywords: genome modification, ancestral gene content, polyploidy, gene family gain and loss
1. Introduction
Genome duplication is a punctuational event in the evolution of a lineage, with permanent consequences for all descendants if the lineage survives. Most crown eukaryotes pass through different ploidy levels at different stages of development [1,2] and continuously produce aberrant unreduced gametes at low rates. However, genome duplications that persist in the evolutionary history of extant lineages are extremely rare, usually surviving only once in many millions of years. This rarity shows that the vast majority of newly formed polyploids quickly go extinct [3].
Classical views suggest that genome duplication is potentially advantageous as a source of genes with new functions [4,5]. Some polyploids appear to realize these and other benefits [6], and genome duplication is thought to be central to the evolution of morphological complexity [7]. Polyploids have long been suggested to have a variety of capabilities that exceed those of their diploid progenitors, such as adaptation to environmental extremes [8–11].
Angiosperms are an outstanding model in which to elucidate consequences of genome duplication in crown eukaryotes. It has long been suspected that many angiosperms were palaeopolyploids [10,12]. Indeed, recent analyses of genome sequences [13,14] show that all angiosperms are palaeopolyploids. Seminal findings from yeast [15–17] and Paramecium [18] are shedding valuable light on consequences of genome duplication in single-celled organisms. However, these consequences are expected to be very different in higher eukaryotes with small effective population sizes, such as angiosperms and mammals [19,20].
Herein, we describe some representative features of major land plant groups and associated genomic modifications and variations throughout the history of plant evolution. We circumscribe the ancestral gene content for key nodes of the evolutionary series, as well as adaptational genomic changes occurring along the branches leading to these nodes. We also review current approaches for genome structure comparison to unravel ancient polyploidy and discuss the consequences of these polyploidy events, particularly the biased pattern of gene gains and losses following genome duplication.
2. Genome resources in major plant groups
Land plants, also called embryophytes, descended from freshwater algae about 480 Ma [21–24]. Most of the major phylogenetic groups of land plants now have at least one high-quality draft or reference genome sequence. Mosses, with about 12 000 species classified in the Bryophyta, are non-vascular plants that lack xylem and absorb water and nutrients mainly through their leaves. To date, only one Bryophyta species (Physcomitrella patens) has a completed genome sequence [25]. Lycophytes and euphyllophytes (ferns and seed plants) are the two surviving lineages of vascular plants. Selaginella moellendorffii, in the lycophyte family Selaginellaceae (spike mosses), is the first vascular non-seed plant to have its genome sequenced [26]. Seed plants (spermatophytes) include gymnosperms and angiosperms. Gymnosperms are seed-producing plants whose extant members include conifers, cycads, Ginkgo and Gnetales. Many gymnosperms have exceptionally large genomes; for example, conifer genome sizes range from 18 to 35 gigabases [27], which has hindered whole genome sequencing. The Picea abies genome, completed in 2013, was the first full gymnosperm genome sequence [28]. Angiosperms are by far the largest group of land plants, with more than 300 000 living species, of which at least 33 have sequenced genomes (as listed in Phytozome v9.1), notably Arabidopsis [29], Populus [30], grape [31], Oryza [32], Sorghum [33] and banana [34], with many additional genome sequences in progress. These genome sequences from species across the land plant phylogeny provide critical resources for studying genome evolution and associated innovations of plant form and function.
3. Ancestral genome modification at key evolutionary nodes
Clarifying the ancestral gene content and lineage-specific variations at key nodes in the land plant phylogeny will advance knowledge of how genomic modifications contributed to the evolution of novel features. Here, we present an exemplar reconstruction of ancestral gene content. The protein-coding gene sets of 11 sequenced land plant species (five eudicots: Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Solanum lycopersicum and Nelumbo nucifera; three monocots: Oryza sativa, Sorghum bicolor and Musa acuminata; one gymnosperm: Picea abies; one lycophyte: Selaginella moellendorffii; one moss: Physcomitrella patens) were used to identify putative gene family clusters with OrthoMCL [35]. OrthoMCL builds tighter gene clusters with fewer false positives than the Tribe-MCL method [36]. However, OrthoMCL clusters might not correspond to truly distinct gene families; for some highly divergent families they may instead represent subfamilies (hereafter, ‘gene family’ is used as a general term for an OrthoMCL cluster). For example, the MADS-box gene family comprises tens of OrthoMCL clusters. We therefore emphasize that ancestral gene content, as well as the numbers of gene family gains and losses, may be overestimated. The number of genomes used in the classification could also affect the gene clusters, especially since most currently available genomes are from angiosperms. Nevertheless, this analysis illustrates a general trend of genome modification during land plant evolution. We modelled the changes in gene content along each branch within a Wagner parsimony framework.
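To make the reconstruction step concrete, the following is a minimal sketch of Wagner parsimony applied to the size of a single gene family on a rooted binary tree. It is not the implementation used by the Count program. It assumes integer family sizes at the leaves, a linear cost per gene gained or lost (with optional separate gain and loss weights) and a tree encoded as nested tuples. The function name, the tree encoding and the species labels in the example are hypothetical. Among equally parsimonious root states, it reports the smallest, in the spirit of a minimum ancestral gene set.

```python
from typing import Dict, List, Tuple, Union

# A rooted binary tree: a leaf is a species name; an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
INF = float("inf")


def wagner_parsimony(tree: Tree, counts: Dict[str, int],
                     gain_cost: float = 1.0, loss_cost: float = 1.0) -> Tuple[float, int]:
    """Return (minimum total cost, smallest optimal family size at the root)."""
    states = range(max(counts.values()) + 1)  # candidate ancestral family sizes

    def step(parent: int, child: int) -> float:
        # Linear (Wagner) cost of changing the family size along one branch.
        diff = child - parent
        return gain_cost * diff if diff > 0 else loss_cost * -diff

    def subtree_costs(node: Tree) -> List[float]:
        # cost[s] = cheapest explanation of this subtree if the node has size s.
        if isinstance(node, str):
            return [0.0 if s == counts[node] else INF for s in states]
        children = [subtree_costs(child) for child in node]
        return [sum(min(c[t] + step(s, t) for t in states) for c in children)
                for s in states]

    root = subtree_costs(tree)
    best = min(root)
    return best, root.index(best)  # index() picks the smallest optimal size


# Hypothetical family: two copies in Arabidopsis, one in poplar, absent from rice.
tree = (("A_thaliana", "P_trichocarpa"), "O_sativa")
print(wagner_parsimony(tree, {"A_thaliana": 2, "P_trichocarpa": 1, "O_sativa": 0}))  # (2.0, 0)
```

A root size of 0 places the origin of this hypothetical family on the branch leading to the two eudicots. Per-branch gains and losses, as summarized in figure 1, would require an additional traceback step that this sketch omits.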
From this gene family classification, we estimated a minimum gene set of 7100 genes (figure 1, node 1) in the common ancestor of all land plants. This is very close to the estimate of 6820 by Banks et al. [26], who identified gene families by assigning orthology with mutual-best-hit criteria and synteny between closely related organisms. The bryophyte Physcomitrella has more than twice as many genes as the aquatic alga Chlamydomonas (35 938 versus 15 143) [25,37], suggesting a general increase in gene family complexity following the transition to land. Genes associated with aquatic environments (e.g. flagellar components for gametic motility) and with dynein-mediated transport have been lost in Physcomitrella [25]. By contrast, Physcomitrella gained members of gene families associated with signal transduction (e.g. through gibberellic acid, jasmonic acid, ethylene and brassinosteroids), transport capabilities and tolerance of abiotic stresses. Adaptation to land required plants to survive much greater variation in temperature, light and water availability. The heat shock protein 70 (HSP70) gene family may exemplify such adaptation: all algal genomes sequenced to date have only one cytosolic HSP70 gene [38], whereas P. patens has nine [25]. Likewise, the light-harvesting complex proteins have expanded significantly in P. patens, perhaps making the photosynthetic antenna more robust to high light intensities. Photo-protective early light-induced proteins have also expanded greatly in P. patens, putatively helping to avoid photo-oxidative damage. Some of these expansions may have resulted from whole-genome duplications [25].
Figure 1.
Global gene family loss and gain in plant genomes. Eleven sequenced plant genomes were selected to represent the major plant groups phylogenetically. Gene families were identified using OrthoMCL, and the Count program was used to determine the minimum gene set for ancestral nodes of the phylogenetic tree within a Wagner parsimony framework. The number after each node is the estimated ancestral gene number, and the numbers above each branch are the numbers of gene family gains (+) and losses (–). A bar beneath an internal branch indicates one or more rounds of ancient polyploidy inferred along that branch.
The estimated minimal gene number of the most recent common ancestor of vascular plants is 7790 (figure 1, node 2), with the acquisition of 690 new genes during the transition from non-vascular plants. In the lycophyte Selaginella, which, among plants with sequenced genomes, is sister to all other vascular plants, secondary metabolic genes (such as cytochrome P450, BAHD acyltransferases and terpene synthases) were expanded extensively [26]. It has been suggested that far fewer new genes were needed for the transition from a gametophyte- to a sporophyte-dominated life cycle than for the transition from non-seed vascular plant to a seed plant and subsequently to a flowering plant [26].
A large number of gene family gains (1269) were inferred along the deep branch leading to seed plants, whose estimated ancestral gene content was 8652 (figure 1, node 3). Genomes are exceptionally large in gymnosperms such as conifers (18–35 gigabases), although polyploidy is thought to be rare in this group [39]. Recent work has indicated that the large genome size of gymnosperms might be associated with rapid expansion of retrotransposons, although these studies have largely been limited to conifers in the Pinaceae [40–44], which are particularly well studied in view of their economic importance. A recent study suggested that elevated rates of genome size evolution and diversification occurred within the past 100 Myr, especially in Pinus [45]. A draft sequence of the 20-gigabase genome of Picea abies has recently been completed [28]. Despite being more than 100 times larger than that of Arabidopsis, the Picea abies genome contains a similar number of well-supported annotated genes (28 354).
Comparison with Picea abies and/or other conifer genomes provides a valuable reference for studying genome modifications and the evolution of key seed plant traits, for example the origin of flowering. The representative gymnosperm Picea abies lacks FLOWERING LOCUS T (FT)-like genes, which promote flowering in other taxa; instead, it contains a group of FT/TFL1-like genes that probably act as flowering repressors [28,46,47]. FT-like genes are found exclusively in angiosperms, including early diverging and eudicot lineages, supporting the hypothesis that the evolution of flowering plants coincided with the evolution of a flower-promoting function for an FT/TFL1-like gene. In plants, MADS-box genes encode transcription factors that are important regulators of development, particularly of floral organ identity [48]. A total of 278 MADS-box homologues were identified in P. abies, most of them type II MADS-box genes (encoding MIKC-type proteins, which have a MADS (M) domain followed by Intervening (I), Keratin-like (K) and C-terminal domains). The VASCULAR NAC DOMAIN (VND) gene family controls the formation of multicellular vessels, which have been considered one of the key innovations contributing to the success of flowering plants [49]. The VND gene family has expanded: only two VND genes are found in P. abies [28], compared with seven in Arabidopsis thaliana [50].
Another large wave of gene family gains (figure 1, node 4, 1492 new gene families) was inferred along the branch leading to angiosperms, suggesting that a diverse set of novel gene functions originated before the emergence of angiosperms. Angiosperms have seeds contained within a fruit, unlike gymnosperms that have naked seeds (no fruit). The flowers, fruits and other characters of angiosperms are likely to have contributed to their emergence as the most species-rich group of land plants [51–53]. However, because this exemplar analysis does not include genome sequences from basal angiosperms, it cannot distinguish gene families originating before or soon after the earliest diversification of angiosperms. Amborella is thought to be the single living sister species to other extant flowering plants, and the Amborella trichopoda genome, which is finished but not yet available for large-scale analysis, will provide a valuable reference for reconstructing ancestral angiosperm gene content (http://www.amborella.org/) [54].
The estimated minimum gene sets for eudicots and monocots are 10 994 and 10 547 (figure 1, nodes 5 and 9), respectively. A total of 1068 novel gene families were gained on the branch leading to eudicots as a whole (node 5), and 662 on the branch leading to monocots (node 9). However, the largest gain of gene families on the internal branches of the 11-genome phylogeny was inferred along the branch leading to grasses (figure 1, node 10), with about 4126 gene families newly gained before the divergence of Oryza and Sorghum.
4. Genome structure comparison and synteny detection
Comparisons between and within genomes can detect homologous regions based on conserved gene content and gene order (synteny blocks). In some cases, they also reveal large-scale, or whole-genome, duplications (WGDs) in evolutionary history. In vertebrates, synteny blocks remain detectable even after hundreds of millions of years of divergence [55,56]. Angiosperm genomes, however, are more dynamic and evolve structurally much more rapidly, owing partly, if not largely, to extensive chromosomal rearrangement and gene loss following WGDs in ancestral lineages [57–59]. For example, the two main clades of angiosperms, monocots and eudicots, are estimated to have diverged about 140–150 Ma [60]. Syntenic orthologue pairs between monocot and eudicot species account for only 1–7% of all orthologue pairs, versus 30–50% in comparisons within eudicots or within monocots (see table 1 in [61]). Inter-genome synteny can be substantially affected both by the number of genome duplications after species divergence and by the time since divergence (figure 2).
Figure 2.
Impact of divergence time and successive rounds of genome duplication on synteny signals. Syntenic blocks were identified using MCscanX [61]. The Oryza–Sorghum and Vitis–Carica comparisons show synteny when no genome duplication occurred after divergence. One genome duplication occurred after the divergence of Oryza–Zea and of Vitis–Populus, and multiple WGDs occurred after the splits of Oryza–Musa and Vitis–Musa. The synteny signal erodes with increasing divergence time and with additional WGDs after separation.
Many algorithms and software tools have been developed to find syntenic blocks within or among genomes. In general, homology matrices are built from all-against-all BLAST searches, and synteny is then detected by clustering neighbouring matches within a matrix, as in ADHoRe [62] and DiagHunter [63]. Another approach uses dynamic programming to find the highest-scoring ‘chains’ of syntenic genes, followed by empirical or statistical tests of whether the observed synteny could have arisen by chance, as implemented in DAGchainer [64], ColinearScan [65], MCscan [58] and MCscanX [61]. In addition, several web-based systems have been developed for whole-genome analysis. For example, CoGe is a platform for multiple whole-genome comparisons that finds and aligns syntenic regions and visualizes the output in an intuitive and interactive manner [66–68]. At the time of writing, it hosted 15 140 organisms with 19 624 genomes, including Bacteria, Archaea, eukaryotes, organelles, viruses and sub-genomes such as plasmids. The Plant Genome Duplication Database (http://chibba.pgml.uga.edu/duplication) is a public web service for identifying synteny among plant genomes. It focuses mainly on unravelling genome duplication events during the history of angiosperm evolution [69].
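The chaining idea behind tools such as DAGchainer and MCscanX can be illustrated with a simplified dynamic program. The sketch below does not reproduce the scoring scheme of any particular tool. It assumes that anchors (homologous gene pairs, for example from BLAST) are given as gene-order positions on two chromosomes and that a collinear chain must increase on both axes. It also uses a hypothetical per-anchor score, per-skipped-gene penalty and maximum gap. The significance testing that real tools apply to each chain is omitted.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (gene position on chromosome X, gene position on chromosome Y)


def best_collinear_chain(anchors: List[Anchor], match_score: float = 50.0,
                         gap_penalty: float = 1.0,
                         max_gap: int = 25) -> Tuple[float, List[Anchor]]:
    """Highest-scoring chain of anchors whose positions increase on both chromosomes."""
    pts = sorted(set(anchors))  # sorting by x means every valid predecessor comes earlier
    if not pts:
        return 0.0, []
    score = [match_score] * len(pts)  # best chain score ending at each anchor
    prev = [-1] * len(pts)            # back-pointer for recovering the chain
    for j, (xj, yj) in enumerate(pts):
        for i in range(j):
            dx, dy = xj - pts[i][0], yj - pts[i][1]
            if dx <= 0 or dy <= 0 or dx > max_gap or dy > max_gap:
                continue  # not collinear, or too far apart to extend the block
            # Penalize the genes skipped between consecutive anchors on the longer axis.
            cand = score[i] + match_score - gap_penalty * (max(dx, dy) - 1)
            if cand > score[j]:
                score[j], prev[j] = cand, i
    end = max(range(len(pts)), key=score.__getitem__)
    chain, k = [], end
    while k != -1:
        chain.append(pts[k])
        k = prev[k]
    return score[end], chain[::-1]


# Hypothetical anchors: four collinear pairs plus one off-diagonal match at (5, 2).
print(best_collinear_chain([(1, 10), (2, 11), (3, 12), (5, 2), (4, 14)]))
```

Blocks in inverted orientation could be found by rerunning the same procedure with the second coordinate negated. A practical pipeline would also remove chained anchors and repeat the search to recover further blocks.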
While syntenic blocks provide the most definitive evidence for the most recent WGD in a genome, most angiosperm genomes have experienced multiple WGDs. Two approaches, referred to as ‘bottom-up’ [57] and ‘top-down’ [58], have been used successfully to find the signatures of more ancient events. The ‘bottom-up’ approach ‘merges’ regions duplicated in the most recent event affecting a genome. Genes that remain duplicated in syntenic locations are used to align corresponding regions and to infer the gene content and order of an ancestral chromosome sufficient to explain them. Deduced ancestral chromosomes are then compared with one another in the same manner, to reconstruct older genome duplications recursively in a stepwise fashion [57,70]. The main challenge is how to order the non-anchor genes located in the syntenic blocks. To date, only second and third iterations have been achieved with the bottom-up approach [57,70]. In the ‘top-down’ approach, pairwise syntenic blocks are first identified between the target genome and an outgroup genome. All homologous segments between the two genomes are then clustered to infer the number of WGDs from the redundancy level, i.e. the number of homologous segments in the target genome that correspond to a single region of the outgroup genome [13]. For example, in the tiny Utricularia gibba genome (82 megabases), a top-down approach identified at least three rounds of WGD after U. gibba diverged from a common ancestor shared with tomato and grape [71].
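The redundancy logic of the top-down approach can be expressed compactly. The sketch below assumes that syntenic matches have already been summarized as (outgroup region, target segment) pairs. It also assumes that every polyploidy event in the target lineage was a doubling, so that k rounds yield at most 2^k segments per outgroup region. Both assumptions are simplifications. A genome triplication contributes three copies rather than two, and segment loss means that observed redundancy is usually below the maximum. The function names and region labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def redundancy_levels(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct target-genome segments matched to each outgroup region."""
    hits = defaultdict(set)
    for outgroup_region, target_segment in pairs:
        hits[outgroup_region].add(target_segment)
    return {region: len(segments) for region, segments in hits.items()}


def min_doubling_rounds(pairs: Iterable[Tuple[str, str]]) -> int:
    """Smallest k with 2**k >= the highest observed redundancy (doublings only)."""
    levels = redundancy_levels(pairs)
    highest = max(levels.values(), default=1)
    return math.ceil(math.log2(highest))


# Hypothetical matches: one outgroup region hit by eight target segments, another by five.
pairs = ([("out_r1", f"tgt_s{i}") for i in range(8)]
         + [("out_r2", f"tgt_s{i}") for i in range(10, 15)])
print(redundancy_levels(pairs), min_doubling_rounds(pairs))
```

Using the maximum redundancy rather than an average reflects the assumption that at least one outgroup region has retained all of its descendant segments. With widespread segment loss, this gives a lower bound.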
5. Palaeopolyploidy and gene family numbers
Genome doubling may confer a number of advantages on a polyploid [6], via mechanisms such as increased gene dosage, ‘intergenomic heterosis’ conferred by multiple alleles in a polyploid nucleus, or the evolution of novel gene functions (neofunctionalization [4,5]). Each of these advantages requires that both duplicated copies of a gene survive in the doubled genome, thus increasing gene family numbers. However, a host of genome sequences shows that by far the most common fate of a pair of duplicated genes is silencing (either epigenetically or by mutation) and eventual elimination of one member of the pair. This fate is called ‘non-functionalization’. Classical ideas about neofunctionalization as a major advantage of polyploidy [4,5] have more recently been tempered by a further suggestion. For duplicated genes that survive, a more widespread fate may be the subdivision of ancestral functions between them (subfunctionalization [72]), which renders them interdependent. Subfunctionalization may sometimes lead to neofunctionalization [73].
In angiosperms (which are certainly representative of crown eukaryotes, and probably of all organisms in this regard), we find three ‘fates’ of individual gene pairs following duplication:
(1) Most gene functional groups show post-duplication gene preservation/loss rates that are indistinguishable from the genome-wide average. Such ‘neutral’ loss of duplicated genes presumably involves inactivating mutations opposed by very weak selection [74], closely resembling the ‘non-functionalization’ described by others (e.g. [75]) as the fate of the vast majority of duplicated genes.
(2) Genes in some specific functional categories tend to be retained after duplication. Several gene functional groups are preferentially preserved in duplicate [58,76–78]. Coding regions of genes preserved in duplicate tend to be functionally complex [76], to be under purifying selection [75,76] and to evolve in concert [79,80]. Tendencies to retain duplicated genes involved in signal transduction and transcription, and to lose duplicated DNA repair genes, are widely observed [81,82]. However, Pfam domain-based groupings reveal heterogeneity within the broader and more widely used gene ontology (GO) categories of prior studies [81,83]. For example, one abundant protein–protein interaction domain (LRR, leucine-rich repeat) is usually preserved in duplicate, whereas two less-abundant domains (SET; TPR, tetratricopeptide repeat) are usually restored to singleton status [78].
(3) Other specific genes and gene functional groups show more extensive loss of duplicate copies than the genome-wide average. This loss is often convergent following independent duplications separated by hundreds of millions of years (e.g. in angiosperms, yeast and the fish Tetraodon [78]). Taken alone, the observation that some gene functional groups are preserved in duplicate significantly less often than the genome-wide average might be viewed as noise: among thousands of functional groups, some must incur more gene loss than others owing to random factors. However, the gene functional groups that have incurred the greatest loss of duplicated copies are closely correlated following independent duplications in Arabidopsis and rice. The levels of statistical significance essentially rule out false positives [78]. Multi-alignments show some individual genes to have been repeatedly restored to single-copy status following many different genome duplications in independent angiosperm lineages [13,58]. The two gene sets do not overlap. Repeated restoration of certain genes to singleton status at a greater-than-random frequency suggests that an underlying set of principles of molecular evolution may contribute to the fates of gene and genome duplications [78].
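Departures of a functional group from the genome-wide average can be assessed in many ways. One simple option is an exact one-sided binomial test that compares the number of group members retained in duplicate with a genome-wide retention rate. It is shown here as a sketch, not necessarily the procedure used in the studies cited above. The group size, retained count and 20% genome-wide rate in the example are hypothetical. Because thousands of groups are tested, a real application would need a multiple-testing correction. This is one reason why convergent loss across independent duplications provides stronger evidence than a test within a single genome.

```python
from math import comb
from typing import Tuple


def binomial_tail(k: int, n: int, p: float, lower: bool) -> float:
    """P(X <= k) if lower is True, else P(X >= k), for X ~ Binomial(n, p)."""
    ks = range(0, k + 1) if lower else range(k, n + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in ks)


def retention_test(retained: int, total: int,
                   genome_wide_rate: float) -> Tuple[float, float, float]:
    """Return (observed retention rate, P-value for excess retention, P-value for excess loss)."""
    return (retained / total,
            binomial_tail(retained, total, genome_wide_rate, lower=False),
            binomial_tail(retained, total, genome_wide_rate, lower=True))


# Hypothetical group: 40 genes present at the duplication, only 2 still duplicated,
# against an assumed genome-wide retention rate of 20%.
rate, p_excess_retention, p_excess_loss = retention_test(2, 40, 0.20)
print(f"retention {rate:.2f}, P(excess retention) {p_excess_retention:.3f}, "
      f"P(excess loss) {p_excess_loss:.4f}")
```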
In the following pages, we elaborate in more detail on these respective outcomes.
6. Palaeopolyploidy and gene family gains
Gene duplication has long been thought to be a primary source of material for evolution [4]. As discussed above, duplicates retained in genomes usually have divergent functions. A now-classical concept is that a duplicate gene is relatively free of selective constraints and can evolve new functions (neofunctionalization), so long as the ancestral copy maintains the original functions [4,84]. Thus, gene duplication has been viewed as one of the main molecular mechanisms for creating new genes [85]. Single-gene duplicates often lose part or all of the regulatory features of their progenitor genes and are thus prone to evolving new expression patterns and perhaps new functions [86]. However, single-gene duplicates have relatively short half-lives [84], so a new function that confers a selective advantage must evolve relatively quickly or the gene may be lost. In a WGD, which doubles the entire genome, duplicated genes retain their ancestral regulatory features and expression patterns, at least initially. However, they generally survive longer than single-gene duplicates and thus have more time to evolve novel functions that justify their retention. In principle, they offer a rich source of new genes, gene families and subfamilies, in addition to increasing the membership of existing gene families.
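The argument about half-lives can be made explicit with a simple competing-risks model. It is offered as an illustrative assumption, not as a fitted model. Suppose a duplicate pair is lost at a constant rate set by its half-life, and a beneficial new function arises at an independent constant rate. The probability that the new function arises before the pair is lost is then the new-function rate divided by the sum of the two rates. The half-lives and the rate in the example are hypothetical values, chosen only to show the direction of the effect.

```python
import math


def loss_rate(half_life_myr: float) -> float:
    """Constant per-Myr loss rate implied by an exponential half-life."""
    return math.log(2) / half_life_myr


def surviving_fraction(t_myr: float, half_life_myr: float) -> float:
    """Fraction of duplicate pairs still intact after t_myr under exponential decay."""
    return 0.5 ** (t_myr / half_life_myr)


def p_new_function_first(new_function_rate: float, half_life_myr: float) -> float:
    """Probability that a new function arises before the pair is lost (competing exponentials)."""
    lam = loss_rate(half_life_myr)
    return new_function_rate / (new_function_rate + lam)


# Hypothetical comparison: same rate of new-function origin, shorter vs longer half-life.
r = 0.01  # new functions per duplicate pair per Myr (assumed)
for label, h in [("shorter half-life", 4.0), ("longer half-life", 40.0)]:
    print(label, round(surviving_fraction(10.0, h), 3), round(p_new_function_first(r, h), 3))
```

With the rate of new-function origin held fixed, a longer half-life raises the chance that a new function evolves before loss. This is the intuition behind the greater potential of long-surviving WGD duplicates.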
The rapidly expanding set of whole genome sequences for plant species has provided useful resources for unravelling palaeopolyploidy events in evolutionary history [13,33,57,83,87], as well as for reconstructing ancestral gene content. We investigated gene family gains along deep branches of the land plant phylogeny. The largest numbers of gains fall on branches where polyploidy events occurred (figure 1). As mentioned above, these gains could reflect true origination of new gene families or merely expansion of extensively diverged subfamilies. Ancient polyploidy events in the common ancestors of all extant seed plants and of all extant angiosperms were identified using a genome-scale phylogenomic approach [87]. These two ancient WGD events fall on the branches leading to nodes 3 and 4 in figure 1, respectively, on which large numbers of gene family gains occurred (1269 and 1492). The origins of the seed and the flower may be partly explained by these novel genes and expanded gene families, which appeared before the diversification of seed plants and angiosperms. For example, 35 floral genes in Arabidopsis were clustered in orthogroups that originated before the angiosperm radiation (electronic supplementary material, table S1). Many floral gene families, such as MADS-box [28], PHYTOCHROME [88] and HD-ZIP III [89], originated before seed plants and were further expanded by ancient polyploidy events. In total, we found 21 genes associated with flowers in orthogroups newly gained before the divergence of angiosperms and gymnosperms (electronic supplementary material, table S2). These results suggest that gene recruitment is an ongoing process, and that gene origination or expansion may precede recruitment by a short or a long interval. Indeed, recruitment of a gene to a function surely requires not only available genes but also an external impetus, such as climate change.
For example, duplicate copies of virtually all genes that were eventually recruited into the evolution of C4 photosynthesis in Poaceae grasses were available following a pan-Poaceae genome duplication at about 70 Ma [90]. However, many of these copies were lost, and subsequent single-gene duplicates of the same genes were recruited into C4 photosynthesis about 40 Myr later [91].
The largest gene family gain, 4126, is on the internal branch that gave rise to Oryza and Sorghum. At least two (ρ and σ), and perhaps three, WGD events were identified on this branch leading to node 10. These events are specific to the Poaceae (grass) clade and are not shared with the Musa (banana) lineage [34,70]. Syntenic evidence identified the most recent WGD, ρ, as an event shared by Oryza and Sorghum [90]. A bottom-up reconstruction approach was used to build pre-ρ ancestral blocks, and a second iteration of synteny analysis found a more ancient polyploidy event before ρ [70]. Genome comparisons revealed that, generally, three regions of the eudicot Vitis (grape) genome matched up to eight homologous regions in Oryza. This suggests that three WGD events occurred in the Oryza lineage after its split from eudicots [70].
Another ancient polyploidy event on the internal branches, named gamma, is most likely shared by all core eudicots [31,92–94]. There are 623 novel gene family gains along the branch leading to core eudicots (node 6). However, more gene family gains (1068) appear on the branch leading to all eudicots (node 5), where no genome duplication has been identified. This departs from the pattern of the other large gains, which usually accompanied one or more WGD events. The inconsistency could be due to the timing and nature of the gamma event. Gamma may actually comprise two events in close succession very early in the evolutionary history of eudicots [93].
Changes in gene family content along terminal branches of the land plant phylogeny could be partly due to further independent polyploidy events in the evolutionary history of many lineages. At least one more polyploidy event was identified along the branches leading to all eudicot lineages depicted except Vitis [30,58,93,95]. Three independent polyploidy events were proposed on the terminal branch to Musa acuminata [34]. Numbers of gains inferred along the terminal branches could also be somewhat inflated by genome annotation errors.
Polyploidy events appear likely to be the major factor generating novel genes, as well as expanding gene families. On internal branches where no ancient polyploidy event was identified (branches leading to nodes 2, 7, 8 and 9), relatively small numbers of novel gene families were gained (690, 213, 434 and 662, respectively). Small-scale duplications, such as transposition and tandem duplication, might account for these gene family gains. Indeed, polyploidy may confer the largest number of new genes. Small-scale duplications, however, separate genes from their ancestral regulatory features and place them in new genomic environments. They are therefore more prone to the evolution of novel expression patterns and may be more likely than polyploidy-derived genes to give rise to novel functions [96].
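The contrast drawn above can be tabulated directly from the branch-level gains reported in this section. The sketch below simply summarizes those figures. It is descriptive, not a statistical test, and it leaves out the branch to node 5 (1068 gains, with no identified WGD), which is the exception discussed above. The smallest gain on a WGD-bearing branch (623, node 6) is lower than the largest gain on branches without an identified WGD (690, node 2). The pattern is therefore a tendency rather than a clean separation.

```python
from statistics import median
from typing import Dict

# Gene family gains per internal branch, as reported in the text (figure 1).
GAINS_WITH_WGD: Dict[str, int] = {"node 3": 1269, "node 4": 1492, "node 6": 623, "node 10": 4126}
GAINS_WITHOUT_WGD: Dict[str, int] = {"node 2": 690, "node 7": 213, "node 8": 434, "node 9": 662}


def summarize(gains: Dict[str, int]) -> Dict[str, float]:
    """Count, median and range of gene family gains over a set of branches."""
    values = sorted(gains.values())
    return {"branches": len(values), "median": median(values),
            "min": values[0], "max": values[-1]}


print("with WGD:", summarize(GAINS_WITH_WGD))
print("without identified WGD:", summarize(GAINS_WITHOUT_WGD))
```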
It is widely acknowledged that gene retention after polyploidy events is biased [97–100]. Genes in the functional categories of transcription factors, protein kinases and transferases tend to retain duplicated copies following WGD [81,97]. Several models have been proposed to explain this biased retention. Observations in Saccharomyces cerevisiae and Caenorhabditis elegans [101] suggest that genes whose duplicates are preserved tend to evolve slowly, whereas genes restored to singletons tend to evolve faster. Another idea relates retention probability to gene expression level: in yeast, selection for increased gene expression was a significant factor determining which genes were retained and which were returned to a single-copy state [102]. In addition, the ‘gene balance’ hypothesis [97] predicts a relationship between the number of interactions, or ‘connectivity’, of a gene with other genes and its sensitivity to altered gene dosage. ‘Connected’ genes are more likely to be retained after WGD than after tandem duplication, because losing such genes produces a sort of haploinsufficiency relative to the remainder of the genome. It has also been argued that gene balance should tend to drive up morphological complexity [7], potentiated by duplicated functional modules (gene networks). Further investigation of many of these ideas will require much greater knowledge of the various forms of interaction among genes, including physical, regulatory and feedback-mediated interactions.
7. Palaeopolyploidy and gene losses
Polyploidy has made a significant contribution to gene family expansion and novel gene content; non-random patterns of loss of duplicated genes, however, have been less studied. As noted above, the vast majority of gene duplicates tend to be silenced within a relatively short period, and this process is not random [81,103,104]. Specific genes and gene functional groups show more extensive loss of duplicate copies than the genome-wide average. This loss is often convergent following independent duplications separated by hundreds of millions of years [100,105,106]. Investigations of single-copy genes in sequenced flowering plants have statistically ruled out the possibility of random gene loss (e.g. [58,106]). Single-copy genes are overrepresented in some essential functional categories, such as DNA repair, recombination, enzyme activity and organelle-related functions. They are underrepresented in GO categories in which WGD survivors are usually significantly enriched (e.g. transcription factor activity, kinase activity, transport and signal transduction) [105,106]. Furthermore, single-copy genes show more sequence conservation, higher expression levels and expression in more tissues than multiple-copy genes [106]. Conserved nuclear single-copy genes have been used as markers for reconstructing angiosperm phylogeny, and even eukaryotic relationships, and for improving resolution of the evolutionary history of organisms [107,108]. However, it might be problematic to use some specific single-copy gene families as phylogenetic markers if speciation precedes their restoration to single-copy status. For example, two recent studies [106,107] each show that genes that are single-copy in other organisms are still duplicated in soya bean after its most recent polyploidization event, approximately 13 Ma. Tetraploid cotton likewise appears to experience relatively slow gene loss [109].
If a speciation event occurred before duplicated genes experienced reciprocal gene silencing, a reconstructed phylogeny may not reflect the true evolutionary relationship (figure 3).
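The logic of statistically rejecting random gene loss can be sketched with a simple null model. This is our own illustration, not necessarily the test used in [58,106]: assume each ancestrally duplicated family returns to single copy in species i independently, with probability p_i equal to that species' genome-wide single-copy fraction. The number of families that are single-copy in all species is then binomially distributed with success probability equal to the product of the p_i, and a large excess of observed shared single-copy families over this expectation argues against random loss. All input numbers below are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_upper_tail(observed, n, p):
    """log P(X >= observed) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are combined with a log-sum-exp so that extremely small tail
    probabilities do not underflow to zero.
    """
    if observed > n:
        return float("-inf")
    logs = [log_binom_pmf(k, n, p) for k in range(max(observed, 0), n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))


def random_loss_test(n_families, single_copy_fractions, observed_shared):
    """Expected number of families single-copy in every species under
    independent random loss, and log10 of the one-sided probability of
    seeing at least `observed_shared` such families."""
    p_all = math.prod(single_copy_fractions)  # P(single-copy in all species)
    expected = n_families * p_all
    log10_p = log_upper_tail(observed_shared, n_families, p_all) / math.log(10)
    return expected, log10_p


# Hypothetical inputs: 2,000 ancestrally duplicated families, four species
# with the single-copy fractions below, and 400 families single-copy in all.
expected, log10_p = random_loss_test(2000, [0.5, 0.6, 0.55, 0.5], 400)
print(f"expected under random loss: {expected:.0f}; log10 P: {log10_p:.1f}")
```

Here about 165 shared single-copy families would be expected by chance, so observing 400 gives a vanishingly small tail probability. The independence assumptions are strong; real analyses must also account for correlated loss among related species and families.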
Figure 3.
Effects of different gene loss patterns on phylogenies reconstructed from single-copy genes. If a gene that is now single-copy was duplicated in the common ancestor of species A, B and C, the timing and pattern of gene silencing will affect the reconstructed evolutionary relationships. The case most often assumed is that the duplicates in the ancestor are quickly restored to single-copy status. However, several single-copy genes in soya bean have retained both duplicated copies for about 13 Myr, a period long enough for many speciation events to occur. Here, we propose four different gene loss patterns, all of which restore single-copy status. (a) All three genes in one paralogous clade are deleted, and the remaining three genes can be used to reconstruct the phylogeny correctly. (b,c) With these gene loss patterns, the reconstructed phylogeny does not reflect the true relationships among species. (d) Although gene c2 is not the orthologue of a1 and b1, the final tree is correct by chance.
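The scenarios in figure 3 can be explored with a short sketch. Assuming, as in the figure, a gene duplicated in the common ancestor of species A, B and C with species tree ((A,B),C), the full gene tree is two copies of the species tree joined at the duplication node. Pruning it to the one copy each species retains gives the tree that a ‘single-copy’ marker would recover. Trees are written as nested Python tuples, and the function names are ours.

```python
from itertools import product


def prune(tree, keep):
    """Restrict a rooted tree (nested tuples, str leaves) to the leaves in
    `keep`, collapsing nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def topology(tree):
    """Order-independent form of a rooted tree, so ((A,B),C) == (C,(B,A))."""
    if isinstance(tree, str):
        return tree
    return frozenset(topology(child) for child in tree)


def gene_tree_after_loss(species_tree, retained):
    """Gene tree of the surviving copies. `retained` maps each species to the
    ancestral paralog (1 or 2) it kept. The duplication is the root, so the
    survivors of each paralog form a pruned copy of the species tree."""
    by_copy = [prune(species_tree, {sp for sp, c in retained.items() if c == copy})
               for copy in (1, 2)]
    kept = [t for t in by_copy if t is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


SPECIES_TREE = (("A", "B"), "C")

for copies in product((1, 2), repeat=3):
    retained = dict(zip("ABC", copies))
    inferred = gene_tree_after_loss(SPECIES_TREE, retained)
    if len(set(copies)) == 1:
        verdict = "orthologous, correct"
    elif topology(inferred) == topology(SPECIES_TREE):
        verdict = "paralogous, correct by chance"
    else:
        verdict = "paralogous, misleading"
    print(retained, inferred, verdict)
```

Of the eight retention patterns in this toy setting, four give the wrong topology, and two of the four correct ones are right only by chance, as in panel (d).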
It has been suggested that reciprocal gene loss after polyploidy could contribute directly to speciation and reproductive isolation [17,90,110–112]. An investigation of gene losses following a WGD in the common ancestor of three yeast species (S. cerevisiae, Saccharomyces castellii and Candida glabrata) showed that 20% of loci experienced differential patterns of gene loss [17]. The speciation events occurred shortly after the WGD, during a period of precipitous gene loss, and it was therefore hypothesized that reciprocal gene loss at many ancestrally duplicated loci could be the main factor leading to speciation [17]. However, two recent genome-wide studies of syntenic gene loss in the Poaceae found very little evidence of reciprocal loss of homologous genes among grass species [33,113]. Although evidence from additional plant groups remains to be examined, the substantially different effective population sizes of microbes (very large) and crown eukaryotes (very small) may contribute to differences in the fates of duplicated genes [19,20] and in the associated evolutionary patterns.
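A simplified calculation of our own shows why reciprocal loss at many loci could, in principle, isolate lineages in the way discussed in [110–112]. Assume that two populations each silence a different copy of a duplicated pair, that the two copies lie at unlinked loci, and that an individual lacking any functional copy is inviable. An F1 hybrid then carries one functional and one null allele at each locus. A haploid gamete (or yeast spore) from it lacks both functional copies with probability 1/4, and a diploid F2 zygote with probability 1/16. Treating loci as independent, viability decays geometrically with the number of reciprocally lost pairs. The function name and the values of n are illustrative.

```python
def hybrid_viability(n_loci, haploid=False):
    """Expected fraction of hybrid offspring (diploid F2 zygotes, or haploid
    gametes/spores from an F1) that carry at least one functional copy at
    every one of `n_loci` reciprocally silenced duplicate pairs.

    Assumes unlinked, independent loci and that loss of both copies is lethal.
    """
    per_locus_failure = 1 / 4 if haploid else 1 / 16
    return (1 - per_locus_failure) ** n_loci


for n in (1, 10, 50):
    print(n, round(hybrid_viability(n), 3), round(hybrid_viability(n, haploid=True), 3))
```

With 10 such loci, roughly 52% of diploid F2 offspring but under 6% of haploid spores would be viable under these assumptions, which is one reason the effect is expected to be stronger in organisms with a haploid-dominant life cycle.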
8. Conclusion and future studies
One of the great challenges of evolutionary genomics is to link the genome modifications evident in the burgeoning set of angiosperm (and other) genome sequences now available to speciation, diversification and the morphological and/or physiological innovations that collectively constitute biodiversity. Polyploidy events are prevalent throughout angiosperm evolution. While the timing of many ancient WGD events can be assigned to specific branches of angiosperm phylogeny, much better resolution is needed to link genomic changes to their biological consequences. Genome sequences will soon be available for virtually all branches of land plant phylogeny, which will help to improve resolution. Enriched knowledge of botanical diversity may permit us to assign to a branch not merely thousands or even hundreds of important functional changes but far fewer, approaching a more ‘particulate’ model of evolutionary history in much the same way that Mendelian genetics moved science away from ‘blending’ models of inheritance. Dosage balance could explain some biases in gene retention after different modes of duplication, and detailed analysis of the evolutionary history of all members of functional protein complexes might provide stronger support for this hypothesis. More effort is needed to reconstruct, decode and analyse the ancestral genomes that preceded the diversification of major angiosperm groups. With reconstructed ancestral genomes, we could thoroughly assess genome modifications such as gene gain and loss through evolution, and associate such changes with diversification, innovation and novel function. In addition, genome resequencing data could provide further insight into the genome variation underlying evolution and adaptation at the population level.
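Once gene families have been identified across genomes, gains and losses can be mapped onto the phylogeny. One common way to do this for presence/absence data is Dollo parsimony, which allows each gene family to arise only once and then be lost any number of times. The sketch below applies that rule to a single family; it is our illustration, not the method used in this study. The tree and presence pattern are hypothetical, with abbreviations for genomes cited here (At, Arabidopsis; Pt, Populus; Vv, Vitis; Os, Oryza; Sb, Sorghum).

```python
def leaves(tree):
    """Set of leaf names in a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_gain_and_losses(tree, present):
    """Place one gain and the fewest losses for a gene family under Dollo
    parsimony. Returns (gain_clade, loss_clades); gain_clade is None if no
    species in the tree has the family."""
    present = set(present) & leaves(tree)
    if not present:
        return None, []
    # Single gain: on the stem of the smallest clade containing all carriers.
    gain = tree
    while not isinstance(gain, str):
        inner = [child for child in gain if present <= leaves(child)]
        if not inner:
            break
        gain = inner[0]
    # One loss per maximal clade below the gain that contains no carriers.
    losses = []

    def walk(node):
        if not (leaves(node) & present):
            losses.append(node)
            return
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(gain)
    return gain, losses


# Hypothetical tree and presence pattern for one gene family.
TREE = ((("At", "Pt"), "Vv"), ("Os", "Sb"))
print(dollo_gain_and_losses(TREE, {"At", "Vv", "Os"}))
```

For this pattern the family is inferred to be present in the common ancestor of all five species and lost independently in the Pt and Sb lineages. Dollo parsimony is conservative about gains, so horizontally transferred or convergently arising families would be misplaced by it.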
Genome modifications, including duplication, fractionation, rearrangements and changes in expression, have the potential to facilitate the origin of new functional variation in organisms, including economically important traits in the crop plants that sustain humanity.
Supplementary Material
References
1. Galitski T, Saldanha AJ, Styles CA, Lander ES, Fink GR. 1999. Ploidy regulation of gene expression. Science 285, 251–254. (doi:10.1126/science.285.5425.251)
2. Hughes T, et al. 2000. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat. Genet. 25, 333–337. (doi:10.1038/77116)
3. Arrigo N, Barker MS. 2012. Rarely successful polyploids and their legacy in plant genomes. Curr. Opin. Plant Biol. 15, 140–146. (doi:10.1016/j.pbi.2012.03.010)
4. Ohno S. 1970. Evolution by gene duplication. Berlin, Germany: Springer.
5. Stephens S. 1951. Possible significance of duplications in evolution. Adv. Genet. 4, 247–265. (doi:10.1016/S0065-2660(08)60237-0)
6. Comai L. 2005. The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6, 836–846. (doi:10.1038/nrg1711)
7. Freeling M, Thomas BC. 2006. Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 16, 805–814. (doi:10.1101/gr.3681406)
8. Muntzing A. 1936. The evolutionary significance of autopolyploidy. Hereditas 21, 363–378. (doi:10.1111/j.1601-5223.1936.tb03204.x)
9. Love A, Love D. 1949. The geobotanical significance of polyploidy. Portugaliae Acta (Suppl), 273–352.
10. Stebbins GL. 1950. Variation and evolution in plants. New York, NY: Columbia University Press.
11. Grant V. 1971. Plant speciation, 1st edn. New York, NY: Columbia University Press.
12. Stebbins G. 1966. Chromosomal variation and evolution; polyploidy and chromosome size and number shed light on evolutionary processes in higher plants. Science 152, 1463–1469. (doi:10.1126/science.152.3728.1463)
13. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. 2008. Synteny and colinearity in plant genomes. Science 320, 486–488. (doi:10.1126/science.1153917)
14. Paterson AH, Freeling M, Tang H, Wang X. 2010. Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61, 349–372. (doi:10.1146/annurev-arplant-042809-112235)
15. Gu ZL, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li WH. 2003. Role of duplicate genes in genetic robustness against null mutations. Nature 421, 63–66. (doi:10.1038/nature01198)
16. Christoffels A, Koh EGL, Chia JM, Brenner S, Aparicio S, Venkatesh B. 2004. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol. Biol. Evol. 21, 1146–1151. (doi:10.1093/molbev/msh114)
17. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe KH. 2006. Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature 440, 341–345. (doi:10.1038/nature04562)
18. Aury JM, et al. 2006. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444, 171–178. (doi:10.1038/nature05230)
19. Lynch M, O'Hely M, Walsh B, Force A. 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159, 1789–1804.
20. Lynch M. 2006. The origins of eukaryotic gene structure. Mol. Biol. Evol. 23, 450–468. (doi:10.1093/molbev/msj050)
21. Kenrick P, Crane PR. 1997. The origin and early evolution of plants on land. Nature 389, 33–39. (doi:10.1038/37918)
22. Becker B, Marin B. 2009. Streptophyte algae and the origin of embryophytes. Ann. Bot. 103, 999–1004. (doi:10.1093/aob/mcp044)
23. Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science 294, 2351–2353. (doi:10.1126/science.1065156)
24. Lewis LA, McCourt RM. 2004. Green algae and the origin of land plants. Am. J. Bot. 91, 1535–1556. (doi:10.3732/ajb.91.10.1535)
25. Rensing SA, et al. 2008. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319, 64–69. (doi:10.1126/science.1150646)
26. Banks JA, et al. 2011. The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 332, 960–963. (doi:10.1126/science.1203810)
27. Murray B, Leitch I, Bennett M. 2012. Gymnosperm DNA C-values database (release 5.0, Dec. 2012). See http://www.kew.org/cvalues/
28. Nystedt B, et al. 2013. The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584. (doi:10.1038/nature12211)
29. Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. (doi:10.1038/35048692)
30. Tuskan GA, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604. (doi:10.1126/science.1128691)
31. Jaillon O, et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467. (doi:10.1038/nature06148)
32. Yu J, et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92. (doi:10.1126/science.1068037)
33. Paterson AH, et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556. (doi:10.1038/nature07723)
34. D'Hont A, et al. 2012. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217. (doi:10.1038/nature11241)
35. Li L, Stoeckert CJ, Jr, Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189. (doi:10.1101/gr.1224503)
36. Chen F, Mackey AJ, Vermunt JK, Roos DS. 2007. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE 2, e383. (doi:10.1371/journal.pone.0000383)
37. Merchant SS, et al. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250. (doi:10.1126/science.1143609)
38. Wang W, Vinocur B, Shoseyov O, Altman A. 2004. Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response. Trends Plant Sci. 9, 244–252. (doi:10.1016/j.tplants.2004.03.006)
39. Delevoryas T. 1979. Polyploidy in gymnosperms. Basic Life Sci. 13, 215–218.
40. Morse AM, et al. 2009. Evolution of genome size and complexity in Pinus. PLoS ONE 4, e4332. (doi:10.1371/journal.pone.0004332)
41. Kovach A, et al. 2010. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 11, 420. (doi:10.1186/1471-2164-11-420)
42. Hall SE, Dvorak WS, Johnston JS, Price HJ, Williams CG. 2000. Flow cytometric analysis of DNA content for tropical and temperate New World pines. Ann. Bot. Lond. 86, 1081–1086. (doi:10.1006/anbo.2000.1272)
43. Wakamiya I, Newton RJ, Johnston JS, Price HJ. 1993. Genome size and environmental factors in the genus Pinus. Am. J. Bot. 80, 1235–1241. (doi:10.2307/2445706)
44. Grotkopp E, Rejmanek M, Sanderson MJ, Rost TL. 2004. Evolution of genome size in pines (Pinus) and its life-history correlates: supertree analyses. Evolution 58, 1705–1729. (doi:10.1111/j.0014-3820.2004.tb00456.x)
45. Burleigh JG, Barbazuk WB, Davis JM, Morse AM, Soltis PS. 2012. Exploring diversification and genome size evolution in extant gymnosperms through phylogenetic synthesis. J. Bot. 2012, 1–6. (doi:10.1155/2012/292857)
46. Klintenas M, Pin PA, Benlloch R, Ingvarsson PK, Nilsson O. 2012. Analysis of conifer Flowering Locus T/Terminal Flower1-like genes provides evidence for dramatic biochemical evolution in the angiosperm FT lineage. New Phytol. 196, 1260–1273. (doi:10.1111/j.1469-8137.2012.04332.x)
47. Karlgren A, Gyllenstrand N, Kallman T, Sundstrom JF, Moore D, Lascoux M, Lagercrantz U. 2011. Evolution of the PEBP gene family in plants: functional diversification in seed plant evolution. Plant Physiol. 156, 1967–1977. (doi:10.1104/pp.111.176206)
48. Ma H, dePamphilis C. 2000. The ABCs of floral evolution. Cell 101, 5–8. (doi:10.1016/S0092-8674(00)80618-2)
49. Sperry JS, Hacke UG, Pittermann J. 2006. Size and function in conifer tracheids and angiosperm vessels. Am. J. Bot. 93, 1490–1500. (doi:10.3732/ajb.93.10.1490)
50. Kubo M, Udagawa M, Nishikubo N, Horiguchi G, Yamaguchi M, Ito J, Mimura T, Fukuda H, Demura T. 2005. Transcription switches for protoxylem and metaxylem vessel formation. Genes Dev. 19, 1855–1860. (doi:10.1101/gad.1331305)
51. Regal PJ. 1977. Ecology and evolution of flowering plant dominance. Science 196, 622–629. (doi:10.1126/science.196.4290.622)
52. Crane PR, Friis EM, Pedersen KR. 1995. The origin and early diversification of angiosperms. Nature 374, 27–33. (doi:10.1038/374027a0)
53. Soltis PS, Soltis DE. 2004. The origin and diversification of angiosperms. Am. J. Bot. 91, 1614–1626. (doi:10.3732/ajb.91.10.1614)
54. Amborella Genome Project. 2013. The Amborella genome and the evolution of flowering plants. Science 342, 1241089. (doi:10.1126/science.1241089)
55. Smith SF, Snell P, Gruetzner F, Bench AJ, Haaf T, Metcalfe JA, Green AR, Elgar G. 2002. Analyses of the extent of shared synteny and conserved gene orders between the genome of Fugu rubripes and human 20q. Genome Res. 12, 776–784. (doi:10.1101/gr.221802)
56. Dehal P, Boore JL. 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3, e314. (doi:10.1371/journal.pbio.0030314)
57. Bowers JE, Chapman BA, Rong J, Paterson AH. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438. (doi:10.1038/nature01521)
58. Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. 2008. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18, 1944–1954. (doi:10.1101/gr.080978.108)
59. Coghlan A, Eichler EE, Oliver SG, Paterson AH, Stein L. 2005. Chromosome evolution in eukaryotes: a multi-kingdom perspective. Trends Genet. 21, 673–682. (doi:10.1016/j.tig.2005.09.009)
60. Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J. Mol. Evol. 58, 424–441. (doi:10.1007/s00239-003-2564-9)
61. Wang Y, et al. 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49. (doi:10.1093/nar/gkr1293)
62. Vandepoele K, Saeys Y, Simillion C, Raes J, Van de Peer Y. 2002. The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 12, 1792–1801. (doi:10.1101/gr.400202)
63. Cannon SB, Kozik A, Chan B, Michelmore R, Young ND. 2003. DiagHunter and GenoPix2D: programs for genomic comparisons, large-scale homology discovery and visualization. Genome Biol. 4, R68. (doi:10.1186/gb-2003-4-10)
64. Haas BJ, Delcher AL, Wortman JR, Salzberg SL. 2004. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643–3646. (doi:10.1093/bioinformatics/bth397)
65. Wang XY, Shi XL, Li Z, Zhu QH, Kong L, Tang W, Ge S, Luo JC. 2006. Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC Bioinform. 7, 447. (doi:10.1186/1471-2105-7-447)
66. Lyons E, et al. 2008. Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol. 148, 1772–1781. (doi:10.1104/pp.108.124867)
67. Lyons E, Pedersen B, Kane J, Freeling M. 2008. The value of nonmodel genomes and an example using synmap within CoGe to dissect the hexaploidy that predates the rosids. Trop. Plant Biol. 1, 181–190. (doi:10.1007/s12042-008-9017-y)
68. Lyons E, Freeling M, Kustu S, Inwood W. 2011. Using genomic sequencing for classical genetics in E. coli K12. PLoS ONE 6, e16717. (doi:10.1371/journal.pone.0016717)
69. Lee TH, Tang H, Wang X, Paterson AH. 2013. PGDD: a database of gene and genome duplication in plants. Nucleic Acids Res. 41, D1152–D1158. (doi:10.1093/nar/gks1104)
70. Tang H, Bowers JE, Wang X, Paterson AH. 2009. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl Acad. Sci. USA 107, 472–477. (doi:10.1073/pnas.0908007107)
71. Ibarra-Laclette E, et al. 2013. Architecture and evolution of a minute plant genome. Nature 498, 94–98. (doi:10.1038/nature12132)
72. Lynch M, Force A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154, 459–473.
73. He XL, Zhang JZ. 2005. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169, 1157–1164. (doi:10.1534/genetics.104.037051)
74. Haldane JBS. 1933. The part played by recurrent mutation in evolution. Am. Nat. 67, 5–19. (doi:10.1086/280465)
75. Brunet FG, Crollius HR, Paris M, Aury JM, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. 2006. Gene loss and evolutionary rates following whole genome duplication in teleost fishes. Mol. Biol. Evol. 23, 1808–1816. (doi:10.1093/molbev/msl049)
76. Chapman BA, Bowers JE, Feltus FA, Paterson AH. 2006. Buffering crucial functions by paleologous duplicated genes may impart cyclicality to angiosperm genome duplication. Proc. Natl Acad. Sci. USA 103, 2730–2735. (doi:10.1073/pnas.0507782103)
77. Seoighe C, Gehring C. 2004. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 20, 461–464. (doi:10.1016/j.tig.2004.07.008)
78. Paterson AH, Chapman BA, Kissinger J, Bowers JE, Feltus FA, Estill J, Marler BS. 2006. Convergent retention or loss of gene/domain families following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces, and Tetraodon. Trends Genet. 22, 597–602. (doi:10.1016/j.tig.2006.09.003)
79. Gao LZ, Innan H. 2004. Very low gene duplication rate in the yeast genome. Science 306, 1367–1370. (doi:10.1126/science.1102033)
80. Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH. 2007. Extensive concerted evolution of rice paralogs and the road to regaining independence. Genetics 177, 1753–1763. (doi:10.1534/genetics.107.073197)
81. Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y. 2005. Modeling gene and genome duplications in eukaryotes. Proc. Natl Acad. Sci. USA 102, 5454–5459. (doi:10.1073/pnas.0501102102)
82. Blanc G, Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16, 1679–1691. (doi:10.1105/tpc.021410)
83. Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16, 1667–1678. (doi:10.1105/tpc.021345)
84. Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155. (doi:10.1126/science.290.5494.1151)
85. Long M, Betran E, Thornton K, Wang W. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4, 865–875. (doi:10.1038/nrg1204)
86. Wang YP, Wang XY, Tang HB, Tan X, Ficklin SP, Feltus FA, Paterson AH. 2011. Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS ONE 6, e28150. (doi:10.1371/journal.pone.0028150)
87. Jiao Y, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100. (doi:10.1038/nature09916)
88. Mathews S, Burleigh JG, Donoghue MJ. 2003. Adaptive evolution in the photosensory domain of phytochrome A in early angiosperms. Mol. Biol. Evol. 20, 1087–1097. (doi:10.1093/molbev/msg123)
89. Prigge MJ, Clark SE. 2006. Evolution of the class III HD-Zip gene family in land plants. Evol. Dev. 8, 350–361. (doi:10.1111/j.1525-142X.2006.00107.x)
90. Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl Acad. Sci. USA 101, 9903–9908. (doi:10.1073/pnas.0307901101)
91. Wang X, Gowik U, Tang H, Bowers JE, Westhoff P, Paterson AH. 2009. Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses. Genome Biol. 10, R68. (doi:10.1186/gb-2009-10-6-r68)
92. Vekemans D, Proost S, Vanneste K, Coenen H, Viaene T, Ruelens P, Maere S, Van de Peer Y, Geuten K. 2012. Gamma paleohexaploidy in the stem lineage of core eudicots: significance for MADS-box gene and species diversification. Mol. Biol. Evol. 29, 3793–3806. (doi:10.1093/molbev/mss183)
93. Ming R, et al. 2013. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 14, R41. (doi:10.1186/gb-2013-14-5-r41)
94. Jiao Y, et al. 2012. A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13, R3. (doi:10.1186/gb-2012-13-1-r3)
95. Sato S, et al. 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641. (doi:10.1038/nature11119)
96. Wang Y, Ficklin SP, Wang X, Feltus FA, Paterson AH. Submitted. Dispersed duplicated genes and the evolution of genetic complexity.
97. Freeling M. 2009. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60, 433–453. (doi:10.1146/annurev.arplant.043008.092122)
98. Kassahn KS, Dang VT, Wilkins SJ, Perkins AC, Ragan MA. 2009. Evolution of gene function and regulatory control after whole-genome duplication: comparative analyses in vertebrates. Genome Res. 19, 1404–1418. (doi:10.1101/gr.086827.108)
99. Edger PP, Pires JC. 2009. Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res. 17, 699–717. (doi:10.1007/s10577-009-9055-9)
100. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, Feltus FA, Estill JC. 2006. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 22, 597–602. (doi:10.1016/j.tig.2006.09.003)
101. Davis JC, Petrov DA. 2004. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2, E55. (doi:10.1371/journal.pbio.0020055)
102. Seoighe C, Wolfe KH. 1999. Yeast genome evolution in the post-genome era. Curr. Opin. Microbiol. 2, 548–554. (doi:10.1016/S1369-5274(99)00015-6)
103. Makino T, McLysaght A. 2012. Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res. 22, 2427–2435. (doi:10.1101/gr.131953.111)
104. Woodhouse MR, Schnable JC, Pedersen BS, Lyons E, Lisch D, Subramaniam S, Freeling M. 2010. Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homologs. PLoS Biol. 8, e1000409. (doi:10.1371/journal.pbio.1000409)
105. Duarte JM, Wall PK, Edger PP, Landherr LL, Ma H, Pires JC, Leebens-Mack J, dePamphilis CW. 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evol. Biol. 10, 61. (doi:10.1186/1471-2148-10-61)
106. De Smet R, Adams KL, Vandepoele K, van Montagu MCE, Maere S, Van de Peer Y. 2013. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc. Natl Acad. Sci. USA 110, 2898–2903. (doi:10.1073/pnas.1300127110)
107. Zhang N, Zeng LP, Shan HY, Ma H. 2012. Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. 195, 923–937. (doi:10.1111/j.1469-8137.2012.04212.x)
108. Wu FN, Mueller LA, Crouzillat D, Petiard V, Tanksley SD. 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: a test case in the euasterid plant clade. Genetics 174, 1407–1420. (doi:10.1534/genetics.106.062455)
109. Rong J, Feltus FA, Liu L, Lin L, Paterson AH. 2010. Gene copy number evolution during tetraploid cotton radiation. Heredity 105, 463–472. (doi:10.1038/hdy.2009.192)
110. Werth CR, Windham MD. 1991. A model for divergent, allopatric speciation of polyploid pteridophytes resulting from silencing of duplicate-gene expression. Am. Nat. 137, 515–526. (doi:10.1086/285180)
111. Lynch M, Force AG. 2000. The origin of interspecific genomic incompatibility via gene duplication. Am. Nat. 156, 590–605. (doi:10.1086/316992)
112. Maclean CJ, Greig D. 2011. Reciprocal gene loss following experimental whole-genome duplication causes reproductive isolation in yeast. Evolution 65, 932–945. (doi:10.1111/j.1558-5646.2010.01171.x)
113. Schnable JC, Freeling M, Lyons E. 2012. Genome-wide analysis of syntenic gene deletion in the grasses. Genome Biol. Evol. 4, 265–277. (doi:10.1093/gbe/evs009)