Integrated Syntenic and Phylogenomic Analyses Reveal an Ancient Genome Duplication in Monocots

Yuannian Jiao; Jingping Li; Haibao Tang; Andrew H Paterson

doi:10.1105/tpc.114.127597

2014 Jul 31;26(7):2792–2802.



PMCID: PMC4145114 PMID: 25082857

Whole-genome duplication (WGD) is a primary source of genetic material for evolutionary variation. This work compares the genomes of four monocots and two eudicots using integrated phylogenomic and syntenic analyses, revealing an ancient WGD that shaped the genomes of all commelinid monocots, including grasses, bromeliads, bananas, gingers, palms, and other economically important plants.

Abstract

Unraveling widespread polyploidy events throughout plant evolution is necessary for inferring the impacts of whole-genome duplication (WGD) on speciation and functional innovation, and for guiding the identification of true orthologs in divergent taxa. Here, we employed integrated syntenic and phylogenomic analyses to reveal an ancient WGD that shaped the genomes of all commelinid monocots, including grasses, bromeliads, bananas (Musa acuminata), ginger, palms, and other plants of fundamental, agricultural, and/or horticultural interest. First, comprehensive phylogenomic analyses revealed 1421 putative gene families that retained an ancient duplication shared by the Musa (Zingiberales) and grass (Poales) genomes, indicating an ancient WGD in monocots. Intergenomic synteny blocks of Musa and Oryza were then investigated, and 30 blocks were shown to have been duplicated before the Musa-Oryza divergence, an estimated 120 to 150 million years ago. Synteny comparisons of four monocot (rice [Oryza sativa], sorghum [Sorghum bicolor], banana, and oil palm [Elaeis guineensis]) and two eudicot (grape [Vitis vinifera] and sacred lotus [Nelumbo nucifera]) genomes also support this additional WGD in monocots, herein called tau (τ). Integrating synteny and phylogenomic comparisons achieves better resolution of ancient polyploidy events than either approach alone, a principle exemplified by the disambiguation of a WGD series of rho (ρ)-sigma (σ)-tau (τ) in the grass lineages that echoes the alpha (α)-beta (β)-gamma (γ) series previously revealed in the Arabidopsis thaliana lineage.

INTRODUCTION

Whole-genome duplication (WGD), or polyploidy, is a primary source of raw genetic material for evolutionary novelties (Ohno, 1970; Lynch and Conery, 2000; Adams and Wendel, 2005; Soltis et al., 2009). Although an outright change in ploidy is generally expected to be deleterious and often to lead to an evolutionary dead end (Otto and Whitton, 2000; Mayrose et al., 2011; Arrigo and Barker, 2012), many modern organisms have descended from paleopolyploidized ancestors, including plants (Grant et al., 2000; Vision et al., 2000; Bowers et al., 2003; Blanc and Wolfe, 2004; Jaillon et al., 2007; Ming et al., 2008; Tang et al., 2008a; Barker et al., 2009; Fawcett et al., 2009; Paterson et al., 2009; International Brachypodium Initiative, 2010; Jiao et al., 2011; Wang et al., 2011; D’Hont et al., 2012; Amborella Genome Project, 2013; Ming et al., 2013), animals (Christoffels et al., 2004; Jaillon et al., 2004; Dehal and Boore, 2005), and fungi (Kellis et al., 2004). WGDs are especially common in flowering plant (angiosperm) lineages, all of which have experienced paleopolyploidy in their evolutionary history (Grant et al., 2000; Vision et al., 2000; Bowers et al., 2003; Blanc and Wolfe, 2004; Cui et al., 2006; Ming et al., 2008, 2013; Tang et al., 2008a, 2010; Fawcett et al., 2009; Paterson et al., 2009; Schnable et al., 2009; Soltis et al., 2009; International Brachypodium Initiative, 2010; Jiao et al., 2011; Van de Peer, 2011; Wang et al., 2011; D’Hont et al., 2012; Amborella Genome Project, 2013; Singh et al., 2013). However, although several paleopolyploidy events have been intensively studied in eudicots, including the pan-core eudicot γ event (Jaillon et al., 2007; Tang et al., 2008b; Jiao et al., 2012; Ming et al., 2013), most paleopolyploidies in monocots have remained elusive because fewer monocot lineages have been sequenced.

The first WGD event discovered in the ancestral lineage leading to modern cereals (Poaceae family) was named “ρ”; it was first estimated from molecular data to have occurred ∼70 million years ago (MYA) (Paterson et al., 2004; Wang et al., 2005) and may be ∼30 million years older in light of recent fossil discoveries (Prasad et al., 2005). The main Poaceae clades (BEP and PACCAD) have evolved as separate lineages for about two-thirds of the time since ρ (Paterson et al., 2004, 2009). Nonetheless, in family members that have not experienced further genome duplication and subsequent diploidization, such as rice (Oryza sativa) and sorghum (Sorghum bicolor), synteny is conserved across large genomic regions covering hundreds to thousands of orthologous genes (Paterson et al., 2009). The rice and sorghum genomes also share 97 to 98% of post-ρ gene losses in orthologous regions (Paterson et al., 2009), a pattern typical of comparisons among other cereal genomes (Schnable et al., 2009; International Brachypodium Initiative, 2010; Bennetzen et al., 2012) that supports the deduction that the major cereal lineages radiated ∼20 million years or more after ρ (Paterson et al., 2009). The banana (Musa acuminata) genome has undergone three successive rounds of WGD since its divergence from the grasses (D’Hont et al., 2012). The oil palm (Elaeis guineensis) and date palm (Phoenix dactylifera) genomes share one recent WGD (Al-Mssallem et al., 2013; Singh et al., 2013). Previous genomic analyses of banana and oil palm found no indication of an additional, more ancient WGD before the divergence of Poales and Zingiberales (D’Hont et al., 2012; Singh et al., 2013).

Homoeologous regions formed in well-characterized genome duplications sometimes contain additional nested paralogs from more ancient paleopolyploidy events, whose signal has been obscured by subsequent reduplication and rediploidization. For example, the event called “σ” in our earlier study (Tang et al., 2010) was suspected to comprise multiple events that could not be resolved for lack of evidence from the multiple monocot lineages needed for disambiguation.

Here, by comparing the genomes of the monocots rice, sorghum (Poales), oil palm (Arecales), and banana (Zingiberales) with the eudicot genomes of grape (Vitis vinifera; Vitales) and sacred lotus (Nelumbo nucifera; Proteales, a basal eudicot), we show that before ρ, many monocot lineages had already experienced two distinct paleopolyploidies. We denote the earlier one “τ” and the later one “σ”, completing a WGD series of ρ – σ – τ in the grass lineage that echoes the α – β – γ series in the Arabidopsis lineage. Key to this discovery, and of high value for clarifying cryptic genome duplications in other taxa, is the integration of phylogenomic and synteny analyses (Figure 1) to explore genome paleo-evolution. As exemplified in our study, this approach can readily be applied to multiple divergent genomes to circumscribe nested paleopolyploidy events, and it is therefore attractive for studying modern angiosperm genomes that have a long and complex history of paleopolyploidy.

Figure 1. — Schematic Diagram Detailing an Integrated Synteny and Phylogenomic Approach for Circumscribing WGD Events on a Species Phylogeny.

**(A)** Flowchart of the methodology.

**(B)** Rationale of ordering speciation and WGD events in the history of two lineages. Chromosome segments are represented by gray bars, and genes are represented by colored bricks.

**(C)** Illustration of homoeologous genes (anchors) mapping onto the phylogeny of a homologous gene family in the phylogenomic approach used to date these homologous regions.

RESULTS

Phylogenomic Analysis

To investigate ancient WGDs, genes from 15 land plant genomes, six of them monocots (see Methods), were clustered into putative gene families, or orthogroups. Of the 39,427 gene families in total, the 11,503 containing at least one monocot and one eudicot sequence were used to construct maximum likelihood trees. Using a previously described gene duplication scoring strategy (Jiao et al., 2011), we identified in these gene family phylogenies the shared duplications (Figure 2) that occurred after the monocot-eudicot divergence but before the grass-banana divergence (1450 duplications in 1421 gene families with bootstrap support [BS] ≥50%; 552 duplications in 540 families with BS ≥80%). The gene families retaining the most shared duplicates are statistically enriched in the Gene Ontology categories kinase, transferase, transporter, transcription regulator, and transcription factor (Supplemental Data Set 1), consistent with general patterns of gene survival after genome duplication (Maere et al., 2005; Paterson et al., 2006; Freeling, 2009).
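The scoring strategy of Jiao et al. (2011) is not spelled out here. Purely as an illustration, the Python sketch below uses a simpler species-overlap rule. A bifurcating node is counted as a duplication predating the grass-Musa split when three conditions hold: both child clades contain a grass gene and a Musa gene, no eudicot gene falls inside the node, and the node's bootstrap support meets the threshold. The `Node` class, the species codes, and the `"Code|gene"` naming are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical species codes; gene IDs are written "Code|gene".
GRASSES = {"Osa", "Sbi", "Bdi"}
MUSA = {"Mac"}
EUDICOTS = {"Vvi", "Nnu", "Ath"}

@dataclass
class Node:
    name: Optional[str] = None                      # set only on leaves
    children: List["Node"] = field(default_factory=list)
    support: float = 100.0                          # bootstrap support (%)

    def species(self) -> Set[str]:
        """Species codes of all genes below this node."""
        if not self.children:
            return {self.name.split("|")[0]}
        found: Set[str] = set()
        for child in self.children:
            found |= child.species()
        return found

def shared_grass_musa_duplications(root: Node, min_support: float = 50.0) -> List[Node]:
    """Nodes whose two child clades each keep both a grass copy and a Musa copy."""
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if len(node.children) != 2 or node.support < min_support:
            continue
        left, right = (child.species() for child in node.children)
        if (left | right) & EUDICOTS:
            continue  # a eudicot gene inside: the duplication may predate the monocot-eudicot split
        if all(side & GRASSES and side & MUSA for side in (left, right)):
            hits.append(node)
    return hits

dup = Node(children=[Node(children=[Node("Osa|r1"), Node("Mac|m1")], support=90),
                     Node(children=[Node("Sbi|s1"), Node("Mac|m2")], support=80)],
           support=95)
tree = Node(children=[Node("Vvi|v1"), dup])
print(len(shared_grass_musa_duplications(tree)))  # 1
```

In the example, only the node joining the two grass+Musa clades is reported. Lowering its support below 50 removes it from the result.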

Figure 2. — Phylogenetic Timing of Inferred Gene Duplications.

Values given are the number of orthogroups showing duplications at the specified branches on the trees generated with the maximum likelihood method with bootstrap value ≥50%.

Orthogroup gene trees show an excess of internal nodes that potentially correspond to WGD events. For example, WRKY transcription factors (orthogroup 1297) retained an ancient duplication shared by grasses, palms, and Musa after their divergence from eudicots (Supplemental Figure 1). Additional, younger lineage-specific duplications within the sampled grasses, palms, and Musa were largely consistent with published views of WGD timing (ρ and σ in grasses, three WGDs in Musa, and one in a palm progenitor) (Tang et al., 2010; D’Hont et al., 2012). Another exemplary gene family is the C3HC4-type RING zinc finger transcription factor family (orthogroup 1312), which showed almost maximal retention across many lineages, surviving two nested duplications in grass lineages, one duplication shared by palms, two nested duplications in Musa lineages, and the γ event in eudicots (Supplemental Figure 2).

Multiple Synteny Alignments of Musa Using Oryza as Reference

We analyzed Oryza-Musa synteny in detail to pinpoint phylogenetic signals of deep-time monocot duplication events, identifying with MCScanX (Wang et al., 2012) 1021 syntenic blocks containing 9371 gene pairs. Syntenic regions of Musa were then aligned against Oryza chromosomes. In total, 22,223 Oryza genes lay in regions matching one to eight different Musa regions (hereinafter called homologous clusters), and 513 Oryza genes lay in regions matching 9 to 15 Musa regions. A redundancy of up to 1 Oryza:8 Musa is consistent with three genome doublings in the Musa lineage, whereas a redundancy of 1:9 to 1:15 requires at least one additional WGD.
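The redundancy argument can be made concrete with a small sketch. First count, for each Oryza gene, the distinct Musa regions aligned to it. Then ask how many successive doublings are needed to reach that many copies. This assumes every event is a doubling (no triplications) and that tandem or segmental duplications do not inflate the counts. The input layout, gene IDs, and region IDs are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: one entry per syntenic block,
# (Oryza gene IDs covered by the block, ID of the Musa region it aligns to).
Block = Tuple[List[str], str]

def musa_depth(blocks: Iterable[Block]) -> Dict[str, int]:
    """Number of distinct Musa regions aligned to each Oryza gene."""
    regions = defaultdict(set)
    for oryza_genes, musa_region in blocks:
        for gene in oryza_genes:
            regions[gene].add(musa_region)
    return {gene: len(hits) for gene, hits in regions.items()}

def min_doublings(depth: int) -> int:
    """Fewest genome doublings that can yield `depth` copies (smallest k with 2**k >= depth)."""
    return math.ceil(math.log2(depth)) if depth > 1 else 0

blocks = [(["Os01g1", "Os01g2"], "Ma02_a"), (["Os01g2"], "Ma06_b"), (["Os01g2"], "Ma02_a")]
depth = musa_depth(blocks)                              # {"Os01g1": 1, "Os01g2": 2}
print(min_doublings(8), min_doublings(9), min_doublings(15))  # 3 4 4
```

A depth of 8 fits within three doublings. Any depth from 9 to 15 needs a fourth, which is the basis for inferring an extra WGD.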

Duplicated blocks were further dated by mapping the syntenic anchors onto phylogenetic trees (see Methods). Supplemental Figure 3A shows a homologous cluster of eight Musa regions aligned to Oryza chromosome 5. Most anchor genes on these eight regions were determined phylogenetically to have been duplicated after the Musa-Oryza divergence (Supplemental Figure 3B). In total, we found 47 such homologous clusters (1 Oryza to 4+ Musa) duplicated after the Musa-Oryza divergence, supporting the three previously identified Musa WGDs. We also found 30 homologous clusters generated by a more ancient duplication, before the Musa-Oryza divergence. For example, one Oryza region (Chr3: 8.7 to 10.5 Mb) matched up to 11 regions in Musa (Figure 3A). The regions within Musa group A (Chr2: 17.1 to 17.3 Mb and Chr6: 32.9 to 33.0 Mb) and within group B (Chr5: 28.4 to 28.8 Mb, Chr6: 9.9 to 10.5 Mb and 32.1 to 32.6 Mb, Chr7: 12.0 to 12.5 Mb, Chr8: 25.8 to 26.0 Mb, Chr9: 2.4 to 3.0 Mb, Chr10: 7.2 to 7.4 Mb, 21.9 to 22.2 Mb and 27.7 to 28.0 Mb, Chr11: 23.6 to 24.2 Mb) were each duplicated after the divergence of Musa and Oryza. By contrast, the duplication that separated the ancestral blocks of A and B occurred before that divergence (Figure 3B).
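The anchor-dating logic (compare Figure 1C) can be sketched as follows. For a pair of Musa paralogs, find their lowest common ancestor in the gene tree. If that clade also contains an Oryza gene, the duplication predates the Musa-Oryza split; otherwise it postdates it. A block is then dated by majority vote over its anchor pairs. The sketch is a simplification: it assumes correctly rooted trees, ignores bootstrap support, and ignores cases where the Oryza copy was lost. The tuple tree format, the species codes, and the majority-vote rule are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple, Union

# A gene tree as nested tuples; leaves are "Species|gene" strings (hypothetical format).
Tree = Union[str, tuple]

def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def lca(tree: Tree, a: str, b: str) -> Optional[Tree]:
    """Smallest subtree containing both leaves a and b, or None."""
    if isinstance(tree, str):
        return None
    for child in tree:
        found = lca(child, a, b)
        if found is not None:
            return found
    names = leaves(tree)
    return tree if a in names and b in names else None

def date_pair(tree: Tree, para1: str, para2: str, other: str = "Osa") -> str:
    """'pre' if the paralogs coalesce in a clade holding a gene of the other species, else 'post'."""
    node = lca(tree, para1, para2)
    if node is None:
        raise ValueError("paralog pair not found in tree")
    species = {leaf.split("|")[0] for leaf in leaves(node)}
    return "pre" if other in species else "post"

def date_block(anchor_trees: Iterable[Tuple[Tree, str, str]]) -> str:
    """Majority vote over the anchor pairs of one pair of duplicated blocks."""
    votes = Counter(date_pair(tree, a, b) for tree, a, b in anchor_trees)
    return votes.most_common(1)[0][0]

before = (("Osa|r1", "Mac|m1"), ("Osa|r2", "Mac|m2"))  # duplication, then speciation
after = ("Osa|r1", ("Mac|m1", "Mac|m2"))                # speciation, then duplication
print(date_pair(before, "Mac|m1", "Mac|m2"), date_pair(after, "Mac|m1", "Mac|m2"))  # pre post
```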

Overview of Synteny Conservation among Six Representative Genomes

Genome alignments of four monocot (rice, sorghum, banana, and oil palm) and two eudicot (grape and sacred lotus) species (Supplemental Table 1) revealed clear evidence of σ and τ. In total, 97.66% (38,136) of rice genes, 98.70% (34,046) of sorghum genes, 96.50% (35,272) of banana genes, 86.10% (27,746) of oil palm genes, 95.76% (25,228) of grape genes, and 92.87% (24,782) of sacred lotus genes were covered in the aligned regions.

Comparison between the lotus and oil palm genomes showed a modal multiplicity ratio of 2 to 4 between orthologs (Table 1). The “2” reflects the lotus lambda (λ) duplication (Ming et al., 2013), and the “4” reflects two paleotetraploidy events in the oil palm lineage, τ and the palm-specific “p” (Figure 4A) (Singh et al., 2013). Likewise, the 2-to-4 ratio between oil palm and rice reflects the p WGD in oil palm and the σ and ρ WGDs in rice after the two lineages separated. Collectively, the ploidy ratios across these pairwise comparisons indicate one WGD (σ) in the lineage leading to grasses before ρ and an additional ancestral event (τ) preceding the Poaceae-Arecaceae divergence (Figure 4). These events imply a total paleo-multiplicity of 8× in the rice genome since the monocot-eudicot divergence, as confirmed by both the lotus-rice and grape-rice comparisons (Table 1).

Table 1. Multiplicity Ratios between Pairs of Genomes Resulting from Independent WGDs (Bottom Left Section, Given as Row Genome:Column Genome) and Number of Anchors (Number of Synteny Blocks) (Upper Right Section).

	Rice	Sorghum	Banana	Oil Palm	Grape	Sacred Lotus
Rice	–	18,377 (57)	18,871 (1,779)	15,879 (826)	10,582 (956)	11,518 (1,015)
Sorghum	1:1	–	18,681 (1,755)	15,447 (802)	10,242 (913)	11,001 (958)
Banana	8:4	8:4	–	20,481 (1,546)	12,718 (1,363)	12,725 (1,327)
Oil palm	2:4	2:4	2:8	–	16,504 (1,007)	17,798 (1,056)
Grape	3:8	3:8	3:16	3:4	–	18,003 (685)
Sacred lotus	2:8	2:8	2:16	2:4	2:3	–
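Read the bottom-left entries of Table 1 as row genome:column genome. Each ratio then follows from the polyploidy events that the two lineages do not share. Shared events cancel, and each remaining event multiplies copy number by 2 (tetraploidy) or 3 (triplication). The sketch below encodes the event assignments stated in the text. `musa1` to `musa3` are placeholder names for the three banana-specific WGDs. Under these assumptions it reproduces all 15 ratios in the table.

```python
from math import prod
from typing import Dict, Set, Tuple

# Copy-number factor per event: 2 = tetraploidy, 3 = triplication (hexaploidy).
FACTOR = {"tau": 2, "sigma": 2, "rho": 2, "p": 2, "lambda": 2, "gamma": 3,
          "musa1": 2, "musa2": 2, "musa3": 2}   # musa1-3: placeholder names

# Events per lineage as stated in the text; events shared by all six lineages
# would cancel in every ratio and are omitted.
EVENTS: Dict[str, Set[str]] = {
    "rice": {"tau", "sigma", "rho"},
    "sorghum": {"tau", "sigma", "rho"},
    "banana": {"tau", "musa1", "musa2", "musa3"},
    "oil palm": {"tau", "p"},
    "grape": {"gamma"},
    "sacred lotus": {"lambda"},
}

def multiplicity_ratio(a: str, b: str) -> Tuple[int, int]:
    """Expected a:b copy ratio; events shared by both lineages cancel out."""
    only_a = EVENTS[a] - EVENTS[b]
    only_b = EVENTS[b] - EVENTS[a]
    return prod(FACTOR[e] for e in only_a), prod(FACTOR[e] for e in only_b)

print(multiplicity_ratio("banana", "rice"))         # (8, 4)
print(multiplicity_ratio("sacred lotus", "grape"))  # (2, 3)
```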


Figure 4. — Comparison of Six Genomes and History of Paleopolyploidization.

**(A)** Phylogeny of lineages under study with other angiosperm lineages abstracted. Paleopolyploidy events are represented by orange (duplication) or brown (triplication) circles filled with their names. Color shades on tree branches indicate positions of genome comparison and whether they precede or postdate the adjacent polyploidies.

**(B)** Dot plots (dots represent matching loci between two genomes) of example regions from each of the designated genome comparisons are shown, outlined by the same color as on the tree. Quota ratios of the alignments are shown below the dot plots. Paleopolyploidy events underlying the alignment patterns are detailed in the text. Geologic timing is based on estimation, as precise values are usually unknown for ancient events.

With successive WGDs followed by loss of most duplicated genes, ancestral gene orders become progressively more fragmented and discernible synteny blocks become smaller (Table 1). The number of anchor genes tends to decrease as species divergence increases (Table 1), indicating that reciprocal gene loss and/or gene transposition is a persistent process on a large evolutionary scale. Nonetheless, the differential retention of ancestral loci that took place in early Poales branches has slowed in cereal genomes since the most recent WGD, ρ. About 4.68% of banana and 6.85% of oil palm genes have no orthologs in the other studied monocots but do have orthologs in grape and lotus (eudicots). These percentages are smaller in rice (3.00%) and sorghum (3.51%). The majority (∼80%) of the ancestral loci that had been differentially retained in the lineage leading to cereals (since its divergence from other commelinids 120 to ∼83 MYA; The Angiosperm Phylogeny Group, 2009) are still syntenic in present-day rice and sorghum genomes. The remaining ∼20% have been differentially lost or transposed since the two species diverged 80 to ∼50 MYA (The Angiosperm Phylogeny Group, 2009).

Circumscribing the σ Duplication

Our grape-rice genome comparison showed that rice contains more paralogous regions than the pan-grass ρ event alone can account for, indicating a more ancient paleopolyploidization, designated σ (Tang et al., 2010). Such nested WGDs are often best revealed by bottom-up approaches (Bowers et al., 2003), which attempt to reverse the changes caused by the more recent events (in this case ρ) superimposed on them. Using such bottom-up reconstruction (see Methods), a total of 146 p (palm WGD) blocks covering 71.74% of the oil palm genome were merged into pre-p ancestral gene orders. Similarly, 140 ρ (most recent grass WGD) blocks covering 70.63% of the rice genome were merged into pre-ρ gene orders. Direct comparison between the ancestral orders yielded 209 synteny blocks covering 96.34% of the pre-p order and 72.77% of the pre-ρ order, with each pre-p region aligning with up to two paralogous pre-ρ regions, as exemplified in Figure 4B. This clearly shows that the σ event occurred before the pan-Poaceae ρ event (∼70 MYA; Paterson et al., 2004) but after the Poaceae-Arecaceae split (120 to ∼83 MYA; The Angiosperm Phylogeny Group, 2009).
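Merging two homoeologous regions into a pre-duplication gene order can be illustrated, in deliberately simplified form, as computing a shortest common supersequence of the two gene orders via their longest common subsequence. The sketch makes two assumptions. First, each gene has already been labeled with an ancestral locus ID: anchor pairs share a label, and genes whose homoeolog was lost have unique labels. Second, both regions are collinear and in the same orientation. The bottom-up reconstruction referred to in Methods may handle inversions and conflicting orders differently. `merge_orders` is a hypothetical name.

```python
from typing import List, Sequence

def merge_orders(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Shortest common supersequence of two homoeologous gene orders.

    Shared labels are anchors retained in both copies; labels found in only one
    list are genes whose homoeolog was lost. Both input orders are preserved.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    merged, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:              # anchor: emit once for both copies
            merged.append(a[i]); i += 1; j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            merged.append(a[i]); i += 1   # gene retained only in region a
        else:
            merged.append(b[j]); j += 1   # gene retained only in region b
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged

pre_dup = merge_orders(["g1", "g2", "g4", "g6"], ["g2", "g3", "g4", "g5", "g6"])
print(pre_dup)  # ['g1', 'g2', 'g3', 'g4', 'g5', 'g6']
```

Here g2, g4, and g6 were retained in both copies, while g1, g3, and g5 survived in only one. The merged order recovers all six loci, which is how ancestral reconstruction compensates for reciprocal gene loss.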

As the second largest monocot family and fifth largest angiosperm family, the Poaceae is rich in morphological and ecological diversity. Much of the underlying genetic diversity may have resulted from σ and ρ WGDs. The events may also have contributed to some grass-specific characters. Genomes from basal Poales species such as pineapple (Ananas comosus) could be useful to further narrow the dating of ρ and σ.

Unraveling the Precommelinid τ Duplication Event in Early Monocots

To simplify the study of τ in the deep monocot lineages, we compared oil palm and sacred lotus, each of which has experienced only one subsequent genome duplication. We reconstructed the oil palm pre-p and sacred lotus pre-λ gene orders as described above for σ, covering 71.74 and 74.05% of the respective genomes. Alignment of the reconstructed genomes yielded 203 ancestral synteny blocks covering 69.44% of the pre-λ order and 73.53% of the pre-p order. One pre-λ region generally matched up to two pre-p regions (Figure 4B). This indicates that an oil palm ancestor experienced another, more ancient paleotetraploidy event before p but after its divergence from the eudicots. Syntenic regions retaining more than two homoeologous copies are distributed across all 16 oil palm chromosomes, confirming that this event was genome-wide. Because we showed above that oil palm does not share σ, this new paleotetraploidy is inferred to be τ (Figure 4A).

Using the eudicot outgroup genomes of grape or lotus, each of which had only one paleopolyploidy in its own lineage, up to four orthologous regions can be identified in oil palm, indicating two WGDs (τ and p). Up to eight orthologous regions can be identified in rice, indicating three WGDs (τ, σ, and ρ) (Figure 5). However, fast evolutionary rates and three WGDs in rice have led to severe paleoparalog loss, with only 10,877 paralogous copies of pre-τ loci retained (excluding tandem duplications) among 39,049 rice genes. In contrast, oil palm evolved at a much slower rate (about half that of grasses) and had one fewer WGD; about 16,092 pre-τ paralogs are retained (excluding tandem duplications) among 32,225 oil palm genes. In both genomes, recurring WGDs have severely eroded the signal of paleo-paralogy from τ. Pre-τ loci are retained at 1.4 copies on average in rice (versus 8 at full retention) and at 1.5 copies on average in oil palm (versus 4 at full retention).
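Expressed as fractions of full retention, the averages quoted above correspond to roughly 17.5% of the eight possible copies in rice and 37.5% of the four possible copies in oil palm. The helper below (a hypothetical name) simply makes that arithmetic explicit, assuming every event is a doubling.

```python
def retention_fraction(mean_copies: float, n_doublings: int) -> float:
    """Share of the 2**n_doublings possible copies retained per ancestral locus."""
    return mean_copies / 2 ** n_doublings

rice = retention_fraction(1.4, 3)      # tau, sigma, rho -> 1.4 / 8
oil_palm = retention_fraction(1.5, 2)  # tau, p          -> 1.5 / 4
print(f"{rice:.3f} {oil_palm:.3f}")    # 0.175 0.375
```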

Figure 5. — Multiple Alignment of a Set of Syntenic Regions in Oil Palm, Rice, and Lotus.

Triangles represent individual genes and their transcriptional orientations. Genes with no syntenic matches in the selected regions are not plotted. The event τ is the pan-Commelinid paleotetraploidy that is shared by the oil palm and rice lineages but not by the eudicots. The events σ and ρ are two paleotetraploidies in the ancestral grass lineages that are not shared with the palm and banana lineages. The event λ is the paleotetraploidy in the lotus (*Nelumbo nucifera*) lineage in basal eudicots. Aligned genes in the homoeologous regions are merged into consensus orders to approximate the pre-duplication ancestral regions. Ancestral genes with uncertain orientations are represented by squares.

Simultaneous Circumscription of Paleopolyploidy Events in Multiple Lineages

To integrate many of the individual synteny analyses described above, a streamlined procedure (see Methods) was applied to the rice, sorghum, banana, oil palm, grape, and sacred lotus genomes. The six genomes were compared pairwise and segmented at synteny block boundaries, forming sets of putative ancestral regions (PARs; clusters of homoeologous regions with high anchor density) as described (Tang et al., 2010). By integrating syntenic mapping information from 1217 groups of PARs across the 15 pairwise genome comparisons, we inferred paleopolyploidy levels on each branch of the given species phylogeny (Supplemental Figure 4). Two alternative species tree topologies were used to reflect competing evolutionary scenarios. In one, Arecales is sister to a clade comprising Poales (containing grasses) and Zingiberales (containing banana) (The Angiosperm Phylogeny Group, 2009). In the other, Arecales and Zingiberales are sister lineages whose most recent common ancestor (MRCA) had already diverged from the Poales MRCA (D’Hont et al., 2012; Singh et al., 2013). Under both species trees, the analysis inferred two genome duplications (σ and ρ) on the pan-grass lineage and one (τ) preceding the grass-palm-banana divergence. These results confirmed the individual analyses above and in previous studies, and systematically dissected the six genomes into the homoeologous clusters (PARs) produced by their paleopolyploid history.

DISCUSSION

Integrating synteny and phylogenomic comparisons achieves better resolution of ancient polyploidy events than either approach individually, a principle that is exemplified in the disambiguation of a WGD series of ρ – σ – τ in the grass lineages that echoes the α – β – γ series previously revealed in the Arabidopsis lineage.

Using this integrated approach, we revealed a WGD, τ, in the early evolution of monocots and confirmed nine previously identified WGDs (Figure 4). Such an additional WGD was hinted at previously (Tang et al., 2010) but could not be discerned for lack of genomic data from non-grass monocots. We infer τ to have occurred near the origin of monocots, before the origin of commelinids (which include all monocot genomes sequenced thus far) but after the monocot-eudicot split. After this shared ancestral tetraploidy, grass lineages experienced another two tetraploidies. Oil palm experienced one tetraploidy, likely shared with other palms (Singh et al., 2013). Banana experienced three additional duplications (D’Hont et al., 2012), making it one of the most duplicated monocot genomes known, a paleo-dotriacontaploid (2n=32x) like maize (Zea mays; Schnable et al., 2009).

Our results clearly showed that τ is shared by all 11 sequenced monocot genomes, i.e., all commelinids. Commelinids originated ∼120 to ∼100 MYA, not long after the origin of monocots ∼150 to ∼130 MYA (Hedges et al., 2006; The Angiosperm Phylogeny Group, 2009), and comprise ∼18,800 Old World and New World monocot species that include many economically important plants, such as palm, banana, ginger, cereals, and grasses. The estimated timing of τ is very close to the monocot origin, making it analogous to the γ event foreshadowing the initial radiation of core eudicots (Figure 4A). Additional basal monocot genome sequences are needed to resolve the exact timing of τ.

Integrating Synteny and Phylogenomic Approaches for Systematic Genome Comparison

Repeated polyploidization and subsequent diploidization through fractionation and structural rearrangement are genome-wide mutational forces and a striking feature of plants, presumably contributing to their remarkable genetic, phenotypic, and ecological diversity (Bowers et al., 2003; Blanc and Wolfe, 2004; Adams and Wendel, 2005; Freeling and Thomas, 2006; Doyle et al., 2008; Fawcett et al., 2009; Freeling, 2009; Schnable et al., 2009; Soltis et al., 2009; Van de Peer et al., 2009; Paterson et al., 2010; D’Hont et al., 2012; Ming et al., 2013). Nonetheless, conserved gene content and order are observed to different degrees in both closely and distantly related plant genomes (Bonierbale et al., 1988; Kowalski et al., 1994; Paterson et al., 1996; Ku et al., 2000; Tang et al., 2008a, 2010). However, plant genome comparisons differ qualitatively from those suited to mammals and most other taxa. They require methods that capture the multiway homoeologous relationships resulting from paleopolyploidies.

Early studies of WGD using whole-genome sequences were phylogenetic, based either on the distribution of synonymous substitution rates (Ks) between duplicated gene pairs (Lynch and Conery, 2000; Blanc and Wolfe, 2004) or on the topology of gene family trees (Bowers et al., 2003). Grouping paralogs by Ks requires no information about gene position and is the only way to date WGDs from transcriptome data alone. However, it risks blending gene families with different modes of Ks divergence. Moreover, for ancient WGDs, synonymous sites approach saturation and sequence-based methods become less useful. An early phylogenetic approach to dating WGDs discriminated duplication-first (external) from speciation-first (internal) tree topologies (Bowers et al., 2003). It was constrained by limited EST sampling, data from few taxa, and variation in lineage evolutionary rates (Tang et al., 2008b), and was later improved by more rigorous taxon sampling in the phylogenomic dating approach (Jiao et al., 2011).
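The Ks-based method rests on locating the mode of a paralog Ks distribution, the quantity used later for the age estimates of ρ, σ, and τ. One way to do this is to evaluate a Gaussian kernel density estimate on a grid and take the highest point. The bandwidth, grid step, and saturation cutoff below are illustrative assumptions rather than values from the text; fitting a mixture model is a common alternative.

```python
import math
from typing import Sequence

def ks_mode(ks_values: Sequence[float], bandwidth: float = 0.05,
            max_ks: float = 3.0, step: float = 0.005) -> float:
    """Grid location of the highest peak of a Gaussian KDE over paralog Ks values.

    Values above max_ks are discarded as likely saturated (illustrative cutoff).
    """
    kept = [k for k in ks_values if 0.0 < k <= max_ks]
    if not kept:
        raise ValueError("no usable Ks values")

    def density(x: float) -> float:
        # Unnormalized: a constant factor does not move the peak.
        return sum(math.exp(-0.5 * ((x - k) / bandwidth) ** 2) for k in kept)

    grid = [i * step for i in range(int(round(max_ks / step)) + 1)]
    return max(grid, key=density)

pairs = [0.80, 0.84, 0.85, 0.86, 0.86, 0.87, 0.88, 0.92, 1.60, 1.70, 4.5]
print(round(ks_mode(pairs), 3))  # 0.86
```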

While the phylogenomic approach utilizes temporal signals, the synteny approach identifies WGDs from spatial signals, i.e., conservation of gene position and order along the chromosomes (Vandepoele et al., 2002; Haas et al., 2004; Tang et al., 2008a). Synteny conservation provides independent and reliable phylogenetic signals unaffected by DNA substitution rate variation (Tang et al., 2008a). In synteny-based WGD dating, coalescence of duplicated regions can be traced before or after speciation depending on the patterns of one-to-one correspondence versus one-to-multiple (or multiple-to-multiple) correspondence (Figure 1B).

Integration of temporal and spatial evidence (Figure 1) is by far the most accurate approach for circumscribing WGD events. It also naturally connects studies of genome structure and molecular evolution. With the fast growth of genomic data, systematic comparisons of divergent taxa aided by WGD-aware ancestral reconstruction promise better delineation and interpretation of plant genome evolution. They provide an effective framework for multiway genomic alignments, with the unique advantage of tolerating long evolutionary distances and extensive genome rearrangement.

The Value of Ancestral Gene Order in Genome Comparisons

Inferred or approximate ancestral gene order often provides the best reference for threading genomic alignments in taxa with WGDs (Bowers et al., 2003; Kellis et al., 2004; Zheng et al., 2013). First, it compensates for post-WGD gene loss, increasing the proportion of aligned genes among homoeologous regions. Second, it expands synteny blocks as lineage-specific breakpoints are removed. In this way, ancestral reconstruction reverses the effects of more recent WGDs and better reveals the interleaving pattern of gene loss (as illustrated in Figure 1; Kellis et al., 2004), greatly facilitating recovery of full syntenic mapping among homoeologous regions. When a WGD-free outgroup genome is not available, as is the case in many plant clades, ancestral gene order can be approximated by in silico reconstruction. Reconstructed ancestral genomes, although not necessarily identical to the true ancestral genomes, are important references for effective genome alignment.

Recurring polyploidization and diploidization, ubiquitous in plant evolution, greatly obscure the network of homology in plant genome comparisons. More than 20 ancestral and independent paleopolyploidies have been identified in ∼50 sequenced angiosperms, affecting 100% of lineages. Consequently, many angiosperm genomes contain populations of homoeologous regions of different depths, resulting from different levels of paleopolyploidization combined with postpolyploidy gene loss (examples from the six studied genomes are shown in Supplemental Figure 5). Intergenomic comparisons are complicated as well. Arabidopsis thaliana and rice, for example, share 829 synteny blocks averaging 43 genes, resulting from six WGDs, three in each lineage. A comparison between the diploid Brassica rapa and banana genomes would involve as many as 52 homoeologous copies arising from eight WGD events (3 × 2 × 2 × 3 = 36 in Brassica plus 2 × 2 × 2 × 2 = 16 in banana). Current methods cannot directly find and align all the homoeologous regions in such genomes. Instead, ancestral reconstruction based on an established framework of historical WGDs can effectively recover the grand hierarchy of homoeology.

Variation of Lineage Nucleotide Evolutionary Rates and Estimated Ages of σ and τ

Nucleotide evolutionary rates can vary greatly among sites, gene families, nuclear and organellar genomes (Wolfe et al., 1987; Zhang et al., 2002; Mower et al., 2007; Gaut et al., 2011), and lineages with different life history traits and population characteristics (Smith and Donoghue, 2008; Gaut et al., 2011). Shared paleopolyploidy events provide inherent reference points for calibrating evolutionary rates among the affected lineages. Homoeologs bearing synteny information are also intrinsically more accurate phylogenetic markers for multicopy gene families. Using such reasoning, the Vitis lineage nucleotide substitution rate has been estimated to be less than half that of Arabidopsis (Tang et al., 2008b), and the Nelumbo lineage rate to be 30% slower than that of Vitis (Ming et al., 2013). Synonymous divergence (Ks) between paralogs formed in the shared τ WGD (Supplemental Figure 6) is ∼1.7 times greater in rice than in oil palm. This is much less than the ∼5-fold difference between grasses (faster) and palms estimated from chloroplast ribulose-bisphosphate carboxylase large subunit (rbcL) genes (Gaut et al., 2011) and the ∼4-fold difference estimated from a combination of rDNA, chloroplast, and mitochondrial genes (Smith and Donoghue, 2008). These differences in rate estimates emphasize the heterogeneous nature of molecular evolution in plant genomes. Reliable identification of WGDs and homoeologs is clearly essential for detailed evolutionary analysis on a genome-wide scale.

The pan-grass ρ event was estimated to be ∼70 million years old (Paterson et al., 2004), based on descendant paralogs in rice having a Ks distribution with a mode of ∼0.86 and an estimated rice lineage rate of 6.5e-9 substitutions per synonymous site per year. Using the same rate, the age of σ (paralogous Ks mode ∼1.65) was estimated to be ∼127 MYA. The palm p WGD was estimated to be ∼75 MYA (D’Hont et al., 2012; Singh et al., 2013), with descendant paralogs in oil palm having a Ks distribution with a mode of ∼0.36, giving an estimated oil palm lineage rate of ∼2.4e-9. Paralogs of τ have Ks distributions with modes of ∼1.13 in oil palm and ∼1.87 in rice. Taking the average of the rice and oil palm lineage rates (4.45e-9) as the approximate substitution rate on the lineage of their MRCA, τ can be estimated to have occurred ∼73 million years before the Arecaceae-Poaceae split using the Ks distribution of oil palm paralogs, or ∼64 million years before it using the rice Ks distribution. Again, we note that recent fossil discoveries (Prasad et al., 2005) have a host of implications for estimated evolutionary rates and associated dates of these and many other key events in monocot evolution, a topic that we are exploring comprehensively under separate cover. Taking into consideration the uncertainties in molecular evolutionary rate estimation and the availability of paleontological records, τ likely occurred in early monocot branches. Given the lack of fully sequenced early-diverging monocot genomes and the limited resolution of current sequence-based dating methods, further refinement of the exact timing of σ and τ awaits future studies.
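The rate and age figures above follow from the standard relation T = Ks/(2r), where the factor 2 reflects substitutions accumulating independently along both paralog lineages. For τ, the Ks of a paralog pair can be split into two parts: a post-split part accumulated at the present-day lineage rate, and a pre-split part accumulated at the averaged MRCA rate. The sketch below reproduces the quoted numbers. Note that the ∼73 and ∼64 million-year intervals are recovered only if the Arecaceae-Poaceae split is placed at about 100 MYA. That value is inferred here for illustration, because the text does not state the one used. The function names are hypothetical.

```python
def age_from_ks(ks: float, rate: float) -> float:
    """Age in years of a paralog pair: T = Ks / (2r)."""
    return ks / (2 * rate)

def rate_from_age(ks: float, age: float) -> float:
    """Per-year synonymous rate implied by a Ks mode of known age: r = Ks / (2T)."""
    return ks / (2 * age)

def interval_before_split(ks: float, lineage_rate: float, split_age: float,
                          mrca_rate: float) -> float:
    """Years between a shared WGD and a later speciation, solving
    Ks = 2 * lineage_rate * split_age + 2 * mrca_rate * interval."""
    return (ks - 2 * lineage_rate * split_age) / (2 * mrca_rate)

RICE_RATE = 6.5e-9                          # substitutions/synonymous site/year
palm_rate = rate_from_age(0.36, 75e6)       # p event -> 2.4e-9
mrca_rate = (RICE_RATE + palm_rate) / 2     # 4.45e-9
sigma_age = age_from_ks(1.65, RICE_RATE)    # ~127 million years
SPLIT_AGE = 100e6  # ASSUMPTION: Arecaceae-Poaceae split age; not stated in the text
tau_palm = interval_before_split(1.13, palm_rate, SPLIT_AGE, mrca_rate)
tau_rice = interval_before_split(1.87, RICE_RATE, SPLIT_AGE, mrca_rate)
print(round(sigma_age / 1e6), round(tau_palm / 1e6), round(tau_rice / 1e6))  # 127 73 64
```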

Although the disambiguation of ρ-σ-τ was performed manually, this approach of integrating synteny and phylogenomic comparisons, together with the demonstration of its value for resolving otherwise cryptic paleopolyploidy events, has provided results that are useful for validating efforts to automate the approach, another topic that we plan to report on comprehensively elsewhere.

METHODS

Data Source

We selected 15 taxa to represent all of the major land plant lineages for which genome sequence data are available, including four rosid genomes (Arabidopsis thaliana, Populus trichocarpa, Theobroma cacao, and Vitis vinifera), two asterids (Solanum lycopersicum and Solanum tuberosum), one basal eudicot (Nelumbo nucifera), six monocots (Oryza sativa, Brachypodium distachyon, Sorghum bicolor, Elaeis guineensis, Phoenix dactylifera, and Musa acuminata), one lycophyte (Selaginella moellendorffii), and one moss (Physcomitrella patens). Genome data were downloaded from either Phytozome (version 6) or the respective project websites.

Global Gene Family Phylogeny

The OrthoMCL method (Li et al., 2003) was used to cluster the complete set of protein-coding genes into orthogroups. For each orthogroup, amino acid alignments were generated with MAFFT (Katoh et al., 2005) using default parameters. The corresponding DNA sequences were then forced onto the amino acid alignments using custom Perl scripts. DNA alignments were trimmed with trimAl (Capella-Gutiérrez et al., 2009) using the heuristic “automated1” method. Maximum likelihood phylogenetic trees were then constructed using RAxML version 7.2.1 (Stamatakis, 2006) with the GTRGAMMA model and 100 bootstrap replicates. In total, we constructed 11,503 phylogenetic trees containing monocot genes and genes from at least one outgroup. Because of the classification stringency of OrthoMCL, the two anchor genes of a synteny block could be assigned to different orthogroups; in such cases, the two orthogroups were combined and a phylogeny was inferred from the merged set to date the duplication of the anchors.
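Forcing DNA onto a protein alignment means replacing each aligned residue with its codon and each gap with three gap characters. The custom Perl scripts are not shown in the source, so the following is a hypothetical Python equivalent in the spirit of PAL2NAL. It assumes that each CDS corresponds exactly to its ungapped protein (the translation is not checked) and that a single extra trailing codon is a stop codon.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread a CDS onto its aligned protein sequence, codon by codon.

    Each residue in `aligned_protein` consumes the next three nucleotides of
    `cds`; each gap '-' becomes '---'. If the CDS is exactly one codon longer
    than the protein, that trailing codon is assumed to be a stop and dropped.
    """
    cds = cds.upper()
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) == 3 * n_residues + 3:
        cds = cds[:-3]  # drop the (assumed) terminal stop codon
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match ungapped protein length")

    codons, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    print(protein_to_codon_alignment("M-K", "ATGAAATAA"))  # ATG---AAA
```

Applying this to every sequence in a MAFFT protein alignment yields a codon alignment whose columns stay in frame, which is what makes downstream trimming and nucleotide-model tree building consistent with the protein alignment.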

Phylogenomic Dating of Synteny Blocks

The duplication time of each synteny block was dated by carefully mapping homologous anchor genes onto phylogenetic trees. A gene duplication was considered only when its bootstrap support (BS) was ≥50%. If homologous anchor genes were clustered in more than one orthogroup, we combined those orthogroups and recomputed the phylogenetic trees for dating. After all anchor gene duplications had been dated, we scored synteny block duplications using the following criteria: (1) at least two anchor genes in each region could be dated phylogenetically with BS ≥50%, and (2) the timing of the anchor gene duplications was mostly consistent.
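The two block-scoring criteria can be expressed as a small function. This sketch rests on several assumptions not in the source: each dated anchor pair (one gene in each duplicated region) is reduced to a (duplication label, BS) tuple; labels such as "pre" and "post" (relative to the Musa-Oryza divergence) are hypothetical; and "mostly consistent" is interpreted as the most common label accounting for at least `min_agreement` of the supported datings, with 0.75 chosen arbitrarily.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One dated anchor pair: (duplication label read off the gene tree, BS in %).
AnchorDating = Tuple[str, float]


def score_block(datings: List[AnchorDating], min_bs: float = 50.0,
                min_dated: int = 2,
                min_agreement: float = 0.75) -> Optional[str]:
    """Return the consensus duplication label for a synteny block, or None.

    Criterion 1: at least `min_dated` anchor pairs dated with BS >= min_bs.
    Criterion 2 (hypothetical reading of "mostly consistent"): the most
    common label covers at least `min_agreement` of the supported datings.
    """
    supported = [label for label, bs in datings if bs >= min_bs]
    if len(supported) < min_dated:
        return None
    label, count = Counter(supported).most_common(1)[0]
    if count / len(supported) < min_agreement:
        return None
    return label


if __name__ == "__main__":
    print(score_block([("pre", 90), ("pre", 70), ("post", 30)]))  # pre
    print(score_block([("pre", 90), ("post", 40)]))               # None
```

The low-BS dating in the first example is ignored rather than counted against consensus, which mirrors the rule that only duplications with BS ≥50% are considered.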

Circumscribing Paleopolyploidy Events in Multiple Lineages Based on Synteny Patterns

For each pair of the six input genomes (Supplemental Table 1), representative coding sequences from each gene locus were aligned with LASTZ (Harris, 2007) using default parameters. Weak matches with a C-score (Putnam et al., 2008) <0.5 (indicating that their similarity falls below 50% of that of the best-matched pair between the two genomes) were filtered out. Remaining matches within a Manhattan distance of 30 or 40 units (40 for the grape-oil palm, grape-rice, grape-sorghum, and rice-sorghum comparisons and 30 for all other comparisons) were clustered into the same synteny block, and blocks with at least five anchors were retained. Chromosomes (or scaffolds) in each genome were segmented at major alignment breakpoints, yielding smaller genomic regions with simple, continuous synteny patterns. Segmented regions between each pair of genomes were hierarchically clustered based on the similarity of their homolog density and then filtered for statistical significance (P < 1e-30) by modeling the probability of homologous matches with a Poisson distribution, as described previously. Each set of homologous regions is defined as a PAR (Tang et al., 2010). In total, we identified 1217 high-quality PARs across the studied genomes.
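The C-score filter and the Manhattan-distance clustering, plus a Poisson tail probability of the kind a significance filter could use, are sketched below in Python. This is not the authors' pipeline. Hits are assumed to be (query, subject, score) tuples with positive scores, anchors are (i, j) gene-rank positions on the two genomes, clustering is plain single-linkage rather than any particular chaining heuristic, and because the exact Poisson model for homolog counts is not given in this section, `poisson_tail` only shows how such a probability can be computed without losing very small values.

```python
import math
from collections import defaultdict


def c_score_filter(hits, cutoff=0.5):
    """Keep hits whose C-score is >= cutoff.

    hits: list of (query, subject, score) with positive scores. The C-score is
    score / max(best score of the query, best score of the subject).
    """
    best_q, best_s = defaultdict(float), defaultdict(float)
    for q, s, score in hits:
        best_q[q] = max(best_q[q], score)
        best_s[s] = max(best_s[s], score)
    return [(q, s, sc) for q, s, sc in hits
            if sc / max(best_q[q], best_s[s]) >= cutoff]


def cluster_anchors(anchors, max_dist=30, min_anchors=5):
    """Single-linkage clustering of (i, j) anchors by Manhattan distance.

    Two anchors are linked when |i1 - i2| + |j1 - j2| <= max_dist; clusters
    with fewer than min_anchors anchors are discarded. O(n^2), for illustration.
    """
    parent = list(range(len(anchors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(anchors)):
        for b in range(a + 1, len(anchors)):
            (i1, j1), (i2, j2) = anchors[a], anchors[b]
            if abs(i1 - i2) + abs(j1 - j2) <= max_dist:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for idx, anchor in enumerate(anchors):
        groups[find(idx)].append(anchor)
    return [sorted(g) for g in groups.values() if len(g) >= min_anchors]


def poisson_tail(k, lam, n_terms=1000):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summed term by term in log
    space so that very small tails (e.g. below 1e-30) are not lost."""
    return sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
               for i in range(k, k + n_terms))
```

Computing the tail by direct summation, rather than as one minus the lower cumulative sum, matters at thresholds such as P < 1e-30, where the subtraction would round to zero.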

Homologous regions inside each PAR were further refined at their boundaries to trim off unaligned portions and then analyzed to infer genomic multiplicity using an evolutionary model based on pairwise synteny distances between the clustered regions. Synteny distance between a pair of aligned syntenic regions is calculated analogously to the p-distance in nucleotide alignment (applied here to gene order alignment) and was then used to infer the phylogenetic relationships among the regions using the UPGMA method. From the resulting tree topology, the number of paralogous segments in each genome was counted and used to deduce the multiplicity of genomic regions in each individual lineage since their divergence. The pairwise differences in WGD multiplicity were then combined to infer the number of WGDs on each branch of the given species phylogeny by applying the Fitch-Margoliash algorithm (Fitch and Margoliash, 1967). The Fitch-Margoliash algorithm uses a least-squares method to estimate the length of each branch (here, equivalent to genome multiplicity) from all pairwise distances between the genomes being compared. The procedure was implemented as a pipeline in an in-house Python script, with manual inspection at various intermediate stages.
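The Fitch-Margoliash step can be sketched as least-squares fitting of branch lengths on a fixed topology, where each pairwise distance is modeled as the sum of the branch lengths on the path between two taxa. In this NumPy sketch (not the authors' in-house script), the topology is encoded by splits, i.e., the set of leaves on one side of each edge; `weighted=True` applies the Fitch-Margoliash 1/d² weights and requires positive distances, while the default is unweighted ordinary least squares. The four-taxon tree and its distances are hypothetical, and the UPGMA step on synteny distances is not shown.

```python
import itertools

import numpy as np


def fit_branch_lengths(leaves, splits, dist, weighted=False):
    """Least-squares branch lengths on a fixed unrooted topology.

    leaves: taxon names.
    splits: edge name -> frozenset of the leaves on one side of that edge.
    dist:   frozenset({x, y}) -> observed distance between taxa x and y.
    An edge lies on the x-y path exactly when its split separates x from y.
    weighted=False: ordinary least squares. weighted=True: Fitch-Margoliash
    weights 1/d^2 (all distances must then be positive).
    """
    edges = list(splits)
    pairs = list(itertools.combinations(leaves, 2))
    A = np.array([[1.0 if (x in splits[e]) != (y in splits[e]) else 0.0
                   for e in edges] for x, y in pairs])
    d = np.array([dist[frozenset(p)] for p in pairs], dtype=float)
    if weighted:
        w = 1.0 / d  # scaling rows by 1/d gives squared-error weights 1/d^2
        A, d = A * w[:, None], d * w
    lengths, *_ = np.linalg.lstsq(A, d, rcond=None)
    return dict(zip(edges, lengths))


# Hypothetical example: tree ((A,B),(C,D)) with distances generated from
# branch lengths a=1, b=2, c=0, d=1 and an internal edge of length 1.
LEAVES = ["A", "B", "C", "D"]
SPLITS = {"a": frozenset({"A"}), "b": frozenset({"B"}),
          "c": frozenset({"C"}), "d": frozenset({"D"}),
          "ab|cd": frozenset({"A", "B"})}
DIST = {frozenset(p): v for p, v in [(("A", "B"), 3), (("A", "C"), 2),
                                     (("A", "D"), 3), (("B", "C"), 3),
                                     (("B", "D"), 4), (("C", "D"), 1)]}

if __name__ == "__main__":
    for edge, length in fit_branch_lengths(LEAVES, SPLITS, DIST).items():
        print(f"{edge}: {length:.2f}")
```

With real multiplicity data the fitted values need not be integers and can even be negative, so per-branch WGD counts would still call for rounding and inspection, in line with the manual checks described above.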

Inferring Ancestral Gene Order Prior to Polyploidy Event

In WGDs, ancestral regions were duplicated and then diverged through sequence and structural mutations, so each descendant homoeologous region retains only a fraction of the (mutated) ancestral gene content. Once synteny blocks are identified, the ancestral content can be reconstructed by coalescing all of the descendant regions they contain. While ancestral reconstruction can be performed at different resolutions to facilitate comparison of regions with different extents of divergence, reconstruction of ancestral gene order recovers the alignability and syntenic mapping of homoeologous regions across highly diverged genomes, such as those in this study. In our approach, a multiple alignment of homoeologous regions was first computed with the top-down algorithm implemented in MCscan (Tang et al., 2008b). Local alignment inconsistencies, mostly resulting from tandem gene duplications and micro-inversions, were linearized by dynamic programming that maximized anchor gene matching. The aligned homoeologous regions were then merged into a single preduplication segment by interpolation, following methods in previous studies (Bowers et al., 2003; Aury et al., 2006). This yields an approximate ancestral gene order and syntenic mappings.
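The two steps after the multiple alignment, linearizing anchors by dynamic programming and merging aligned regions by interpolation, can be illustrated for the simplest case of two homoeologous regions. This is a simplified sketch, not MCscan's top-down algorithm: the chain step is a longest-increasing-subsequence style DP over (i, j) anchor positions, and the merge places unanchored genes of region B on region A's index axis by linear interpolation between flanking anchors. Function names and data layout are hypothetical.

```python
def collinear_chain(anchors):
    """Largest subset of (i, j) anchors in which i and j both strictly increase.

    O(n^2) dynamic programming (longest-increasing-subsequence style); anchors
    that break collinearity, e.g. from tandem duplicates or micro-inversions,
    are left out.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best, prev = [1] * n, [-1] * n
    for k in range(n):
        for m in range(k):
            if (anchors[m][0] < anchors[k][0] and anchors[m][1] < anchors[k][1]
                    and best[m] + 1 > best[k]):
                best[k], prev[k] = best[m] + 1, m
    k = max(range(n), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    return chain[::-1]


def merge_by_interpolation(genes_a, genes_b, chain):
    """Merge two homoeologous regions into one approximate ancestral order.

    `chain` must be collinear and sorted (as returned by collinear_chain).
    Each anchored pair collapses into one slot (gene_a, gene_b). Unanchored
    genes of B are placed on A's index axis by linear interpolation between
    the flanking anchors, or by a constant offset beyond the terminal anchors.
    Returns slots as (gene_a or None, gene_b or None) in ancestral order.
    """
    if not chain:
        raise ValueError("need at least one anchor")
    a_to_b = {i: j for i, j in chain}
    anchored_b = set(a_to_b.values())

    def to_a_axis(j):
        left = [(i, jj) for i, jj in chain if jj < j]
        right = [(i, jj) for i, jj in chain if jj > j]
        if left and right:
            (i0, j0), (i1, j1) = left[-1], right[0]
            return i0 + (j - j0) * (i1 - i0) / (j1 - j0)
        if left:  # past the last anchor
            i0, j0 = left[-1]
            return i0 + (j - j0)
        i1, j1 = right[0]  # before the first anchor
        return i1 - (j1 - j)

    slots = [(float(i), 0, (g, genes_b[a_to_b[i]] if i in a_to_b else None))
             for i, g in enumerate(genes_a)]
    slots += [(to_a_axis(j), 1, (None, g))
              for j, g in enumerate(genes_b) if j not in anchored_b]
    # Sort by position on A's axis, then by source (A before B on exact ties).
    return [label for _, _, label in sorted(slots, key=lambda s: (s[0], s[1]))]
```

For example, merging regions a0..a3 and b0..b2 anchored at (0, 0) and (3, 2) places the unanchored b1 at position 1.5, between a1 and a2, so the ancestral segment carries five slots. Extending this to more than two regions would mean merging progressively against a growing reference, which is where a proper multiple alignment becomes necessary.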

Synonymous Substitution (Ks) Calculation

For each pair of homologous genes, we aligned their protein sequences using ClustalW2 (Larkin et al., 2007) and converted the protein alignment to a codon alignment using PAL2NAL (Suyama et al., 2006). Gene pairs that could not be reliably aligned, for various reasons, were discarded from further analysis. Ks values were calculated using the Nei-Gojobori method (Nei and Gojobori, 1986) implemented in the PAML package (Yang, 1997). Ks values for gene pairs with average GC3 > 75% were considered unreliable and discarded (Tang et al., 2010). Ks values >3.0 indicate saturation at synonymous sites, and those gene pairs were also excluded from further analysis.
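The two post hoc filters on Ks estimates are straightforward to express in code. The Python sketch below assumes that Ks has already been computed externally (the source used PAML's Nei-Gojobori implementation) and that a pair without a reliable alignment is represented as `None`; GC3 is computed over the complete codons of each CDS and averaged across the two genes. The function names are illustrative.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions (complete codons only)."""
    cds = cds.upper()
    thirds = [cds[k + 2] for k in range(0, len(cds) - len(cds) % 3, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds) if thirds else 0.0


def keep_ks(ks, cds1, cds2, max_gc3=0.75, max_ks=3.0):
    """Apply the filters described above: discard pairs with no Ks estimate,
    pairs whose mean GC3 exceeds 75%, and pairs with Ks > 3.0 (saturation)."""
    if ks is None or ks > max_ks:
        return False
    return (gc3(cds1) + gc3(cds2)) / 2 <= max_gc3


if __name__ == "__main__":
    print(keep_ks(0.8, "ATGAAATTA", "ATGAAATTA"))  # True (GC3 = 1/3)
    print(keep_ks(0.5, "GCCGCGGCC", "GCCGCGGCC"))  # False (GC3 = 1.0)
    print(keep_ks(3.5, "ATGAAATTA", "ATGAAATTA"))  # False (saturated)
```

The GC3 filter is applied to the pair rather than to each gene, matching the "average GC3" wording; a stricter variant could reject a pair if either gene alone exceeds the threshold.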

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. Exemplar Maximum Likelihood Phylogeny of WRKY Gene Family.
Supplemental Figure 2. Exemplar Maximum Likelihood Phylogeny of Zinc Finger Gene Family.
Supplemental Figure 3. Example Alignment Showing Eight Homologous Musa Regions Duplicated after the Divergence of Musa and Oryza.
Supplemental Figure 4. WGD Multiplicity Level Estimation on Each Branch of Given Species Phylogeny.
Supplemental Figure 5. Homoeolog Depth (Paleopolyploidy Level) of Genomic Regions in the Studied Taxa.
Supplemental Figure 6. Distributions of Synonymous Substitution Rates (Ks) among Groups of Orthologous or Paralogous Gene Pairs.
Supplemental Table 1. Sources and Basic Information about Angiosperm Genomes Used in Synteny Comparison.
Supplemental Data Set 1. Significant Enrichment of GO-SLIM Term for the Gene Families with Ancient Tau Duplication Measured by Hypergeometric Test Followed by FDR Multitest Adjustment Method.


Acknowledgments

We appreciate funding from the National Science Foundation (DBI 0849896, MCB 0821096, and MCB 1021718).

AUTHOR CONTRIBUTIONS

Y.J., J.L., H.T., and A.H.P. designed the research. Y.J. and J.L. performed analyses. H.T. contributed analytic tools. Y.J., J.L., H.T., and A.H.P. wrote the article.

Glossary

WGD: whole-genome duplication
MYA: million years ago
BS: bootstrap support
PAR: putative ancestral region
MRCA: most recent common ancestor

Footnotes

[W] Online version contains Web-only data.

References

Adams K.L., Wendel J.F. (2005). Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 8: 135–141.
Al-Mssallem I.S., Hu S., Zhang X., Lin Q., Liu W., Tan J., Yu X., Liu J., Pan L., Zhang T., Yin Y., Xin C., et al. (2013). Genome sequence of the date palm Phoenix dactylifera L. Nat. Commun. 4: 2274.
Amborella Genome Project (2013). The Amborella genome and the evolution of flowering plants. Science 342: 1241089.
Arrigo N., Barker M.S. (2012). Rarely successful polyploids and their legacy in plant genomes. Curr. Opin. Plant Biol. 15: 140–146.
Aury J.M., et al. (2006). Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444: 171–178.
Barker M.S., Vogel H., Schranz M.E. (2009). Paleopolyploidy in the Brassicales: analyses of the Cleome transcriptome elucidate the history of genome duplications in Arabidopsis and other Brassicales. Genome Biol. Evol. 1: 391–399.
Bennetzen J.L., et al. (2012). Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30: 555–561.
Blanc G., Wolfe K.H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16: 1667–1678.
Bonierbale M.W., Plaisted R.L., Tanksley S.D. (1988). RFLP maps based on a common set of clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095–1103.
Bowers J.E., Chapman B.A., Rong J., Paterson A.H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438.
Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.
Christoffels A., Koh E.G.L., Chia J.M., Brenner S., Aparicio S., Venkatesh B. (2004). Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol. Biol. Evol. 21: 1146–1151.
Cui L., et al. (2006). Widespread genome duplications throughout the history of flowering plants. Genome Res. 16: 738–749.
Dehal P., Boore J.L. (2005). Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3: e314.
D’Hont A., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488: 213–217.
Fawcett J.A., Maere S., Van de Peer Y. (2009). Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106: 5737–5742.
Fitch W.M., Margoliash E. (1967). Construction of phylogenetic trees. Science 155: 279–284.
Freeling M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60: 433–453.
Freeling M., Thomas B.C. (2006). Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 16: 805–814.
Gaut B., Yang L., Takuno S., Eguiarte L.E. (2011). The patterns and causes of variation in plant nucleotide substitution rates. Annu. Rev. Ecol. Evol. Syst. 42: 245–266.
Grant D., Cregan P., Shoemaker R.C. (2000). Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc. Natl. Acad. Sci. USA 97: 4168–4173.
Haas B.J., Delcher A.L., Wortman J.R., Salzberg S.L. (2004). DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20: 3643–3646.
Harris R.S. (2007). Improved Pairwise Alignment of Genomic DNA. PhD dissertation (The Pennsylvania State University).
Hedges S.B., Dudley J., Kumar S. (2006). TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22: 2971–2972.
International Brachypodium Initiative (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463: 763–768.
Jaillon O., et al. French-Italian Public Consortium for Grapevine Genome Characterization (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467.
Jaillon O., et al. (2004). Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431: 946–957.
Jiao Y., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97–100.
Jiao Y., et al. (2012). A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13: R3.
Katoh K., Kuma K., Toh H., Miyata T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33: 511–518.
Kellis M., Birren B.W., Lander E.S. (2004). Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617–624.
Kowalski S.P., Lan T.H., Feldmann K.A., Paterson A.H. (1994). Comparative mapping of Arabidopsis thaliana and Brassica oleracea chromosomes reveals islands of conserved organization. Genetics 138: 499–510.
Ku H.M., Vision T., Liu J., Tanksley S.D. (2000). Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97: 9121–9126.
Larkin M.A., et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948.
Li L., Stoeckert C.J. Jr, Roos D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13: 2178–2189.
Lynch M., Conery J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.
Maere S., De Bodt S., Raes J., Casneuf T., Van Montagu M., Kuiper M., Van de Peer Y. (2005). Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102: 5454–5459.
Mayrose I., Zhan S.H., Rothfels C.J., Magnuson-Ford K., Barker M.S., Rieseberg L.H., Otto S.P. (2011). Recently formed polyploid plants diversify at lower rates. Science 333: 1257.
Ming R., et al. (2013). Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 14: R41.
Ming R., et al. (2008). The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452: 991–996.
Mower J.P., Touzet P., Gummow J.S., Delph L.F., Palmer J.D. (2007). Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol. Biol. 7: 135.
Nei M., Gojobori T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418–426.
Ohno S. (1970). Evolution by Gene Duplication. (Berlin: Springer).
Otto S.P., Whitton J. (2000). Polyploid incidence and evolution. Annu. Rev. Genet. 34: 401–437.
Paterson A.H., et al. (1996). Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nat. Genet. 14: 380–382.
Paterson A.H., Bowers J.E., Chapman B.A. (2004). Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101: 9903–9908.
Paterson A.H., Freeling M., Tang H., Wang X. (2010). Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61: 349–372.
Paterson A.H., Chapman B.A., Kissinger J.C., Bowers J.E., Feltus F.A., Estill J.C. (2006). Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 22: 597–602.
Paterson A.H., et al. (2009). The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556.
Prasad V., Strömberg C.A., Alimohammadian H., Sahni A. (2005). Dinosaur coprolites and the early evolution of grasses and grazers. Science 310: 1177–1180.
Putnam N.H., et al. (2008). The amphioxus genome and the evolution of the chordate karyotype. Nature 453: 1064–1071.
Schnable P.S., et al. (2009). The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115.
Singh R., et al. (2013). Oil palm genome sequence reveals divergence of interfertile species in old and new worlds. Nature 500: 335–339.
Smith S.A., Donoghue M.J. (2008). Rates of molecular evolution are linked to life history in flowering plants. Science 322: 86–89.
Soltis D.E., Albert V.A., Leebens-Mack J., Bell C.D., Paterson A.H., Zheng C., Sankoff D., Depamphilis C.W., Wall P.K., Soltis P.S. (2009). Polyploidy and angiosperm diversification. Am. J. Bot. 96: 336–348.
Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Suyama M., Torrents D., Bork P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34: W609–W612.
Tang H., Bowers J.E., Wang X., Paterson A.H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl. Acad. Sci. USA 107: 472–477.
Tang H., Bowers J.E., Wang X., Ming R., Alam M., Paterson A.H. (2008a). Synteny and collinearity in plant genomes. Science 320: 486–488.
Tang H., Wang X., Bowers J.E., Ming R., Alam M., Paterson A.H. (2008b). Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18: 1944–1954.
The Angiosperm Phylogeny Group (2009). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161: 105–121.
Van de Peer Y. (2011). A mystery unveiled. Genome Biol. 12: 113.
Van de Peer Y., Maere S., Meyer A. (2009). The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10: 725–732.
Vandepoele K., Saeys Y., Simillion C., Raes J., Van De Peer Y. (2002). The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 12: 1792–1801.
Vision T.J., Brown D.G., Tanksley S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290: 2114–2117.
Wang X., Shi X., Hao B., Ge S., Luo J. (2005). Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 165: 937–946.
Wang X., et al. Brassica rapa Genome Sequencing Project Consortium (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43: 1035–1039.
Wang Y., Tang H., Debarry J.D., Tan X., Li J., Wang X., Lee T.H., Jin H., Marler B., Guo H., Kissinger J.C., Paterson A.H. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49.
Wolfe K.H., Li W.H., Sharp P.M. (1987). Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 84: 9054–9058.
Yang Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.
Zhang L., Vision T.J., Gaut B.S. (2002). Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 19: 1464–1473.
Zheng C., Chen E., Albert V.A., Lyons E., Sankoff D. (2013). Ancient eudicot hexaploidy meets ancestral eurosid gene order. BMC Genomics 14 (suppl. 7): S3.


[bib53] Stamatakis A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690. [DOI] [PubMed] [Google Scholar]

[bib54] Suyama M., Torrents D., Bork P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34: W609–W612. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib55] Tang H., Bowers J.E., Wang X., Paterson A.H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl. Acad. Sci. USA 107: 472–477. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] Tang H., Bowers J.E., Wang X., Ming R., Alam M., Paterson A.H. (2008a). Synteny and collinearity in plant genomes. Science 320: 486–488. [DOI] [PubMed] [Google Scholar]

[bib57] Tang H., Wang X., Bowers J.E., Ming R., Alam M., Paterson A.H. (2008b). Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18: 1944–1954. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] The Angiosperm Phylogeny Group (2009). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161: 105–121. [Google Scholar]

[bib59] Van de Peer Y. (2011). A mystery unveiled. Genome Biol. 12: 113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib70] Van de Peer Y., Maere S., Meyer A. (2009). The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10: 725–732. [DOI] [PubMed] [Google Scholar]

[bib60] Vandepoele K., Saeys Y., Simillion C., Raes J., Van De Peer Y. (2002). The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 12: 1792–1801. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Vision T.J., Brown D.G., Tanksley S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290: 2114–2117. [DOI] [PubMed] [Google Scholar]

[bib62] Wang X., Shi X., Hao B., Ge S., Luo J. (2005). Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 165: 937–946. [DOI] [PubMed] [Google Scholar]

[bib63] Wang X., et al. Brassica rapa Genome Sequencing Project Consortium (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43: 1035–1039. [DOI] [PubMed] [Google Scholar]

[bib64] Wang Y., Tang H., Debarry J.D., Tan X., Li J., Wang X., Lee T.H., Jin H., Marler B., Guo H., Kissinger J.C., Paterson A.H. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] Wolfe K.H., Li W.H., Sharp P.M. (1987). Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 84: 9054–9058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Yang Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556. [DOI] [PubMed] [Google Scholar]

[bib67] Zhang L., Vision T.J., Gaut B.S. (2002). Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 19: 1464–1473. [DOI] [PubMed] [Google Scholar]

[bib72] Zheng C., Chen E., Albert V.A., Lyons E., Sankoff D. (2013). Ancient eudicot hexaploidy meets ancestral eurosid gene order. BMC Genomics 14 (suppl. 7): S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
