Abstract
The grass family (Poaceae) includes all commercial cereal crops and is a major contributor to biomass in various terrestrial ecosystems. The ancestry of all grass genomes includes a shared whole-genome duplication (WGD), named rho (ρ) WGD, but the evolutionary significance of ρ-WGD remains elusive. We sequenced the genome of Pharus latifolius, a grass species (producing a true spikelet) in the subfamily Pharoideae, a sister lineage to the core Poaceae, which comprise the Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD) and the Bambusoideae, Oryzoideae, and Pooideae (BOP) clades. Our results indicate that the P. latifolius genome has evolved slowly relative to cereal grass genomes, as reflected by moderate rates of molecular evolution, limited chromosome rearrangements and a low rate of gene loss for duplicated genes. We show that the ρ-WGD event occurred approximately 98.2 million years ago (Ma) in a common ancestor of the Pharoideae and the PACMAD and BOP grasses. This was followed by contrasting patterns of diploidization in the Pharus and core Poaceae lineages. The presence of two FRIZZY PANICLE-like genes in P. latifolius, and duplicated MADS-box genes, support the hypothesis that the ρ-WGD may have played a role in the origin and functional diversification of the spikelet, an adaptation in grasses related directly to cereal yields. The P. latifolius genome sheds light on the origin and early evolution of grasses underpinning the biology and breeding of cereals.
The Pharus genome fills an important genomic gap, providing numerous insights into how whole-genome duplication contributed to the origin and diversification of the grass family.
Introduction
The grass family (Poaceae) is one of the five largest families of angiosperms and the most economically important. Different members of this family provide us with calorie-rich grains, pasturage for domesticated animals, commercial sucrose, and building materials (Kellogg, 2001; Kellogg, 2015; Soreng et al., 2017). The Poaceae contains more than 11,500 species segregated into 12 monophyletic subfamilies or lineages. Nine subfamilies (Soreng et al., 2017) comprise the two core grass clades Bambusoideae (bamboos), Oryzoideae (including rice), and Pooideae (including wheat, barley, and Brachypodium; BOP) and Panicoideae (including maize, sorghum, and sugarcane), Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD). Three other subfamilies, the Anomochlooideae, Pharoideae, and Puelioideae, contain only 27 species and are successive sister lineages to the core grasses (Clark et al., 1995; GPWG, 2001; Soreng et al., 2017). Over a relatively short period of geologic time ancestral grasses migrated out of forest understories becoming the dominant vegetation of ecosystems classified as prairies, savannas, and some wetlands (Clayton and Renvoize, 1986; Linder et al., 2018; Gallaher et al., 2019). This broad adaptation was underpinned by the selection of morphological and physiological traits including C4 photosynthesis, and potentially the specialized flowers included in spikelets, which are arranged in structurally complex inflorescences (Clifford, 1987; Kellogg, 2001; Schrager-Lavelle et al., 2017). Because of their influence on yield, inflorescence and spikelet architecture are primary targets for cereal crop improvement (Schilling et al., 2018).
In general, whole-genome duplication (WGD) is regarded as a driving force in the evolution of flowering plants (Schranz et al., 2012; Murat et al., 2017; Clark and Donoghue, 2018; Wendel et al., 2018; Soltis et al., 2019; Escudero and Wendel, 2020), because it provides genomic opportunities for evolutionary innovations through sub-functionalization or neo-functionalization of duplicated genes. In particular, the grass family has undergone three rounds of WGD events (τ, σ, and ρ; Tang et al., 2010; Ming et al., 2015; McKain et al., 2016). The first two events are shared with other monocots. The third ρ-WGD event is assumed to be unique to the Poaceae and believed to contribute to the vast adaptive radiation of the family over time. It is shared by all extant grass species based on analyses of the transcriptome from Streptochaeta (within the Anomochlooideae subfamily, sister to all other extant grasses) and whole genomes from core grasses (McKain et al., 2016). However, no similar analysis has been conducted on members of the two other sister lineages to the core grasses (Pharoideae and Puelioideae), particularly at the whole genomic level.
Although the Poaceae remains the most frequently sequenced family of flowering plants (e.g. Yu et al., 2002; Paterson et al., 2009; Schnable et al., 2009; The International Brachypodium Initiative, 2010; VanBuren et al., 2015; Varshney et al., 2017; IWGSC, 2018; Ling et al., 2018; Pardo et al., 2020), uncertainty remains regarding the functional significance and timing of the ρ-WGD event. This is due to a lack of genomic sequences of any of the extant successive sister lineages to the core grasses. Thus far, nearly all efforts at whole-genome sequencing have focused predictably on economically important species in the core clades of grasses with an emphasis on cereals. Whole-genome comparisons of genome structure and gene content between core Poaceae and successive sister lineages may provide missing pieces of the puzzle for understanding the role of ρ-WGD in the early evolution of grasses.
To fill the gap, we present genome sequences of Pharus latifolius, obtained using PacBio single-molecule real-time sequencing technology. This species is presumed to be diploid with a chromosome number of 2n = 24 (Davidse and Pohl, 1972; Judziewicz et al., 1999). The Pharus genus belongs to the Pharoideae subfamily (Clark and Judziewicz, 1996). This genus represents a lineage that diverged from the core Poaceae tens of millions of years ago based on the oldest macrofossil known for the grass family, that of a spikelet in 35- to 40-million-year old amber (Poinar and Columbus, 1992). There are only seven extant Pharus species, restricted to humid forests of Mesoamerica (Judziewicz et al., 1999; Kellogg, 2015; Soreng et al., 2017). Floral morphology in this genus shows unique traits. These include one unisexual floret per spikelet, three minute and rudimentary lodicules present only in the male spikelet, and each perfect floret containing three stigmas and six stamens. Some of these traits, such as the six stamens in a grass floret, may represent ancestral traits in Poaceae (Judziewicz et al., 1999; Sajo et al., 2007; Kellogg, 2015). Therefore a comparison of the P. latifolius genome to those of core lineages within the Poaceae may provide insight into ancestral genomic characteristics and early evolution of the grasses. In particular, the P. latifolius genome offers an opportunity to identify and investigate genomic features that may have contributed to the origin and evolution of the spikelet. By clarifying the timing and genomic consequences of the ρ-WGD event in P. latifolius within a phylogenetic framework, we inferred the timing of its origin and genomic evolution at both chromosomal and genic levels following polyploidization. This genome will serve as an important resource for comparative analyses of grass and other plant genomes and, more specifically, in future studies manipulating the biology of cereal crops.
Results
Genome sequencing, assembly, and annotation
Pharus latifolius has an estimated genome size of ∼1,135 Mb and 1,034 Mb based on flow cytometry and k-mer statistics (Supplemental Figure 1), respectively. By integrating ∼159-Gb PacBio long reads, ∼110-Gb Illumina short paired-end reads, and ∼130-Gb high-throughput chromosome conformation capture (Hi-C) data (Supplemental Tables 1 and 2), we generated a chromosome-level assembly of P. latifolius. The genome was assembled into 535 contigs with an N50 size of 5.09 Mb and a total length of 1,002.88 Mb (Table 1). Genome completeness was estimated at about 92.6%. Over 99.6% of the assembly was anchored onto 12 pseudo-chromosomes ranging from 59.91 Mb to 96.85 Mb (Figure 1A). To evaluate the assembly quality, we mapped Illumina reads back to the assembly, revealing a mapping efficiency of 99.8%. The completeness and contiguity of the assembly were also validated by assessing the completeness of long-terminal repeat (LTR) retrotransposons using the LTR Assembly Index (LAI; Ou et al., 2018). This provided a LAI value of 13.7, which is comparable to those of published reference genomes of grasses (Figure 1B).
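The contig N50 reported above is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name and the toy contig lengths are illustrative assumptions and are not taken from the P. latifolius assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy contig lengths in Mb (hypothetical, for illustration only).
toy_contigs = [20, 10, 8, 6, 4, 2]
print(n50(toy_contigs))  # 10
```

With the toy values, the total is 50 Mb, and the running sum passes 25 Mb at the second contig, so the N50 is 10 Mb.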
Table 1.
Overview of genome assembly and gene annotation for P. latifolius
| Feature | Value |
| --- | --- |
| Assembly size | 1,002.88 Mb |
| Number of contigs | 535 |
| N50 length (contig) | 5.09 Mb |
| Longest contig | 20.38 Mb |
| Number of pseudo-chromosomes | 12 |
| Total size of pseudo-chromosomes | 999.73 Mb |
| GC content | 44.3% |
| Repeats | 790.91 Mb (78.9%) |
| Number of protein-coding genes | 32,007 |
| Number of genes with annotation | 25,728 (80.4%) |
| Number of noncoding RNAs | 1,100 |
| Complete BUSCOs | 97.7% |
Figure 1.
Genome organization of P. latifolius and comparative genomic analyses among grass genomes. A, Distribution of LTR retrotransposons with Gypsy and Copia (percent nucleotides per 1 Mb), protein-coding genes (number per Mb), and GC content (percent per 1 Mb) along the 12 pseudo-chromosomes of P. latifolius. B, Genome size variations among P. latifolius and four other diploid core grasses showing the differential content of repeat elements. The LAI is shown for each genome in violin plots. C, Insertion time of LTR estimated using the paired-end sequences of LTR retrotransposons for P. latifolius and four other grass genomes. D, Distribution of GC3 content of protein-coding genes for grasses and representative flowering plants. The weak bimodal distribution in P. latifolius is highlighted in the filled curve. E, Venn diagram showing the numbers of shared and unique gene families between P. latifolius and the two major clades, BOP and PACMAD, of core grasses.
Using a combination of ab initio and homology-based gene prediction methods, we annotated 32,007 protein-coding genes together with 1,100 noncoding RNAs (microRNAs, transfer RNAs, and ribosomal RNAs; see Supplemental Table 3) in the P. latifolius genome. Approximately 80.4% of the predicted protein-coding genes were assigned functional annotations (Supplemental Table 4). For the annotated genes, 97.7% of the 1,440 Plantae Benchmarking Universal Single-Copy Orthologs (BUSCO) genes could be identified (Supplemental Table 5), indicating the high quality of the genome assembly and annotation. On average, the protein-coding genes in P. latifolius encode transcripts of 1,320 bp with 4.7 exons and proteins of 356 amino acids, similar to those of other grass genomes (Supplemental Table 6).
Evolution of repeats, GC content, and gene families
While similar to the majority of diploid grasses (Gaut, 2002), the P. latifolius genome proved much larger than the sequenced diploid genomes of the four core grasses we analyzed, namely sorghum (Sorghum bicolor, Panicoideae), rice (Oryza sativa, Oryzoideae), Brachypodium distachyon (Pooideae), and Oropetium thomaeum (Chloridoideae). Transposable element (TE) content appears to be the major cause of this genome size variation (Figure 1B). Nearly 78.9% of the P. latifolius genome consisted of TE sequences (790.91 Mb), compared to 62.8% in S. bicolor, 44.1% in O. thomaeum, 32.3% in O. sativa, and 28% in B. distachyon. The majority of TEs were LTR retrotransposons, of which the Gypsy and Copia families comprised 46.7% and 7.6% of the P. latifolius genome (Supplemental Table 7), respectively. In the insertion time analyses of LTR retrotransposons, P. latifolius experienced a major wave of retrotransposition around 0.5 Ma after a smaller wave at ∼3–4 Ma (Figure 1C). In contrast, the burst of LTR retrotransposons in the four core species happened more recently, with all <0.5 Ma. Similar patterns were observed when the Gypsy and Copia families were separately analyzed (Supplemental Figure 2). Together, these observations indicated a low activity of recent LTR retrotransposon amplification in the P. latifolius genome. Moreover, LTR retrotransposons, and Gypsy elements in particular, tend to accumulate at high density in the heterochromatic pericentromeric regions. They appear unevenly distributed along a single chromosome in the P. latifolius genome (Figure 1A). There is also a particularly sharp transition between Gypsy-rich and gene-rich regions.
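Insertion times of LTR retrotransposons are commonly estimated from the divergence between the two terminal repeats of a single element, which are identical at the time of insertion. The sketch below illustrates that general approach as T = K / (2r), where K is the Jukes-Cantor corrected distance between the aligned 5' and 3' LTRs and r is the substitution rate. The default rate of 1.3e-8 substitutions per site per year is a value often applied to grasses and is an assumption here; the rate and distance model used in the present analysis are not stated in this section, and the function names are hypothetical.

```python
import math


def jc69_distance(ltr5, ltr3):
    """Jukes-Cantor distance between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_years(k, rate=1.3e-8):
    """Insertion time T = K / (2r); `rate` is an assumed value."""
    return k / (2 * rate)


# Hypothetical aligned LTR pair differing at 1 of 20 sites.
k = jc69_distance("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA")
print(round(insertion_time_years(k) / 1e6, 2), "Ma")
```

The factor of two reflects that both LTRs accumulate substitutions independently after insertion.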
We plotted the distribution of GC3 content (G or C in the third codon position) in P. latifolius and compared it with those of core grasses and representative flowering plants (Figure 1D; Supplemental Figure 3). Between the unimodal distributions of Amborella trichopoda and Arabidopsis thaliana and the bimodal distributions of core grasses, P. latifolius showed two peaks, although the second peak was less distinct. This weak bimodal distribution was somewhat similar to those of pineapple (Ananas comosus, Bromeliaceae, Poales) and banana (Musa acuminata, Musaceae, Zingiberales) but not to that of the seagrass (Zostera marina, Zosteraceae) from the more distantly related monocot order Alismatales (Figure 1D).
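GC3 content, as defined above, is the fraction of codons whose third position is G or C. A minimal sketch of the per-gene calculation is given below; it assumes an in-frame coding sequence, and the function name and example sequences are hypothetical.

```python
def gc3(cds):
    """Fraction of codons with G or C at the third position."""
    seq = cds.upper()
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    third_positions = seq[2:usable:3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Hypothetical in-frame sequences.
print(gc3("ATGGCCAAG"))  # codons ATG, GCC, AAG: all end in G or C
print(gc3("ATAGCTAAC"))  # codons ATA, GCT, AAC: one of three ends in G or C
```

Plotting a histogram of this value over all protein-coding genes in a genome gives the kind of distribution compared across species here.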
Orthologous genes among P. latifolius and 13 representative species spanning five core grass subfamilies (Bambusoideae, Oryzoideae, Pooideae, Chloridoideae, and Panicoideae), as well as the two outgroup taxa (A. comosus and M. acuminata), were clustered into 23,272 groups. Among them, 7,497 were shared by all 16 species (Supplemental Figure 4). The P. latifolius genome had the fewest orthogroups (14,574) within Poaceae, and 22.8% of its genes (7,300) were not assigned to any group. This suggested a high level of gene divergence in P. latifolius compared to the core grasses. Furthermore, the proportion of unassigned genes in P. latifolius fell within the range of 7.3%–25.2% estimated for the remaining grasses but was considerably larger than their average of 16.8%. In comparison to A. comosus and M. acuminata, a total of 594 orthogroups specific to the Poaceae were identified (Supplemental Figure 5). All 14 grass species shared a core set of 8,716 orthogroups (Figure 1E). P. latifolius shared 13 orthogroups exclusively with all of the 8 BOP grasses but shared none with the 5 PACMAD species. In contrast, a particularly large number of unique orthogroups (602) were shared by the 13 core grasses. Intriguingly, many of these orthogroups, and those specific to the whole family, had gene ontology (GO) annotations related to plant reproduction. These included floral development, pollination, pollen-pistil interactions, and embryonic and post-embryonic development (Supplemental Data Set 1). Such results may underlie the origin and evolution of those reproductive traits unique to grasses.
We further analyzed the expansion and contraction of gene families during the origin and evolution of grasses, focusing on two key nodes: the common ancestor of P. latifolius and core grasses, and the common ancestor of all core grasses. There were more expanded than contracted gene families at both nodes, with 524 expanded and 257 contracted families at the first node and 698 and 177 at the second (Supplemental Figure 6). In contrast, more gene families were contracted (2,779) than expanded (929) in P. latifolius.
Cretaceous origin of grasses and the slowly evolving P. latifolius genome
To study the evolution of grass genomes within a phylogenetic context, we used 480 single-copy genes from the orthogroups identified above that were common to all 16 species for phylogenetic inference. The resulting phylogeny (Figure 2) corroborates known phylogenetic relationships within the grass family (Soreng et al., 2017) and we obtained full bootstrap support for the placement of the P. latifolius lineage outside of the core Poaceae lineage including the BOP and PACMAD clades. We noted that the branch leading to P. latifolius is the shortest among the grasses (Figure 2B), indicating a slow rate of protein sequence evolution. A Tajima’s relative rate test (Tajima, 1993) also confirmed a significantly slower substitution rate in P. latifolius compared to the remaining species (P < 0.001; Chi-square test; Supplemental Table 8).
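Tajima's relative rate test compares two ingroup sequences against an outgroup by counting the sites at which only one of the two ingroup sequences differs from the other two. Under equal rates the two counts, m1 and m2, are expected to be equal, and (m1 - m2)^2 / (m1 + m2) follows a chi-square distribution with one degree of freedom. The sketch below illustrates this one-degree-of-freedom form of the test on aligned sequences; the function name and the toy alignment are hypothetical and do not reproduce the data in Supplemental Table 8.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for Tajima's 1-df relative rate test."""
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip alignment gaps
        if a != b and b == o:
            m1 += 1   # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1   # change unique to lineage 2
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Toy alignment: lineage 2 carries six unique changes, lineage 1 none.
print(tajima_relative_rate("AAAAAAAAAA", "AAAACCCCCC", "AAAAAAAAAA"))
```

A significantly smaller count for one lineage, as in the toy example for the first sequence, corresponds to a slower substitution rate in that lineage.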
Figure 2.
Phylogeny and the time-scale evolution of the grass family. A, Estimated divergence times for the grass family based on 480 single-copy genes from 14 grasses representing six subfamilies and two outgroups, showing the origin of this family in the Cretaceous and divergence of its major lineages around the K–Pg boundary (∼65 Ma). All nodes in the phylogeny received 100% bootstrap support values in the maximum likelihood analysis. B, Identical phylogeny as in (A) with branch lengths given in substitution per site. The short branch in red for P. latifolius shows the low rate of molecular evolution for this species.
Divergence time analyses suggested that the Pharus lineage originated ∼89.9 Ma [95% confidence interval (CI) 112.9–70.9 Ma] and the core BOP and PACMAD clades diverged from each other at ∼73.6 (95% CI 93.3–58.6) Ma in the Upper Cretaceous (Figure 2A; Supplemental Figure 7). The three BOP subfamilies arose in a short period from ∼67.8 (95% CI 86.5–54.3) to ∼62 (95% CI 79.1–49.7) Ma. These estimated ages indicate an early origin of the Poaceae in the Cretaceous and radiation of the core grass lineages around the Cretaceous–Paleogene (K–Pg) boundary after a lag of ∼16 million years.
The ρ-WGD event pre-dated the divergence of the Pharoideae and the diversification of the core grasses
To assess the WGD history of the grass family, we plotted the distributions of synonymous substitutions per synonymous site (Ks) for paralogous genes within the genomes of P. latifolius and the other five grasses, as well as for orthologous genes between P. latifolius and these grasses and the outgroup A. comosus. A clear peak at the Ks value of 0.52 for paralogs in collinear regions was shown for P. latifolius (Figure 3A). Similar patterns with larger peak values from 0.79 to 1.09 were also observed for the five core grasses, pointing to a possible common WGD event (Supplemental Figure 8). A comparison of the P. latifolius Ks peak of WGD with orthologs of P. latifolius–A. comosus and P. latifolius–O. sativa indicates that this WGD event postdated the segregation between P. latifolius and A. comosus but was closely aligned with the P. latifolius–O. sativa divergence. However, comparisons of WGD in core grasses with speciation events all suggested that the ρ-WGD occurred before the divergence of P. latifolius and core grasses (Figure 3A; Supplemental Figures 8 and 9). These results reflected a likely single WGD event (ρ) in a common ancestor of P. latifolius and the core grasses. The non-overlapping WGD peaks could be explained by varied mutation rates among species (Wang et al., 2015), with P. latifolius evolving more slowly and thereby retaining the smallest Ks peak value.
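A Ks peak of the kind described above can be located, in its crudest form, as the most populated bin of a histogram of Ks values for collinear paralog pairs. The sketch below shows that simple binning approach only; published analyses often fit kernel densities or mixture models instead, and the method used to obtain the peak values reported here is not stated in this section. The function name, bin width, upper cut-off, and toy Ks values are all assumptions.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.05, ks_max=3.0):
    """Return the midpoint of the most populated Ks histogram bin."""
    # Discard zero and saturated values before binning.
    counts = Counter(int(ks / bin_width)
                     for ks in ks_values if 0 < ks < ks_max)
    if not counts:
        raise ValueError("no Ks values within the accepted range")
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width


# Toy Ks values (hypothetical): most pairs cluster just above 0.5.
toy_ks = [0.11, 0.12, 0.51, 0.52, 0.53, 0.54, 0.9]
print(round(ks_peak(toy_ks), 3))  # 0.525
```

Comparing the paralog peak within a genome against ortholog peaks between genomes then places the duplication relative to the speciation events.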
Figure 3.
Inference of WGD and the evolution of associated duplicated genes in the grass genomes. A, Distribution of synonymous substitution per site (Ks) of paralogues in collinear regions of P. latifolius and O. sativa and of orthologues between P. latifolius and O. sativa, and the outgroup A. comosus. The major peaks within genomes might point to a single WGD event and the non-overlap is likely due to different substitution rates between P. latifolius and O. sativa. B, Phylogenomic analysis of the WGD event in P. latifolius. Numbers above the line are the number of gene families with duplicate gene pairs from P. latifolius supporting the duplication events and those below are the number of pairs. The numbers in parentheses are those requiring duplicate gene pairs to be present in five core grasses. The majority of duplicated genes in P. latifolius coalesced on the grass stem branch. All mapped duplication events have bootstrap values ≥95% in the gene trees, and results with bootstrap values ≥80% are shown in Supplemental Figure 11. C, The selective pressure on the retained duplicated genes from ρ-WGD in different grass genomes. The Ka/Ks for P. latifolius is significantly higher than in the other grasses. D, A model of time-scale evolution of ρ-WGD and duplicated genes in the Poaceae. The numbers along the branch are the total number of duplicated genes with one gene copy lost and the rate per million years (My) over the interval. The red star denotes the occurrence of the ρ-WGD event, generating duplicated genes as represented by two colored lines.
In addition, a second but minor peak at a smaller value around 0.1 is evident in the Ks distribution for P. latifolius paralogs (Figure 3A). A similar phenomenon has been observed in analyses of the O. sativa, S. bicolor, and B. distachyon genomes (Paterson et al., 2009; Wang et al., 2011), and was attributed to concerted evolution of homoeologous chromosome segments derived from ρ-WGD and thus lower sequence divergence between paralogous genes. Hence, this concerted evolution may have occurred in parallel within the core grass and sister lineages following a common WGD event, lending additional support for the shared ρ-WGD in all grasses.
The WGD event in P. latifolius, shared with the core grasses, was also evident from the intra- and inter-genomic synteny analyses. The dot plot of the P. latifolius genome showed extensive collinearity with 9,786 genes (30.6% of the total) located in syntenic blocks (Supplemental Figure 10). An inter-genomic comparison between P. latifolius and O. sativa showed a clear 2:2 syntenic pattern (Figure 4A), consistent with the shared ρ-WGD event.
Figure 4.
Synteny and chromosome evolution of the grass genomes at the subfamilial level following WGD. A, Dot plot demonstrating the conserved synteny between the genomes of P. latifolius and O. sativa. Intergenomic alignments are in different colors for different chromosomes and intragenomic in black with an apparent 2:2 pattern. B, Synteny comparisons of chromosome 4 of the AGK to corresponding chromosomes in the descending grass genomes. Inversions and translocations are highlighted by red links. C, A grass evolutionary scenario of chromosomes from the AGK showing the massive rearrangements following ρ-WGD. The numbers on the branch are fissions (fi) or fusions (fu) relative to the AGK and those below are numbers of protein-coding genes in each genome. The red triangle indicates independent WGD event for the bamboo lineage.
We further employed a phylogenomic approach to place ρ-WGD onto the phylogeny of the grass family and infer its timing. We extracted paralogs in collinear regions arising from WGD in the P. latifolius genome, identified their orthologous genes from the five core grass species as well as the two outgroup species, and estimated a phylogenetic tree for each gene family (Supplemental Data Set 2). In the overwhelming majority of gene families (430 out of 496), the duplicated genes of P. latifolius coalesced on the branch connecting P. latifolius and the core grasses (Figure 3B; Supplemental Figures 11 and 12). A similar pattern was obtained when requiring duplicate gene pairs to be present in all five core grasses (Figure 3B). This provided strong evidence that P. latifolius and the core grasses shared the ρ-WGD event. Using the gene families with duplicated genes supporting ρ-WGD, we deduced that this event happened approximately 98.2 (95% CI 115.7–82.7) Ma (Figure 3D; Supplemental Figure 13). This dated the event to ∼8.3 million years before the divergence of the P. latifolius lineage.
Evolution of duplicated genes
For P. latifolius and the five core grasses, none of which underwent additional WGD after the common ρ-WGD event, between 3,483 and 4,893 duplicate gene pairs were retained within each genome, with the largest number found in P. latifolius (Supplemental Table 9). Roughly half of these duplicate gene pairs (49.9%–56.5%) had a relatively high GC3 content (≥75%) in the core grasses, whereas the proportion in P. latifolius remained low at 32.7%. Using the P. latifolius genome as a reference, we identified 31,913 syntenic ortholog–paralog gene clusters among these six grasses (Supplemental Data Set 3) and inferred the retention and loss of WGD duplicated genes in their evolutionary history (Figure 3D; Supplemental Figure 14). Gene loss occurred mostly in the ancestor common to P. latifolius and the core grasses, with 5,858 duplicate gene pairs losing one copy prior to species divergence. Following this event, P. latifolius lost one copy from each of 1,755 duplicate gene pairs over a period of approximately 90 million years. In contrast, a nearly two-fold higher loss rate was detected along the branch connecting to the core grasses (Figure 3D).
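The per-branch loss rates compared above follow from dividing the number of duplicate pairs that lost one copy by the length of the interval. The short sketch below works through that arithmetic for the P. latifolius branch, reading the 1,755 figure as the total number of pairs that lost one copy, in line with the description of branch numbers in the Figure 3D legend. The function name is hypothetical, and the corresponding counts for the core grass branch are not given in the text, so they are not reproduced.

```python
def loss_rate_per_my(pairs_losing_one_copy, interval_my):
    """Duplicate pairs losing one copy per million years."""
    if interval_my <= 0:
        raise ValueError("interval must be positive")
    return pairs_losing_one_copy / interval_my


# Values stated in the text for the P. latifolius branch.
pharus_rate = loss_rate_per_my(1755, 90)
print(pharus_rate)  # 19.5 pairs per million years
```

A nearly two-fold higher rate on the branch leading to the core grasses would therefore correspond to somewhat under 39 pairs per million years.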
To assess selective pressure for duplicate gene pairs we used a pairwise test of Ka/Ks (the ratio of nonsynonymous to synonymous substitutions) on P. latifolius and the five core grasses. The overwhelming majority of duplicated genes in all the species were under purifying selection, with ω (Ka/Ks) less than one and median values ranging from 0.190 in O. thomaeum to 0.272 in P. latifolius (Figure 3C; Supplemental Table 10 and Supplemental Data Set 4). Intriguingly, P. latifolius had a significantly higher median value of ω than any of the core grasses (Supplemental Table 10), despite its lower rate of molecular evolution. This suggested different levels of selective constraint for duplicated genes that were retained in P. latifolius and the core grasses, with elevated ω in P. latifolius indicating relaxed purifying selection toward more neutral-like evolution. Moreover, there were 106 duplicate gene pairs in P. latifolius with a value of ω larger than one (Supplemental Data Set 4). This is a signal of possibly adaptive positive selection, perhaps associated with neofunctionalization or subfunctionalization at the protein level.
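The interpretation of ω used above can be summarized in a few lines: values below one indicate purifying selection, values above one are a possible signal of positive selection, and a per-genome median summarizes the overall constraint. The sketch below assumes that Ka and Ks have already been estimated for each duplicate pair by a dedicated tool; the function name and the toy values are hypothetical.

```python
from statistics import median


def classify_omega(ka, ks):
    """Return (omega, label) for one duplicate gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if omega < 1:
        label = "purifying"
    elif omega > 1:
        label = "positive"
    else:
        label = "neutral"
    return omega, label


# Toy (Ka, Ks) estimates for three hypothetical duplicate pairs.
toy_pairs = [(0.10, 0.52), (0.136, 0.50), (0.60, 0.50)]
results = [classify_omega(ka, ks) for ka, ks in toy_pairs]
print([label for _, label in results])
print(round(median(omega for omega, _ in results), 3))
```

Note that a pairwise ω above one is only suggestive, which is why the text describes such pairs as a possible signal of adaptive evolution.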
Chromosome evolution following WGD
The chromosome-level assembly of the P. latifolius genome provided an opportunity to investigate the early evolution of chromosomes in grasses following the ρ-WGD event. The dot plot between the genomes of P. latifolius and O. sativa, the latter being considered the reference genome of the grass family (Murat et al., 2017), revealed well-preserved synteny at the chromosome level (Figure 4A). The only major change was a reciprocal translocation between chromosomes 1 and 7 of P. latifolius. Even small genomic rearrangements such as inversions proved limited between the two genomes, and only one inversion event was observable in chromosome 2 of P. latifolius. Additionally, significant P. latifolius and O. sativa homologous quartets clearly exhibited the shared ρ-WGD event (Figure 4A). The P. latifolius genome differentiated from the reconstructed ancestral grass karyotype (AGK; Murat et al., 2017) by two fissions and two fusions involving chromosomes 1 and 7 (Figure 4C; Supplemental Figure 15). Chromosome 1 of AGK may be a hotspot for rearrangements, as evidenced by inversion between ρ-WGD-derived chromosomes 1 and 5 in O. sativa (Supplemental Figure 16), as well as recurrent inversion and translocation events for it in B. distachyon, O. thomaeum, and S. bicolor. The rearrangements in P. latifolius may indicate a different chromosome organization of ancestral grasses following ρ-WGD from the reconstructed AGK or, more parsimoniously, are lineage specific and P. latifolius shared this AGK with core grasses.
We further reconstructed the grass chromosome evolution and rearrangements from the AGK at the subfamilial level with seven representative species (Figure 4C; Supplemental Figure 15). The fewest chromosomal rearrangements were recorded in O. sativa from the Oryzoideae. The moso bamboo (Phyllostachys edulis, Bambusoideae) had 2n = 4x = 48 chromosomes due to recent tetraploidization (Peng et al., 2013; Guo et al., 2019), having evolved through 11 fissions and 10 fusions. The other five species in the three subfamilies Chloridoideae, Panicoideae, and Pooideae all underwent similar massive rearrangements with S. bicolor showing the second fewest changes. However, no common chromosomal rearrangement could be identified among the core grass subfamilies following their diversification from the AGK, even for those species within a single subfamily. This indicates that the AGK has probably remained evolutionarily static for as long as ∼28 million years following the split of P. latifolius and the lineage leading to Pooideae (and other core grass lineages). Therefore, the majority if not all of the rearrangements are lineage specific, occurring within subfamilies. On the other hand, the small genomic rearrangements of inversion and intra-chromosomal translocation could not be captured by reconstructing the corresponding relationship between chromosomes of AGK and extant genomes (Figure 4C). A visualization of chromosome synteny recovered the occurrence of inversions and translocations in P. latifolius and O. sativa, and inferred changes in genome structure were particularly frequent in B. distachyon, O. thomaeum, and S. bicolor (Figure 4B; Supplemental Figure 17). This largely corroborated the trends of rearrangements involving different chromosomes for different grasses.
Gene families related to grass spikelet and floret development
The first emergence of a distinctive grass spikelet in the P. latifolius lineage (Figure 5A) and its sequenced genome open a unique window into the origin and early evolution of this key trait in the Poaceae. We identified six subfamily VIII members of APETALA2 (AP2), a superfamily of transcription factors including FRIZZY PANICLE (FZP) and BRANCHED SILKLESS1 (BD1; Chandler, 2018), from the P. latifolius genome (Supplemental Figure 18). Interestingly, the clade of FZP and BD1 contains only one member for diploid grasses or multiple members in the polyploid species such as the tetraploid bamboo, whereas there are two gene copies present in P. latifolius (Figure 5B; Supplemental Figure 19). One copy (B) has a single exon encoding 285 amino acids with a gene structure like FZP and BD1 (Chuck et al., 2002; Komatsu et al., 2003). The other one (A) is 7,053 bp with a 6,171-bp intron, encoding 293 amino acids. The sequences of both copies were verified by Sanger sequencing following polymerase chain reaction (PCR) amplification (Supplemental Table 11). On the gene tree, copy A presents a close relationship with a third gene copy of P. edulis, for which only two copies were expected given its tetraploid status, receiving 100% bootstrap support and comprising a clade sister to the other grass genes (Figure 5B). Moreover, copy A and the third gene copy of P. edulis shared the identical motifs seen in genes sampled from other monocots, eudicots, and Amborella and Nymphaea (Figure 5B). However, copy B acquired motifs unique to species in the Poaceae. In addition, there are special motifs for bamboos and PACMAD grasses. We found that copy B, with the typical motifs of grasses, showed very low expression in all sampled tissues with transcripts per million (TPM) <1 and was thus considered unexpressed (Figure 5D). In contrast, copy A was expressed specifically in young inflorescences of P. latifolius.
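The expression threshold applied above relies on transcripts per million (TPM), which normalizes read counts first by transcript length and then by the sum of the length-normalized values in the sample. The sketch below illustrates that standard definition together with the TPM <1 cut-off for calling a gene unexpressed; the gene labels, read counts, and lengths are hypothetical and are not taken from the P. latifolius expression data.

```python
def tpm(read_counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million."""
    # Reads per base for each gene, then rescale so the sample sums to 1e6.
    rates = {gene: read_counts[gene] / lengths_bp[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def is_expressed(tpm_value, threshold=1.0):
    """Apply the TPM >= 1 cut-off described in the text."""
    return tpm_value >= threshold


# Hypothetical counts and transcript lengths for two genes.
counts = {"geneA": 10, "geneB": 30}
lengths = {"geneA": 1000, "geneB": 1000}
values = tpm(counts, lengths)
print({gene: is_expressed(value) for gene, value in values.items()})
```

Because TPM values within a sample always sum to one million, they can be compared across genes of different lengths within that sample.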
Figure 5.
Evolution and expression of FZP-like and MADS-box gene families underlying the origin of the grass spikelet. A, Paired male and female spikelets on the inflorescence (upper) with detailed pictures of male (lower left) and female (lower right) florets of P. latifolius, showing the characters of the grass spikelet, some of which, such as the presence of six stamens, may be ancestral. B, Phylogeny and evolution of FZP-like genes in the grass family with representative plants as outgroups. It is notable that there are two FZP-like gene copies in the P. latifolius genome with one sharing the unique motifs of grasses and the other together with one copy from the moso bamboo P. edulis having the identical motifs to those of other plants. The PACMAD grasses and bamboos also have distinct motifs. The AP2 domain labeled is conserved among all species. The numbers on the branch indicate bootstrap values equal to or larger than 60%. C, The ABCDE MADS-box genes identified in different grass genomes and other representative flowering plants, showing the increase in number of genes associated with the ρ-WGD event. D, Gene expression profiles of FZP-like genes and ABCDE MADS-box genes across leaf, shoot, and inflorescence at different developmental stages in P. latifolius. Expression values are scaled by log2(TPM). YIa, young inflorescence of ≤5 mm; YIb, young inflorescence of 5–10 mm; FS, female spikelet; MS, male spikelet; DI, developed inflorescence with flowers at anthesis.
Although the gene tree topology suggested that the two FZP-like genes in P. latifolius might be associated with the ρ-WGD event (Figure 5B), they are not located in collinear regions derived from genome duplication. Both are located on chromosome 7 of P. latifolius, and only copy B lies in a syntenic region conserved among grass genomes (Supplemental Figure 20). Considering that two members were also identified in the genome of the closely related A. comosus (Figure 5B), we may assume that the presence of two FZP-like genes in P. latifolius is indicative of an ancestral state in the Poaceae, with one of them diverging through newly acquired motifs.
A total of 52 MADS-box genes were identified in the P. latifolius genome (Supplemental Table 12 and Supplemental Data Set 5), far fewer than the average of 72 genes for core grasses. This difference was attributed primarily to expanded Type I MADS-box genes in the core grasses, resulting from lineage-specific tandem gene duplications (Supplemental Data Set 5). On the other hand, compared to two other monocots (A. comosus and Z. marina), the expansion of Type II MADS-box genes in grasses was also evident in P. latifolius, due primarily to ρ-WGD, with six duplicate gene pairs retained since the genome duplication (Figure 5C). Moreover, these gene pairs are retained in almost all grasses. An interesting exception is the AGL6-like genes, for which only P. latifolius and O. sativa kept the duplicated copies (Reinheimer and Kellogg, 2009). The Type II MADS-box genes in P. latifolius are expressed mainly in inflorescence tissues, with expression ranging widely from 1 to 1,536 TPM (Figure 5D; Supplemental Data Set 6). These expression profiles largely agree with their putative roles in reproductive organ development (Gramzow and Theissen, 2010; Callens et al., 2018). For example, the expression of P. latifolius D class STK-like genes is restricted to the female spikelets and associated with ovule development. In turn, B class AP3-like and PI-like genes controlling lodicule and stamen development (Whipple et al., 2007; Schrager-Lavelle et al., 2017) are strongly expressed in the male spikelets. This was expected, as lodicules develop in male but not in female spikelets (Judziewicz et al., 1999; Sajo et al., 2007). In general, the six MADS-box genes with retained duplicated pairs exhibit divergence in expression patterns between the two copies (Figure 5D). Notably, the P. latifolius AGL6-like OsMADS17 homolog has consistently lower expression levels than the other homoeolog shared with all grasses.
Discussion
The genome of O. sativa (Yu et al., 2002) was the first to be sequenced among cereal crops and became one of the earliest reference genomes for plants. At present, all of the major cereal crops (e.g. wheat, maize, and sorghum) have had their genomes sequenced, as have other grasses used as model plants such as B. distachyon and O. thomaeum (Paterson et al., 2009; Schnable et al., 2009; The International Brachypodium Initiative, 2010; VanBuren et al., 2015; IWGSC, 2018). However, all of these species belong to the clade comprising the core grasses within the family. As a member of a sister clade to the core grasses, P. latifolius provides a genome that bridges an important gap in studying the origin and early evolution of this important family of angiosperms. The assembled P. latifolius genome, with a contig N50 of 5.09 Mb, is of high quality as evaluated by the LAI, Illumina read mapping, and BUSCO analysis. It should become a valuable resource for the grass community in future research on genome evolution and biology.
Compared to sequenced genomes from core grasses (Paterson et al., 2009; Schnable et al., 2009; The International Brachypodium Initiative, 2010; VanBuren et al., 2015; IWGSC, 2018; Ling et al., 2018), the P. latifolius genome has been evolving relatively slowly. The protein-coding genes of P. latifolius evolved more slowly than those of O. sativa, regarded previously as showing the lowest molecular rate of evolution within Poaceae (Wang et al., 2015; Murat et al., 2017). Slow evolution in P. latifolius is also reflected by low activity of recent LTR retrotransposition, limited chromosomal rearrangements, and fewer losses of genes duplicated by ρ-WGD. The P. latifolius genome appears to retain ancestral features of grass genomes, as observed in the GC3 content. In contrast to the typical bimodal distribution of GC3 content in the core grasses (Carels and Bernardi, 2000), both P. latifolius and Streptochaeta angustifolia (representing Anomochlooideae, the sister lineage to all extant grasses) show a weak bimodal distribution, marking the transition from a unimodal distribution in other monocots to a bimodal distribution in core grasses (McKain et al., 2016). This is consistent with the evolutionary trend of genomic GC content in monocots, where GC content increases in the grass family (Šmarda et al., 2014). Moreover, the increased GC3 content in the core grasses may be related to the ρ-WGD event, but this increase is not evident in P. latifolius genes.
In line with a generally stable genome, the genus Pharus shows a slow rate of morphological evolution, retaining certain traits likely characteristic of ancestral grasses, particularly in inflorescences and spikelets (Judziewicz et al., 1999; Kellogg, 2001; Schrager-Lavelle et al., 2017). This genus shows little to no change in the female spikelet compared to a fossil spikelet dated 35–40 Ma (Poinar and Columbus, 1992). This is hypothesized to be due to a long-term stable habitat within the tropical understory, where the first grasses likely originated and where Pharus species are still found today (Judziewicz et al., 1999; Kellogg, 2015).
The genomic and morphological features of P. latifolius reflect its phylogenetic position, which was inferred previously to have a much earlier divergence from the core grasses based on plastid or a few nuclear loci (Clark et al., 1995; GPWG, 2001; Soreng et al., 2017). Its position is further confirmed here using a large-scale dataset of single-copy nuclear genes. Based on phytoliths from the late Cretaceous (Prasad et al., 2005), we estimated the crown age of Poaceae to be at least 90 Ma. This is compatible with Gallaher et al. (2019) and recently described microfossils identified as sister to core grasses (101–113 Ma, Wu et al., 2018). While calibrations using phytoliths must be interpreted with caution (Christin et al., 2014), it now appears that the origin and diversification of Poaceae may not be associated with the K–Pg boundary as previously suggested (Vanneste et al., 2014). Instead, the actual radiation of core grass lineages appears related to this mass extinction event. Under this timescale, phylogenomic dating placed the ρ-WGD of grasses at about 98.2 Ma, earlier than previous estimates of ∼70–97 Ma (The International Brachypodium Initiative, 2010; Vanneste et al., 2014; Wang et al., 2015; Murat et al., 2017; Clark and Donoghue, 2018).
It has long been suggested that the origin of grasses was preceded by a whole-genome duplication named ρ-WGD (Paterson et al., 2004; Tang et al., 2010; Ming et al., 2015; McKain et al., 2016). With the P. latifolius genome sequenced here, we used multiple lines of evidence, particularly phylogenetic analyses of duplicate gene pairs, to support this hypothesis, demonstrating that this event was shared by at least the core grasses and the Pharoideae sister lineage. A WGD is usually followed by a fractionation (diploidization) process at both chromosomal and genic levels (Murat et al., 2017; Wendel et al., 2018; Escudero and Wendel, 2020). Inclusion of the P. latifolius genome is not only essential for the accurate timing of ρ-WGD but also allows for a more rigorous assessment of the diploidization process before and after the split of the P. latifolius lineage, thus providing further insight into the evolution of grass genomes following WGD.
Five or seven protochromosomes before ρ-WGD have been proposed, and there is general agreement that an intermediate AGK contained 12 chromosomes (Salse et al., 2008; Murat et al., 2017). The transition from 10 or 14 duplicated chromosomes to 12 may have occurred within a short period of ∼8 million years following WGD but before the split of P. latifolius, as estimated here. The 12 intermediate chromosomes are also supported by the well-conserved synteny between the chromosomes of P. latifolius and AGK. Nevertheless, more genomes from other lineages sister to the core grass lineages and from sister families of the Poaceae (e.g. Ecdeiocoleaceae and Joinvilleaceae; Hochbach et al., 2018) are needed to test this hypothesis. It is likely that these genomes will also clarify the chromosome number before ρ-WGD. After the split of P. latifolius, the chromosomes of AGK did not change much until the emergence of the core grass subfamilies. Instead, massive and lineage-specific chromosomal rearrangements happened within these subfamilies (The International Brachypodium Initiative, 2010; Murat et al., 2017; Ling et al., 2018).
Similarly, gene loss accelerated immediately following ρ-WGD, as demonstrated in diverse neopolyploids (e.g. Zhao et al., 2017; Wendel et al., 2018; Chen et al., 2019, 2020; Escudero and Wendel, 2020). Accordingly, the majority of gene losses occurred in the common ancestor of grasses before the P. latifolius split (Paterson et al., 2009). Gene loss continued after the divergence of P. latifolius, but the rate declined sharply, parallel to the trend observed for chromosomal rearrangements. Intriguingly, the loss rate on the lineage leading to the core grasses is about twice that of the P. latifolius lineage. This indicates different processes of genome fractionation that may be related to contrasting species diversity within the core and successive sister lineages (Clark and Donoghue, 2018; Wendel et al., 2018). Thus, more duplicate gene pairs were retained while selective pressures relaxed within the P. latifolius genome compared to core grass genomes.
WGD is pervasive throughout the evolution of flowering plants, but how it contributed to species diversification and macro-evolution remains elusive (Clark and Donoghue, 2018; Wendel et al., 2018; Soltis et al., 2019; Escudero and Wendel, 2020). Generally, a time lag is expected between the WGD event and radiation (Schranz et al., 2012). Under this “WGD lag-time” model, we estimate a lag of approximately 8 million years between ρ-WGD and the split of P. latifolius, and of approximately 25 million years between ρ-WGD and the initial diversification of the core grass lineages. In the first interval, the ancestral genome of grasses was likely subjected to a rapid diploidization process that restored the “diploid state”. This process continued, likely resulting in contrasting patterns of chromosomal rearrangements and gene loss for core and sister grass lineages. Thus, we consider that genome fractionation rather than ρ-WGD per se was the general driving force behind the radiation of grasses; WGD laid the foundation by providing raw genetic material. Moreover, the K–Pg mass extinction (Vajda and Bercovici, 2014) might have acted as an environmental trigger for this radiation, with new species filling vacated niches. On the other hand, the distinctive inflorescence and spikelet, a key innovation of grasses, may have also played a role in radiation through modifications of reproductive traits related to breeding systems and seed dispersal (Clayton and Renvoize, 1986; Kellogg, 2001; Kellogg, 2015; Linder et al., 2018).
Regarding the origin and evolution of this key innovation in grasses, we found two FZP-like genes in the P. latifolius genome. The extra copy was validated by PCR and sequencing and is also supported by transcriptional evidence, with an expression pattern like that observed in other grasses (Chuck et al., 2002; Komatsu et al., 2003; Derbyshire and Byrne, 2013; Dobrovolskaya et al., 2015; Poursarebani et al., 2015). Therefore, the extra copy, rather than the one common to all grasses, may be involved in the ontogeny of the P. latifolius inflorescence and spikelet. This is congruent with the similarity between the inflorescence morphology of P. latifolius, i.e. several rounds of branching and spikelets borne on short branchlets (Judziewicz et al., 1999; Sajo et al., 2007), and mutant phenotypes of FZP-like genes in B. distachyon, Hordeum vulgare, O. sativa, and Zea mays (Chuck et al., 2002; Komatsu et al., 2003; Derbyshire and Byrne, 2013; Dobrovolskaya et al., 2015; Poursarebani et al., 2015). Notably, the moso bamboo, which produces so-called pseudo-spikelets combining features of branches and spikelets (Judziewicz et al., 1999; Li et al., 2006), also has a similar extra copy of FZP-like genes, following the prediction of Kellogg (2015). Although we could not ascribe it specifically to the ρ-WGD event, the presence of two copies of FZP-like genes may represent an ancestral state that allowed potential neo-functionalization of one copy during early grass evolution. This hypothesis warrants further investigation. It would be interesting to determine the respective functions of the two copies where both are present. Furthermore, the increased number of MADS-box genes in the Poaceae is indeed associated with ρ-WGD, as well as with tandem duplications in the core grasses. Because P. latifolius belongs to the lineage with the first occurrence of a true spikelet (GPWG, 2001; Kellogg, 2001), the functional diversification of these genes underlying spikelet origin presumably occurred before the divergence of P. latifolius and was then followed by lineage-specific evolution towards developmental complexity of spikelets in allied lineages (Whipple et al., 2007; Reinheimer and Kellogg, 2009; Bartlett et al., 2015; Schrager-Lavelle et al., 2017; Callens et al., 2018).
By filling the genomic gap between the origin of the Poaceae and diversification of the core grasses with the high-quality assembly, annotation, and analysis of the P. latifolius genome, we demonstrate how this genome evolved following ρ-WGD to contribute to the origin and diversification of grasses. The P. latifolius genome will also be a valuable reference to aid future work on cereal genomics and biology.
Materials and methods
Plant materials and sequencing
Plants of P. latifolius originally from a population in Costa Rica, Central America, were grown for sequencing in the greenhouse of the Germplasm Bank of Wild Species at the Kunming Institute of Botany under accession number CRB03-4-1. The voucher specimen (LDZ2020042) has been deposited at the Herbarium of Kunming Institute of Botany (KUN). Fresh leaves were used to estimate the C-value by flow cytometry. Total DNA for genome sequencing was extracted from young leaves using a modified cetyltrimethylammonium bromide protocol (Allen et al., 2006). We also collected leaves, shoots, and inflorescences at different developmental stages, as well as young male and female spikelets, for transcriptome sequencing.
A combination of Illumina, PacBio, and Hi-C sequencing was performed for P. latifolius. Short-insert libraries of 300–500 bp were prepared for sequencing on the Illumina HiSeq X-Ten platform, and the generated reads were used to estimate genome size based on the k-mer frequency distribution. The transcriptome and Hi-C libraries were also sequenced on the Illumina HiSeq platform to produce 150-bp paired-end reads. For long-read sequencing, approximately 20-kb SMRTbell libraries were sequenced on the PacBio Sequel system (Pacific Biosciences).
Genome assembly and annotation
Before assembly, the raw long reads were filtered to remove low-quality and short reads. The resulting reads were error corrected and assembled using Falcon v1.8.7 (Chin et al., 2016) with the optimized parameters pa_HPCdaligner_option = -v -B188 -t12 -e.75 -k18 -h280 -l2800 -w8 -M24 -s1000, falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 300, ovlp_HPCdaligner_option = -v -B128 -k22 -h1280 -e.96 -l3200 -s1000 -T16, and overlap_filtering_setting = --max_diff 50 --max_cov 70 --min_cov 1 --bestn 10. We first polished the draft assemblies with PacBio long reads using ARROW (Chin et al., 2013) and then used highly accurate Illumina paired-end reads to further correct the assembly with Pilon v1.22 (https://github.com/broadinstitute/pilon). For the chromosome-level assembly, high-quality Illumina read pairs obtained through Hi-C sequencing were mapped onto the draft contigs. Only uniquely mapped, valid di-tag Hi-C read pairs were used to construct pseudo-chromosomes based on chromatin interactions. Lachesis (https://github.com/shendurelab/LACHESIS) with the parameters cluster_min_re_sites = 200, cluster_max_link_density = 3.6, cluster_noninformative_ratio = 2.8, order_min_n_res_in_trunk = 100, and order_min_n_res_in_shreds = 100 was applied to hierarchically cluster the assembled contigs, with manual correction where necessary, into 12 pseudo-chromosomes (Supplemental Figure 21). Finally, the quality of the assembly was evaluated with BUSCO v3.0.1 (Simão et al., 2015) and by mapping Illumina reads against the assembly with bwa-0.7.12 under default parameters (Li and Durbin, 2009). In addition, the LAI score, which indicates assembly continuity based on full-length LTR retrotransposons (Ou et al., 2018), was calculated for the assembled genome sequences of P. latifolius, as well as for B. distachyon, O. thomaeum, O. sativa, and S. bicolor for comparison.
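Assembly contiguity is summarized in this study by the contig N50. The text does not describe how the statistic was computed, so the following is only a minimal sketch of the standard definition (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The function name and the toy contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the study.
toy_contigs = [8, 5, 4, 3, 2, 1]
toy_n50 = n50(toy_contigs)
```

Because the statistic depends only on the sorted contig lengths, it can be computed directly from a FASTA index or any table of sequence lengths.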
The TEs were annotated through homology-based and de novo predictions using RepeatModeler v1.0.11 (http://www.repeatmasker.org/RepeatModeler/), LTRharvest v1.5.10 (Ellinghaus et al., 2008), and LTR_FINDER v1.07 (Xu and Wang, 2007). We then used RepeatMasker v4.0.5 (http://repeatmasker.org/) to annotate and mask the genome. The outputs of LTRharvest and LTR_FINDER were fed into LTR_retriever v2.8.7 (Ou and Jiang, 2018) to extract the final set of full-length LTR retrotransposons. The age of these LTR retrotransposons was determined by calculating the evolutionary distance between the two LTRs of each element under the Jukes–Cantor model (Jukes and Cantor, 1969) and converting this distance to time with a substitution rate. The rate used was 1.3 × 10−8 mutations per site per year for O. sativa (Ma and Bennetzen, 2004), adjusted for different grasses according to the ratio of their individual WGD peak value to that of O. sativa (Wang et al., 2015).
A comprehensive strategy combining homology-based, ab initio, and RNA-seq-based prediction methods was employed to annotate protein-coding genes. Protein-coding genes of B. distachyon, O. sativa, and S. bicolor were used for homology-based annotation with GeneWise v2.4.1 (Birney et al., 2004). Several ab initio gene prediction programs were used, including Augustus v3.2.3 (Stanke et al., 2006), Geneid v1.4.4 (Blanco et al., 2007), and GlimmerHMM v3.0.4 (Majoros et al., 2004). We aligned the RNA-sequencing data of leaves, shoots, and inflorescences to the genome using TopHat v2.1.1 (Trapnell et al., 2009), and the alignments were used as input for Cufflinks v2.2.1 (Trapnell et al., 2012). Finally, gene models from the different methods were combined by EvidenceModeler (EVM) v1.1.1 (Haas et al., 2008) into a nonredundant set of gene annotations. These annotated genes were named following the rules used for genes in the O. sativa genome (http://rice.plantbiology.msu.edu/analyses_nomenclature.shtml). Functional annotations of protein-coding genes were obtained by aligning them to publicly available databases, and InterProScan v5.2-45.0 (Jones et al., 2014) was used to annotate motifs and domains. For annotation of noncoding RNA genes, tRNA genes were identified with tRNAscan-SE v1.3.1 (Lowe and Eddy, 1997), and other types were predicted by searching against the Rfam database. The distribution of genomic features along chromosomes was drawn using RIdeogram v0.2.2 (Hao et al., 2020).
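The dating of LTR retrotransposons described above can be illustrated with a short sketch. It assumes the conventional formula T = K / (2r), in which K is the Jukes–Cantor distance between the two LTRs of one element and r is the substitution rate; the factor of 2 reflects that both LTRs accumulate substitutions independently after insertion. The text does not spell out this formula, nor the exact form of the rate adjustment, so both the division by 2r and the proportional scaling by the Ks peak ratio are assumptions, and all function names are hypothetical.

```python
import math

# Substitution rate for O. sativa quoted in the text (per site per year).
RICE_RATE = 1.3e-8


def jukes_cantor_distance(ltr_5prime, ltr_3prime):
    """Jukes-Cantor (JC69) distance between two aligned LTR sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    """
    if len(ltr_5prime) != len(ltr_3prime):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr_5prime.upper(), ltr_3prime.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += (a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("distance undefined: sequences are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def adjusted_rate(ks_peak_species, ks_peak_rice, rice_rate=RICE_RATE):
    """Scale the rice rate by the ratio of WGD Ks peaks (assumed form)."""
    return rice_rate * ks_peak_species / ks_peak_rice


def insertion_age_years(distance, rate=RICE_RATE):
    """Convert an LTR-LTR distance to an insertion age, T = K / (2r)."""
    return distance / (2.0 * rate)
```

Under these assumptions, an element whose LTRs differ by a distance of 0.026 would be dated to about one million years with the unadjusted rice rate.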
Gene family analysis
To identify gene family groups, we analyzed protein-coding genes from 14 grasses representing six subfamilies in the Poaceae and two outgroup taxa. We downloaded genome and annotation data from Phytozome (https://phytozome.jgi.doe.gov/) for A. comosus (v3), B. distachyon (v3.1), H. vulgare (v1), O. sativa (v7), Setaria italica (v2.2), S. bicolor (v2.1), and Z. mays (v4); from Ensembl (http://plants.ensembl.org/) for Aegilops tauschii (v4.0), Leersia perrieri (v1.4), and Triticum urartu (ASM34745v1); and from GIGAdb (http://gigadb.org/dataset/) for Cenchrus americanus (v1.1) and P. edulis, as well as for M. acuminata (v2, https://banana-genome-hub.southgreen.fr/), O. latifolia (http://www.genobank.org/bamboo), and O. thomaeum (v2, https://genomeevolution.org/coge). OrthoFinder v2.3.3 with default parameters (Emms and Kelly, 2019) was used to identify orthologous clusters for a comparative analysis of P. latifolius and core grasses. For gene families specific to grasses, we conducted GO annotations with O. sativa genes as a reference. CAFE v5.0 (De Bie et al., 2006; 10.5281/zenodo.3625141) with a stochastic birth-and-death model was used to identify expansion and contraction of gene families along the evolution of the grass family based on the identified orthogroups. The analysis was carried out using local gene family evolutionary rates, with one parameter class for the terminal branch of P. latifolius and the stem branch of grasses, a second for the terminal branches of outgroup taxa and their associated internal branches, and a third for all other grasses and internal branches. We ran the software with a Poisson distribution for the root frequency (parameter -p) and used 1,000 re-samples to calculate P-values at a significance level of 0.05.
Phylogenetic analysis and divergence time estimation
A total of 480 single-copy orthologous groups shared by all 16 species above were identified for phylogenetic analysis. Multiple sequence alignments of individual genes were performed with MAFFT v7.407 using the E-INS-I algorithm (Katoh and Toh, 2008) and then concatenated into an alignment with a total length of 462,456 bp. We inferred the phylogeny from the alignment as a single partition using the maximum likelihood (ML) method in RAxML v0.9.0 (Stamatakis, 2014) with the general time reversible (GTR) + Gamma model. Support values for the tree were calculated from 1,000 bootstrap replicates. The relative rates of evolution between P. latifolius and core grasses were evaluated with A. comosus as the outgroup using the sequence alignments of the 480 single-copy genes.
To estimate the origin and evolutionary timescale of grasses, we calibrated a relaxed molecular clock with two phytolith-based age constraints for the phylogeny. One was for the crown group of Poaceae at >65 Ma and the other was for the crown group of subfamily Pooideae at >47.8 Ma (Prasad et al., 2005), and the root age of the tree was set to a maximum of 150 Ma. The divergence time was estimated using the program MCMCtree in the PAML package v4.9h (Yang, 2007) based only on four-fold-degenerate sites derived from the 480-gene alignments. The Markov chain Monte Carlo process was run for 2 million iterations, with samples drawn every 100 steps after discarding the first 10% of iterations as burn-in. We performed two independent runs to assess convergence and checked for sufficient sampling.
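The relative rate comparison is reported in Supplemental Table 8 as Tajima's relative rate test. As an illustration only, the sketch below implements the basic one-degree-of-freedom form of that test on three aligned nucleotide sequences: it counts sites where only the first ingroup sequence differs (m1) and sites where only the second differs (m2), then computes χ² = (m1 − m2)² / (m1 + m2). The function names are hypothetical, and the software and settings actually used for the test are not given in this section.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test (1 degree of freedom).

    Returns (m1, m2, chi2, p_value), where m1 counts sites at which only
    seq1 differs from the other two sequences and m2 counts sites at
    which only seq2 differs. Sites with gaps or ambiguous bases are
    skipped.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = 0
    m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a not in "ACGT" or b not in "ACGT" or o not in "ACGT":
            continue
        if a != b and b == o:
            m1 += 1  # change unique to the seq1 lineage
        elif a != b and a == o:
            m2 += 1  # change unique to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value
```

A significantly larger m2 than m1 would indicate that the second ingroup lineage has accumulated more substitutions since the two ingroup sequences diverged, which is the kind of asymmetry expected when a core grass is compared with a more slowly evolving lineage.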
WGD and duplicated genes analyses
To investigate the WGD history of P. latifolius, we first searched for genome-wide duplications in the assembled genome. We performed self-alignment of the genome sequences using the Large-Scale Genome Alignment Tool (LAST; http://last.cbrc.jp/). Synteny within the genome was then identified using MCScan (Python version, JCVI utility libraries v0.8.12; Tang et al., 2008) in full mode, including one-to-one quota synteny blocks and reciprocal best matches.
We further constructed the Ks distributions for paralogues following the approach used by Vanneste et al. (2013) with the wgd suite (Zwaenepoel and Van de Peer, 2019). Similar analyses were also done for the core grasses B. distachyon, O. latifolia, O. sativa, O. thomaeum, and S. bicolor and for the outgroup A. comosus. The complete set of paralogous genes (paranome) within each of these genomes was identified by all-versus-all BLASTP with an e-value threshold of 1 × 10−10. Gene families were built using the Markov cluster (MCL) algorithm (Enright et al., 2002) and were aligned with MUSCLE version 3.8.1551 (Edgar, 2004). The Ks values for all pairwise comparisons within a gene family were calculated with the ML method using the model of Nielsen and Yang (1998) as implemented in CodeML of the PAML package. We also plotted the Ks distributions for paralogs located in collinear regions identified by i-ADHoRe v3.0 (Proost et al., 2012) within the genome. To compare the relative timing of the WGD event and the split of P. latifolius from the core grasses and the outgroup, we plotted the Ks distributions for one-to-one ortholog pairs between P. latifolius and B. distachyon, O. latifolia, O. sativa, O. thomaeum, and S. bicolor, as well as A. comosus.
We finally performed phylogenomic analyses as described in Zhang et al. (2017) to determine whether the identified WGD event was shared by P. latifolius and core grasses. Briefly, we included the above six grasses and the outgroup taxa A. comosus and M. acuminata for identification of gene families using OrthoFinder. Gene families with no more than 100 genes that had at least one pair of duplicated genes in collinear regions from P. latifolius and one gene from A. comosus and M. acuminata were selected for gene tree inference. As a result, we obtained 1,864 gene families containing 1,934 duplicate gene pairs in the P. latifolius genome. Sequence alignments for each gene family were done in MAFFT, and trees were built with FastTree v2.1.10 (Price et al., 2010) under default settings. The occurrence of the duplication event for each gene pair on the grass phylogeny was then inferred by mapping the gene trees onto the species tree, which was built by RAxML based on the eight taxa and their shared 480 single-copy genes. Finally, 996 gene families with 1,013 gene pairs could be placed without ambiguity as duplication events on the species tree, and the bootstrap value for the node directly connecting a gene pair on the gene tree was used as support for the duplication event. Among them, 504 and 841 gene pairs received support values of at least 95% and 80%, respectively. We also used the gene families containing at least one pair of duplicated genes from P. latifolius and from all the other five grasses for a similar analysis; the numbers were 130 and 221 at bootstrap thresholds of 95% and 80%, respectively. Subsequently, 219 gene pairs that coalesced on the stem lineage of the grass family with at least 80% bootstrap support were chosen for inferring the timing of the WGD event. The analysis was performed with MCMCtree in the same way, based on a secondary calibration point from the divergence time of the Poaceae estimated above.
We assembled a data set of syntenic orthologous genes (Supplemental Data Set 3) among P. latifolius, B. distachyon, O. latifolia, O. sativa, O. thomaeum, and S. bicolor, with the corresponding ρ-WGD paralogs within each genome identified as above. This data set was then used to count the number of ρ-WGD duplicate gene pairs that lost one copy along the phylogeny of Poaceae (Supplemental Figure 14).
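The family selection step in the paragraph above can be sketched as a simple predicate. This is an illustration under stated assumptions rather than the procedure actually used: the species labels and data structures are hypothetical, and the outgroup requirement is read here as at least one gene from each of A. comosus and M. acuminata, which is one possible interpretation of the text.

```python
from collections import Counter

MAX_FAMILY_SIZE = 100                     # upper limit stated in the text
FOCAL = "Platifolius"                     # hypothetical species label
OUTGROUPS = ("Acomosus", "Macuminata")    # hypothetical species labels


def keep_family(members, collinear_pairs):
    """Decide whether a gene family is retained for gene tree inference.

    members: dict mapping gene ID -> species label.
    collinear_pairs: iterable of (gene_a, gene_b) tuples for paralogous
        pairs located in collinear regions.
    """
    if len(members) > MAX_FAMILY_SIZE:
        return False
    per_species = Counter(members.values())
    # Assumed reading: every outgroup must contribute at least one gene.
    if any(per_species[species] == 0 for species in OUTGROUPS):
        return False
    # At least one collinear pair with both genes from the focal species.
    return any(
        members.get(a) == FOCAL and members.get(b) == FOCAL
        for a, b in collinear_pairs
    )


def count_supported(bootstrap_values, threshold):
    """Count duplication nodes with bootstrap support >= threshold."""
    return sum(1 for value in bootstrap_values if value >= threshold)
```

The second helper mirrors the tallying of gene pairs at the 95% and 80% bootstrap thresholds; applied to the real support values, it would reproduce the kind of counts reported in the text.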
Chromosome evolution
To investigate the chromosome evolution of grass genomes, we selected representative species (Figure 4C) from six subfamilies in the Poaceae with chromosome-level genome assemblies. We identified homologous proteins between extant genomes and the reconstructed AGK (Murat et al., 2017) using LAST and detected the syntenic blocks with MCScan as described above. Subsequently, dot plots of synteny were drawn and the chromosomal rearrangements were reconstructed.
Identification of AP2 and MADS-box gene families and gene expression analysis
The predicted genes of selected representative plant genomes were searched for AP2 (PF00847 domain) and MADS-box (PF00319 domain) gene families using hmmsearch in HMMER v3.2.1 with -E 1e-10 --domE 1e-10 (Eddy, 2011). The genome and annotation data for A. trichopoda (v1.0), A. thaliana (TAIR10), and Z. marina (v2.2) were downloaded from Phytozome, those for Azolla filiculoides (v1.1) from ftp.fernbase.org, and those for Nymphaea colorata from http://bigd.big.ac.cn. Extracted sequences were further checked for protein domains using InterProScan v5.2-45.0. The protein sequences of the resulting candidates, together with those from A. thaliana and O. sativa, were aligned with MUSCLE v3.8.1551 (Edgar, 2004), and trees were built with FastTree v2.1.10 under default settings. Meanwhile, the syntenic relationships of the chromosomal regions surrounding these identified genes between different grass genomes were visualized and checked. The conserved motifs for the FZP-like genes were searched with MEME v5.0.5 (Bailey and Elkan, 1994) using zero or one occurrence per sequence, maximum number of motifs = 18, minimum width = 6, and maximum width = 50.
RNA-seq data derived from leaves, shoots, and inflorescences at three developmental stages, and from young male and female spikelets, with at least three biological replicates, were mapped to the P. latifolius genome using HISAT2 v2.2.1 (Kim et al., 2015) with default parameters to generate gene-level counts. Expression of all P. latifolius genes in each sample was estimated with StringTie v2.1.2 (Pertea et al., 2015) and quantified as TPM.
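For readers unfamiliar with the TPM unit, the sketch below shows the textbook definition: read counts are first divided by transcript length, and the resulting rates are rescaled so that they sum to one million within a sample. StringTie derives its TPM values internally from assembled transcript coverage, so this is not a reimplementation of that program; the function names and the toy numbers are hypothetical. The threshold of 1 TPM for calling a gene unexpressed follows the criterion stated in the Results.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts: dict mapping gene ID -> read count in one sample.
    lengths_bp: dict mapping gene ID -> transcript length in bp.
    """
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def is_expressed(tpm_value, threshold=1.0):
    """Apply the TPM >= 1 criterion used in the text for expression."""
    return tpm_value >= threshold


# Hypothetical toy sample: equal counts, but geneB is twice as long,
# so geneB receives half the per-base rate of geneA.
toy = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 2000})
```

Because TPM values always sum to one million per sample, they are comparable across genes within a sample, which is what the per-tissue expression profiles in Figure 5D rely on.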
Accession numbers
Raw sequence data, whole-genome assembly, and transcriptomes of P. latifolius were deposited under the BioProject PRJNA682003 in the National Center for Biotechnology Information database. The genome assembly and gene annotations were deposited at CoGe database (https://genomevolution.org/coge/) under genome ID 60161. The genome assembly, annotation files and multiple sequence alignments used in phylogenetic analyses and tree files are also available at http://www.genobank.org/grass.
Supplemental Data
Supplemental Figure 1. Genome size estimation of P. latifolius based on flow cytometry (A) and k-mer analysis (B).
Supplemental Figure 2. Insertion time of Gypsy and Copia retrotransposons for P. latifolius and the other four grass genomes.
Supplemental Figure 3. Distribution of GC3 content of protein-coding genes for Amborella trichopoda and four core grasses.
Supplemental Figure 4. Results of gene family clustering among genomes of 14 grasses and outgroup taxa Ananas comosus and Musa acuminata.
Supplemental Figure 5. The Venn diagram showing the numbers of shared and unique gene families between Poaceae and outgroup taxa Ananas comosus and Musa acuminata.
Supplemental Figure 6. Phylogenetic tree showing the significantly (p < 0.05) expanded or contracted gene families along the evolution of the grass family.
Supplemental Figure 7. Divergence time estimates for the grass family with 95% confidence interval.
Supplemental Figure 8. Distribution of synonymous substitution per site (Ks) for the whole paranome (grey) and paralogues in collinear regions (black) in genomes of six grasses.
Supplemental Figure 9. Distribution of synonymous substitution per site (Ks) of orthologues between P. latifolius and other species.
Supplemental Figure 10. Dot plot map showing syntenic relationships within the P. latifolius genome.
Supplemental Figure 11. Phylogenomic analysis of paralogues in collinear regions of P. latifolius as in Figure 3B with duplication events supported by bootstrap value ≥80%.
Supplemental Figure 12. Example of gene tree with duplicate gene pairs (green star) of P. latifolius coalescing on the stem branch of the grasses (green circle).
Supplemental Figure 13. Timing of ρ-WGD estimated from duplicate gene pairs with 95% confidence interval.
Supplemental Figure 14. A model of time-scale evolution of ρ-WGD and duplicated genes in the Poaceae as in Figure 3D with species names given (A) and how the number of gene loss for duplicated genes was calculated (B).
Supplemental Figure 15. Dot plot maps between 8 genomes of extant grasses and ancestral grass karyotype (AGK) showing syntenic relationships.
Supplemental Figure 16. Dot plot maps showing syntenic relationships within four grass genomes and the red box indicating the alignment of homologous chromosome pair of 1 and 5 derived from ρ-WGD.
Supplemental Figure 17. Syntenic map between individual chromosomes of AGK and five grasses.
Supplemental Figure 18. Phylogenetic tree of AP2 b members from different plant species.
Supplemental Figure 19. The same phylogenetic tree of FZP-like genes as in Figure 5B with gene IDs shown.
Supplemental Figure 20. Micro-collinearity patterns among genomic regions surrounding the FZP-like genes and their homologues among grass genomes highlighted in orange link.
Supplemental Figure 21. Characterization of Hi-C contact matrix for the 12 pseudo-chromosomes of P. latifolius with dark red dots showing high probability of interaction and light yellow showing low probability of interaction.
Supplemental Table 1. Summary of genome sequencing and read output for P. latifolius.
Supplemental Table 2. Summary of Hi-C reads mapping to the P. latifolius assembled sequences.
Supplemental Table 3. Summary of predicted non-coding RNAs in the P. latifolius genome.
Supplemental Table 4. Functional annotation of the predicted genes in the P. latifolius genome.
Supplemental Table 5. Result of the evaluation of the P. latifolius genome by BUSCO.
Supplemental Table 6. Comparison of gene numbers and features of P. latifolius with those of five core grass genomes from different subfamilies.
Supplemental Table 7. Summary of repetitive DNA content in the P. latifolius genome.
Supplemental Table 8. Results of Tajima’s relative rate test between P. latifolius and 13 core grasses using 480 single-copy genes and Ananas comosus as outgroup.
Supplemental Table 9. Number of retained duplicate gene pairs from ρ-WGD and their GC3 content in genomes of P. latifolius and five core grasses.
Supplemental Table 10. Summary of the ratio of non-synonymous to synonymous substitutions (ω) for duplicate gene pairs within genomes of P. latifolius and five core grasses.
Supplemental Table 11. Primers used for validation of FZP-like genes in the P. latifolius genome.
Supplemental Table 12. MADS-box genes in grasses and other representative flowering plants.
Supplemental Data Set 1. List of gene ontology (GO) terms for grass- and core grass-specific gene families.
Supplemental Data Set 2. The 1,864 gene families with 1,934 duplicate gene pairs from P. latifolius used for gene tree inference.
Supplemental Data Set 3. Identified syntenic ortholog-paralog gene clusters between P. latifolius and core grasses.
Supplemental Data Set 4. Results of Ka and Ks for retained duplicate gene pairs in grasses.
Supplemental Data Set 5. List of MADS-box genes in grasses and Ananas comosus.
Supplemental Data Set 6. Expression of FZP-like and ABCDE MADS-box genes in different tissues of P. latifolius.
Acknowledgments
We thank Zu-Chang Xu for taking photographs; Yang Yang, Cen Guo, and Xia-Ying Ye for assistance in plant sampling; and Hong-Tao Li for computational support. We thank Peter Bernhardt for critical reading and restructuring of the manuscript.
Funding
This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB31000000), the National Natural Science Foundation of China (No. 31770239), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. Y201972), and was facilitated by the Germplasm Bank of Wild Species.
Conflict of interest statement. None declared.
D.-Z. L. and P.-F. M. designed the research. P.-F. M., J.-X. L., and J. H. collected and sequenced the plant material. Y.-L. L. and P.-F. M. planned and carried out the analyses with the help of Z.-H. G. and H. W. P.-F. M. wrote the first draft of the manuscript with input from all authors, particularly D.-Z. L. and Y.-L. L.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://academic.oup.com/plcell) is: De-Zhu Li (dzl@mail.kib.ac.cn).
References
- Allen GC, Flores-Vergara MA, Krasynanski S, Kumar S, Thompson WF (2006) A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat Protoc 1: 2320–2325 [DOI] [PubMed] [Google Scholar]
- Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28–36 [PubMed] [Google Scholar]
- Bartlett ME, Williams SK, Taylor Z, DeBlasio S, Goldshmidt A, Hall DH, Schmidt RJ, Jackson DP, Whipple CJ (2015) The maize PI/GLO ortholog Zmm16/sterile tassel silky ear1 interacts with the zygomorphy and sex determination pathways in flower development. Plant Cell 27: 3081–3098 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Birney E, Clamp M, Durbin R (2004) GeneWise and genomewise. Genome Res 14: 988–995 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blanco E, Parra G, Guigo R (2007) Using geneid to identify genes. Curr Protoc Bioinformatics, Chapter 4: Unit 4.3 [DOI] [PubMed] [Google Scholar]
- Callens C, Tucker MR, Zhang D, Wilson ZA (2018) Dissecting the role of MADS-box genes in monocot floral development and diversity. J Exp Bot 69: 2435–2459 [DOI] [PubMed] [Google Scholar]
- Carels N, Bernardi G (2000) Two classes of genes in plants. Genetics 154: 1819–1825 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chandler JW (2018) Class VIIIb APETALA2 Ethylene response factors in plant development. Trends Plant Sci 23: 151–162 [DOI] [PubMed] [Google Scholar]
- Chen Z, Omori Y, Koren S, Shirokiya T, Kuroda T, Miyamoto A, Wada H, Fujiyama A, Toyoda A, Zhang S, et al. (2019) De novo assembly of the goldfish (Carassius auratus) genome and the evolution of genes after whole-genome duplication. Sci Adv 5: eaav0547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen ZJ, Sreedasyam A, Ando A, Song Q, De Santiago LM, Hulse-Kemp AM, Ding M, Ye W, Kirkbride RC, Jenkins J, et al. (2020) Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat Genet 52: 525–533 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10: 563–569 [DOI] [PubMed] [Google Scholar]
- Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O’Malley R, Figueroa-Balderas R, Morales-Cruz A, et al. (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13: 1050–1054 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Christin PA, Spriggs E, Osborne CP, Stromberg CAE, Salamin N, Edwards EJ (2014) Molecular dating, evolutionary rates, and the age of the grasses. Syst Biol 63: 153–165 [DOI] [PubMed] [Google Scholar]
- Chuck G, Muszynski M, Kellogg E, Hake S, Schmidt RJ (2002) The control of spikelet meristem identity by the branched silkless1 gene in maize. Science 298: 1238–1241 [DOI] [PubMed] [Google Scholar]
- Clark JW, Donoghue PCJ (2018) Whole-genome duplication and plant macroevolution. Trends Plant Sci 23: 933–945 [DOI] [PubMed] [Google Scholar]
- Clark LG, Judziewicz EJ (1996) The grass subfamilies Anomochlooideae and Pharoideae (Poaceae). Taxon 45: 641–645 [Google Scholar]
- Clark LG, Zhang W, Wendel JF (1995) A phylogeny of the grass family (Poaceae) based on ndhF sequence data. Syst Bot 20: 436–460 [Google Scholar]
- Clayton WD, Renvoize SA (1986) Genera Graminum: grasses of the world. Kew Bull Add Ser 13: 1–389 [Google Scholar]
- Clifford HT (1987) Spikelet and floral morphology. InSoderstrom TR, Hilu KW, Campbell CS, Barkworth ME, eds, Grass Systematics and Evolution. Smithsonian Institute, Washington, DC, pp 21–30 [Google Scholar]
- Davidse G, Pohl RW (1972) Chromosome numbers and notes on some Central American grasses. Botany 50: 273–283 [Google Scholar]
- De Bie T, Cristianini N, Demuth JP, Hahn MW (2006) CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22: 1269–1271 [DOI] [PubMed] [Google Scholar]
- Derbyshire P, Byrne ME (2013) More SPIKELETS1 is required for spikelet fate in the inflorescence of Brachypodium. Plant Physiol 161: 1291–1302 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dobrovolskaya O, Pont C, Sibout R, Martinek P, Badaeva E, Murat F, Chosson A, Watanabe N, Prat E, Gautier N, et al. (2015) Frizzy panicle drives supernumerary spikelets in bread wheat. Plant Physiol 167: 189–199 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7: e1002195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ellinghaus D, Kurtz S, Willhoeft U (2008) LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform 9: 18 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Emms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20: 238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Enright AJ, Dongen SV, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Escudero M, Wendel JF (2020) The grand sweep of chromosomal evolution in angiosperms. New Phytol 228: 805–808 [DOI] [PubMed] [Google Scholar]
- Gaut BS (2002) Evolutionary dynamics of grass genomes. New Phytol 154: 15–28 [Google Scholar]
- Gallaher TJ, Adams DC, Attigala L, Burke SV, Craine JM, Duvall MR, Klahs PC, Sherratt E, Wysocki WP, Clark LG (2019) Leaf shape and size track habitat transitions across forest-grassland boundaries in the grass family (Poaceae). Evolution 73: 927–946 [DOI] [PubMed] [Google Scholar]
- GPWG (The Grass Phylogeny Working Group) (2001) Phylogeny and subfamilial classification of the grasses (Poaceae). Ann Mo Bot Gard 88: 373 [Google Scholar]
- Gramzow L, Theissen GJGB (2010) A Hitchhiker’s guide to the MADS world of plants. Genome Biol 11: 214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guo ZH, Ma P-F, Young G, Hu J-Y, Liu Y-L, Xia E-H, Zhong M, Zhao L, Sun G-L, Xu Y, et al. (2019) Genome sequences provide insights into the reticulate origin and unique traits of woody bamboos. Mol Plant 12: 1353–1365 [DOI] [PubMed] [Google Scholar]
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol 9: R7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J (2020) RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci 6: e251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hochbach A, Linder HP, R�ser M (2018) Nuclear genes, matK and the phylogeny of the Poales. Taxon 67: 521–536 [Google Scholar]
- The International Brachypodium Initiative (2010) Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463: 763–768. [DOI] [PubMed] [Google Scholar]
- IWGSC AppelsR, Eversole K, Feuillet C, Keller B, Rogers J, Stein N (2018) Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361: eaar7191. [DOI] [PubMed] [Google Scholar]
- Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Judziewicz EJ, Clark LG, Londo�o X, Stern MJ (1999) American Bamboos. Smithsonian Institution Press, Washington and London [Google Scholar]
- Jukes TH, Cantor CR (1969) Evolution of protein molecules. InMunro HN, ed, Mammalian Protein Metabolism. Academic Press, New York [Google Scholar]
- Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinformatics 9: 286–298 [DOI] [PubMed] [Google Scholar]
- Kellogg EA (2001) Evolutionary history of the grasses. Plant Physiol 125: 1198–1205 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kellogg EA (2015) The Families and Genera of Vascular Plants. Flowering Plants. Monocots. Poaceae, Vol. XIII. Springer, Berlin, Germany [Google Scholar]
- Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Komatsu M, Chujo A, Nagato Y, Shimamoto K, Kyozuka J (2003) Frizzy panicle is required to prevent the formation of axillary meristems and to establish floral meristem identity in rice spikelets. Development 130: 3841–3850 [DOI] [PubMed] [Google Scholar]
- Li DZ, Wang ZP, Zhu ZD, Xia NH, Jia LZ, Guo ZH, Yang GY, Stapleton CMA (. 2006) Bambuseae (Poaceae). InWu ZY, Raven PH, Hong DY, eds, Flora of China, Vol. 22. Science Press, Beijing and Missouri Botanical Garden Press, St Louis, MI [Google Scholar]
- Li H, Durbin R (2009) Fast and accurate short read alignment with burrows–wheeler transform. Bioinformatics 25: 1754–1760 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Linder HP, Lehmann CER, Archibald S, Osborne CP, Richardson DM (2018) Global grass (Poaceae) success underpinned by traits facilitating colonization, persistence and habitat transformation. Biol Rev 93: 1125–1144 [DOI] [PubMed] [Google Scholar]
- Ling HQ, Ma B, Shi X, Liu H, Dong L, Sun H, Cao Y, Gao Q, Zheng S, Li Y, et al. (2018) Genome sequence of the progenitor of wheat A subgenome Triticum urartu. Nature 557: 424–428 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma J, Bennetzen JL (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA 101: 12404–12410 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20: 2878–2879 [DOI] [PubMed] [Google Scholar]
- McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, dePamphilis CW, Givnish TJ, Pires JC, Stevenson DW, Leebens-Mack JH (2016) A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biol Evol 8: 1150–1164 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ming R, Vanburen R, Wai CM, Tang H, Schatz MC, Bowers JE, Lyons EH, Wang ML, Chen J, Biggers E (2015) The pineapple genome and the evolution of CAM photosynthesis. Nat Genet 47: 1435–1442 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murat F, Armero A, Pont C, Klopp C, Salse J (2017) Reconstructing the genome of the most recent common ancestor of flowering plants. Nat Genet 49: 490–496 [DOI] [PubMed] [Google Scholar]
- Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929–936. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ou S, Chen J, Jiang N (2018) Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res 46: e126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ou S, Jiang N (2018) LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol 176: 1410–1422 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pardo J, Man Wai C, Chay H, Madden CF, Hilhorst HWM, Farrant JM, VanBuren R (2020) Intertwined signatures of desiccation and drought tolerance in grasses. Proc Natl Acad Sci USA 117: 10079–10088 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al. (2009). The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556 [DOI] [PubMed] [Google Scholar]
- Paterson AH, Bowers JE, Chapman B (2004) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA 101: 9903–9908 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peng Z, Lu Y, Li L, Zhao Q, Feng Q, Gao Z, Lu H, Hu T, Yao N, Liu K, et al. (2013) The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nat Genet 45: 456–461 [DOI] [PubMed] [Google Scholar]
- Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33: 290–295 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Poursarebani N, Seidensticker T, Koppolu R, Trautewig C, Gawroński P, Bini F, Govind G, Rutten T, Sakuma S, Tagiri A (2015) The genetic basis of composite spike form in barley and ‘Miracle-Wheat’. Genetics 201: 155–165 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Poinar GO, Columbus JT (1992) Adhesive grass spikelet with mammalian hair in Dominican amber: first fossil evidence of epizoochory. Experientia 48: 906–908 [DOI] [PubMed] [Google Scholar]
- Prasad V, Stromberg CAE, Alimohammadian H, Sahni A (2005) Dinosaur coprolites and the early evolution of grasses and grazers. Science 310: 1177–1180 [DOI] [PubMed] [Google Scholar]
- Proost S, Fostier J, Witte DD, Dhoedt B, Demeester P, Van de Peer Y, Vandepoele K (2012) i-ADHoRe 3.0—fast and sensitive detection of genomic homology inextremely large data sets. Nucleic Acids Res 40: e11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price MN, Dehal PS, Arkin AP (2010) FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reinheimer R, Kellogg EA (2009) Evolution of AGL6-like MADS box genes in grasses (Poaceae): ovule expression is ancient and palea expression is new. Plant Cell 21: 2591–2605 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sajo MDG, Longhiwagner HM, Rudall PJJ (2007) Floral development and embryology in the early-divergent grass Pharus. Int J Plant Sci 168: 181–191 [Google Scholar]
- Salse J, Bolot S, Throude M, Jouffe V, Piegu B, Quraishi UM, Calcagno T, Cooke R, Delseny M, Feuillet C (2008) Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. Plant Cell 20: 11–24 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schilling S, Pan S, Kennedy A, Melzer R (2018) MADS-box genes and crop domestication: the jack of all traits. J Exp Bot 69: 1447–1469 [DOI] [PubMed] [Google Scholar]
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al. (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115 [DOI] [PubMed] [Google Scholar]
- Schrager-Lavelle A, Klein H, Fisher A, Bartlett M (2017) Grass flowers: an untapped resource for floral evo-devo. J Syst Evol 55: 525–541 [Google Scholar]
- Schranz ME, Mohammadin S, Edger PP (2012) Ancient whole genome duplications, novelty and diversification: the WGD radiation lag-time model. Curr Opin Plant Biol 15: 147–153 [DOI] [PubMed] [Google Scholar]
- Sim�o FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31: 3210–3212 [DOI] [PubMed] [Google Scholar]
- Šmarda P, Bures P, Horova L, Leitch IJ, Mucina L, Pacini E, Tichy L, Grulich V, Rotreklova O (2014) Ecological and evolutionary significance of genomic GC content diversity in monocots. Proc Natl Acad Sci USA 111: E4096–E4102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Soltis PS, Folk RA, Soltis DE (2019) Darwin review: angiosperm phylogeny and evolutionary radiations. Proc R Soc B-Biol Sci 286: 20190099 [Google Scholar]
- Soreng RJ, Peterson PM, Romaschenko K, Davidse G, Teisher JK, Clark LG, Barber� P, Gillespie LJ, Zuloaga FO (2017) A worldwide phylogenetic classification of the Poaceae (Gramineae) II: an update and a comparison of two 2015 classifications. J Syst Evol 55: 259–290 [Google Scholar]
- Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312–1313 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34: W435–W439 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tajima F (1993) Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135: 599–607 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320: 486–488 [DOI] [PubMed] [Google Scholar]
- Tang H, Bowers JE, Wang X, Paterson AH (2010) Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci USA 107: 472–477 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105–1111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7: 562–578 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vajda V, Bercovici A (2014) The global vegetation pattern across the Cretaceous–Paleogene mass extinction interval: a template for other extinction events. Glob Planet Change 122: 29–49 [Google Scholar]
- VanBuren R, Bryant D, Edger PP, Tang H, Burgess D, Challabathula D, Spittle K, Hall R, Gu J, Lyons E, et al. (2015) Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Nature 527: 508–511 [DOI] [PubMed] [Google Scholar]
- Vanneste K, Baele G, Maere S, Van de Peer Y (2014) Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Res 24: 1334–1347 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vanneste K, Van de Peer Y, Maere S (2013) Inference of genome duplications from age distributions revisited. Mol Biol Evol 30: 177–190 [DOI] [PubMed] [Google Scholar]
- Varshney RK, , Shi C, , Thudi M, , Mariac C, , Wallace J, , Qi P, , Zhang H, , Zhao Y, , Wang X, , Rathore A ( 2017)� Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments. Nat Biotechnol 35: 969–976 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X, Tang H, Paterson AH (2011) Seventy million years of concerted evolution of a homoeologous chromosome pair, in parallel, in major Poaceae lineages. Plant Cell 23: 27–37 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X, Wang J, Jin D, Guo H, Lee TH, Liu T, Paterson AH (2015) Genome alignment spanning major poaceae lineages reveals heterogeneous evolutionary rates and alters inferred dates for key evolutionary events. Mol Plant 8: 885–898 [DOI] [PubMed] [Google Scholar]
- Wendel JF, , Lisch D, , Hu G, , Mason AS ( 2018)� The long and short of doubling down: polyploidy, epigenetics, and the temporal dynamics of genome fractionation. Curr Opin Genet Dev 49: 1–7. 10.1016/j.gde.2018.01.004 [DOI] [PubMed] [Google Scholar]
- Whipple CJ, , Zanis MJ, , Kellogg EA, , Schmidt RJ ( 2007) Conservation of B class gene expression in the second whorl of a basal grass and outgroups links the origin of lodicules and petals. Proc Natl Acad Sci U S A 104: 1081–1086 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu Y, You H-L, Li X-Q (2018) Dinosaur-associated Poaceae epidermis and phytoliths from the Early Cretaceous of China. Natl Sci Rev 5: 721–727 [Google Scholar]
- Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35: W265–268 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24: 1586–1591 [DOI] [PubMed] [Google Scholar]
- Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92 [DOI] [PubMed] [Google Scholar]
- Zhang GQ, , Liu KW, , Li Z, , Lohaus R, , Hsiao YY, , Niu SC, , Wang JY, , Lin YC, , Xu Q, , Chen LJ, et al. (2017) The Apostasia genome and the evolution of orchids. Nature 549: 379–383 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao M, Zhang B, Lisch D, Ma J (2017) Patterns and consequences of subgenome differentiation provide insights into the nature of paleopolyploidy in plants. Plant Cell 29: 2974–2994 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zwaenepoel A, Van de Peer Y (2019) wgd-simple command line tools for the analysis of ancient whole-genome duplications. Bioinformatics 35: 2153–2155 [DOI] [PMC free article] [PubMed] [Google Scholar]