Plant Physiology. 2009 Jan;149(1):111–116. doi: 10.1104/pp.108.128926

Poaceae Genomes: Going from Unattainable to Becoming a Model Clade for Comparative Plant Genomics

C. Robin Buell
PMCID: PMC2613712  PMID: 19005087

Genomics has immense potential for improving our understanding of critical issues in plant growth and development, some of which can be applied to improving crop production. Midway into the second decade of genomics, genome and transcriptome sequencing efforts within the Poaceae are impressive given the technical and fiscal challenges presented by the large, repetitive genomes typical of the family (Smith and Flavell, 1975; Arumuganathan and Earle, 1991; SanMiguel et al., 1996). Indeed, as of October 30, 2008, GenBank held 10,847,522 Poaceae sequences representing 11,142 Mb (11.1 Gb), confirming the fast pace of sequence generation for the Poaceae. In terms of taxonomic breadth, 2,740 of the approximately 10,000 species reported within the Poaceae (http://www.kew.org/scihort/poaceae.html) have at least one sequence in GenBank. Genome-scale datasets represent a narrower phylogenetic base: of the 47 Poaceae species with genome or transcriptome sequences (Table I; discussed below), 32 belong to the BEP clade (Bambusoideae, Ehrhartoideae, Pooideae) and 15 to the PACCMAD clade (Panicoideae, Arundinoideae, Chloridoideae, Centothecoideae, Micrairoideae, Aristidoideae, and Danthonioideae); together they represent five subfamilies, nine tribes, and 24 genera within the Poaceae (Fig. 1). This article provides a short introduction to genome sequencing in the Poaceae and an abbreviated report of completed and ongoing genome and transcriptome sequencing efforts for Poaceae species, both to demonstrate the potential of Poaceae species for understanding plant biological processes and to highlight the Poaceae as a model family for comparative genomics.

Table I.

List of Poaceae species with transcriptome and genome sequence data and/or initiatives

Species | Common Name | Clade (Subfamily, Tribe) | EST No. (a) | Status (b) | Reference or URL
BEP clade
    Phyllostachys edulis | Moso bamboo | Bambusoideae, Bambuseae | 2,141 | ESTs |
    Oryza alta |  | Ehrhartoideae, Oryzeae | 2 | Physical map, BES | http://www.omap.org/
    Oryza australiensis |  | Ehrhartoideae, Oryzeae | 0 | Physical map, BES | http://www.omap.org/
    Oryza brachyantha |  | Ehrhartoideae, Oryzeae | 0 | Physical map, BES | http://www.omap.org/
    Oryza coarctata |  | Ehrhartoideae, Oryzeae | 499 | Physical map, BES | http://www.omap.org/
    Oryza glaberrima | African rice, Guang fu dao | Ehrhartoideae, Oryzeae | 0 | Physical map, BES | http://www.omap.org/
    Oryza granulata | Yo li dao | Ehrhartoideae, Oryzeae | 144,859 | Physical map, BES, ESTs | http://www.omap.org/
    Oryza minuta |  | Ehrhartoideae, Oryzeae | 5,760 | Physical map, BES, ESTs | http://www.omap.org/
    Oryza nivara |  | Ehrhartoideae, Oryzeae | 0 | Physical map, BES | http://www.omap.org/
    Oryza officinalis | Yao yong dao | Ehrhartoideae, Oryzeae | 1,471 | Physical map, BES, ESTs | http://www.omap.org/
    Oryza punctata | Red rice | Ehrhartoideae, Oryzeae | 806 | Physical map, BES | http://www.omap.org/
    Oryza ridleyi |  | Ehrhartoideae, Oryzeae | 0 | Physical map, BES | http://www.omap.org/
    Oryza rufipogon | Brownbeard rice, red rice | Ehrhartoideae, Oryzeae | 1 | Physical map, BES | http://www.omap.org/
    Rice (c) | Rice, Dao | Ehrhartoideae, Oryzeae | 1,220,877 | Resequencing, ESTs | McNally et al. (2006); http://oryzasnp.plantbiology.msu.edu; http://irfgc.irri.org/index.php?option=com_content&task=view&id=14&Itemid=106
    Rice ssp. indica (c) | Rice, Dao | Ehrhartoideae, Oryzeae |  | WGS draft, ESTs | Yu et al. (2002, 2005)
    Rice ssp. japonica (c) | Rice, Dao | Ehrhartoideae, Oryzeae |  | WGS draft, finished, ESTs | Barry (2001); Goff et al. (2002); Yu et al. (2002, 2005); The International Rice Genome Sequencing Project (2005)
    Agrostis capillaris | Waipu, Colonial bent grass | Pooideae, Aveneae | 7,743 | ESTs |
    Agrostis stolonifera | Creeping bent grass | Pooideae, Aveneae | 9,114 | ESTs |
    Avena sativa | Oat | Pooideae, Aveneae | 7,633 | ESTs |
    Brachypodium distachyon | Purple false brome | Pooideae, Brachypodieae | 20,449 | WGS in progress, ESTs | http://www.jgi.doe.gov/sequencing/why/51281.html
    Festuca arundinacea | Tall fescue | Pooideae, Poeae | 44,377 | ESTs |
    Lolium multiflorum | Italian ryegrass | Pooideae, Poeae | 5,968 | ESTs |
    Lolium perenne | Perennial ryegrass | Pooideae, Poeae | 1,816 | ESTs |
    Lolium temulentum | Darnel | Pooideae, Poeae | 6,336 | ESTs |
    Puccinellia tenuiflora | Alkali grass | Pooideae, Poeae | 4,252 | ESTs |
    Aegilops speltoides | Goat grass | Pooideae, Triticeae | 4,315 | ESTs |
    Aegilops tauschii | Tausch's goatgrass | Pooideae, Triticeae | 116 | Physical map | http://wheatdb.ucdavis.edu:8080/wheatdb/
    Hordeum vulgare | Barley | Pooideae, Triticeae | 502,895 | ESTs, Physical map | http://barleygenome.org; http://phymap.ucdavis.edu:8080/barley/
    Leymus chinensis |  | Pooideae, Triticeae | 1,692 | ESTs |
    Leymus cinereus × Leymus triticoides | Great Basin wild rye × Creeping wild rye | Pooideae, Triticeae | 28,786 | ESTs |
    Secale cereale | Rye | Pooideae, Triticeae | 9,298 | ESTs |
    Wheat | Bread wheat | Pooideae, Triticeae | 1,051,304 | ESTs, Physical map | http://www.wheatgenome.org/; http://urgi.versailles.inra.fr/projects/Triticum/eng/
    Triticum monococcum | Einkorn wheat | Pooideae, Triticeae | 11,190 | ESTs |
    Triticum turgidum | Rivet wheat, Poulard wheat | Pooideae, Triticeae | 30,874 | ESTs |
PACCMAD clade
    Cynodon dactylon | Bermuda grass | Chloridoideae, Cynodonteae | 20,148 | ESTs |
    Eleusine coracana | Finger millet | Chloridoideae, Cynodonteae | 1,749 | ESTs |
    Eragrostis curvula | Weeping love grass | Chloridoideae, Cynodonteae | 12,295 | ESTs |
    Eragrostis tef | Tef | Chloridoideae, Cynodonteae | 2,816 | ESTs |
    Spartina alterniflora | Smooth cordgrass | Chloridoideae, Cynodonteae | 1,255 | ESTs |
    Saccharum hybrid cultivar | Sugarcane | Panicoideae, Andropogoneae | 9,415 | ESTs |
    Saccharum officinarum | Sugarcane | Panicoideae, Andropogoneae | 252,698 | ESTs |
    Sorghum | Sorghum | Panicoideae, Andropogoneae | 209,814 | Gene enrichment (complete), WGS draft (complete), ESTs; Physical map | Bedell et al. (2005); http://www.jgi.doe.gov/sequencing/why/3060.html
    Sorghum halepense | Johnson grass | Panicoideae, Andropogoneae | 1,965 | ESTs |
    Sorghum propinquum |  | Panicoideae, Andropogoneae | 20,881 | ESTs |
    Maize | Maize, Corn | Panicoideae, Andropogoneae | 1,464,855 | Gene enrichment (complete); Draft BAC by BAC with targeted finishing, ESTs; Physical map | Whitelaw et al. (2003), Palmer et al. (2003); http://www.maizesequence.org/index.html
    Cenchrus ciliaris | Buffelgrass, African foxtail grass | Panicoideae, Paniceae | 21,729 | ESTs |
    Pennisetum glaucum | Pearl millet | Panicoideae, Paniceae | 2,848 | ESTs |
    Setaria italica | Foxtail millet | Panicoideae, Paniceae | 2,124 | Planned WGS, ESTs | http://www.jgi.doe.gov/sequencing/why/99178.html
(a) Numbers of ESTs were obtained from dbEST, October 17, 2008.

(b) BES, BAC end sequence dataset; WGS, whole-genome shotgun. Resequencing refers to a project using hybridization-based resequencing to identify single nucleotide polymorphisms in 20 rice accessions.

(c) All rice ESTs are combined and reported as rice rather than separated into indica and japonica.

Figure 1.

Phylogeny of grasses for which genome sequence data have been or will be generated in the near future. An asterisk indicates one poorly supported node. Polyploid species with known genome relationships are shown to the right of the diploids, with lines indicating ancestry. Most are tetraploid, but one (wheat [T. aestivum]) is hexaploid.

EXPRESSED SEQUENCE TAGS: THE START OF THE GENOMICS ERA IN THE POACEAE

The first genome-scale sequences generated for the Poaceae were ESTs, which represent the transcribed portion of a genome and provide a rapid, economical approach to sampling the gene space of an organism. As early-access sequence datasets, ESTs can be used for (1) development of genetic markers (for example, see Harushima et al., 1998), (2) electronic gene expression analyses (for example, see Ewing et al., 1999), (3) improvement of structural annotation (Haas et al., 2003), and (4) functional genomic resources for overexpression, in vitro expression, and gene-silencing studies. In 1997, the first set of ESTs for a Poaceae species was reported for rice (Oryza sativa; Yamamoto and Sasaki, 1997). Now, 11 years later, 36 Poaceae species have EST collections of more than 1,000 sequences (Table I); of these, three (maize [Zea mays], rice, and wheat [Triticum aestivum]) have >1 million ESTs each. Collectively, as of October 30, 2008, the dbEST division of GenBank contained 5,491,939 Poaceae EST sequences totaling 2,764 Mb (2.76 Gb). Because most Poaceae genomes are highly repetitive, and because high-throughput next-generation sequencing platforms are now available for transcriptome sequencing (Cloonan et al., 2008; Rosenkranz et al., 2008), ESTs will continue to provide a rich source of genic sequences for grass researchers, and EST collections can be expected for thousands of Poaceae species within the coming years.
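The EST tallies quoted above can be cross-checked against Table I with a short Python sketch. The dictionary simply transcribes the EST No. column of Table I (dbEST, October 17, 2008); the indica and japonica rows are left out because, per footnote c, their ESTs are pooled under Rice. The helper name species_above is hypothetical, chosen only for illustration.

```python
# EST counts transcribed from the "EST No." column of Table I (dbEST, October 17, 2008).
# The rice indica/japonica rows are omitted: their ESTs are pooled under "Rice" (footnote c).
EST_COUNTS = {
    "Phyllostachys edulis": 2141,
    "Oryza alta": 2,
    "Oryza australiensis": 0,
    "Oryza brachyantha": 0,
    "Oryza coarctata": 499,
    "Oryza glaberrima": 0,
    "Oryza granulata": 144859,
    "Oryza minuta": 5760,
    "Oryza nivara": 0,
    "Oryza officinalis": 1471,
    "Oryza punctata": 806,
    "Oryza ridleyi": 0,
    "Oryza rufipogon": 1,
    "Rice": 1220877,
    "Agrostis capillaris": 7743,
    "Agrostis stolonifera": 9114,
    "Avena sativa": 7633,
    "Brachypodium distachyon": 20449,
    "Festuca arundinacea": 44377,
    "Lolium multiflorum": 5968,
    "Lolium perenne": 1816,
    "Lolium temulentum": 6336,
    "Puccinellia tenuiflora": 4252,
    "Aegilops speltoides": 4315,
    "Aegilops tauschii": 116,
    "Hordeum vulgare": 502895,
    "Leymus chinensis": 1692,
    "Leymus cinereus x Leymus triticoides": 28786,
    "Secale cereale": 9298,
    "Wheat": 1051304,
    "Triticum monococcum": 11190,
    "Triticum turgidum": 30874,
    "Cynodon dactylon": 20148,
    "Eleusine coracana": 1749,
    "Eragrostis curvula": 12295,
    "Eragrostis tef": 2816,
    "Spartina alterniflora": 1255,
    "Saccharum hybrid cultivar": 9415,
    "Saccharum officinarum": 252698,
    "Sorghum": 209814,
    "Sorghum halepense": 1965,
    "Sorghum propinquum": 20881,
    "Maize": 1464855,
    "Cenchrus ciliaris": 21729,
    "Pennisetum glaucum": 2848,
    "Setaria italica": 2124,
}


def species_above(counts, threshold):
    """Return the species whose EST count strictly exceeds `threshold`, sorted by name."""
    return sorted(name for name, n in counts.items() if n > threshold)


if __name__ == "__main__":
    print(len(species_above(EST_COUNTS, 1_000)))   # species with more than 1,000 ESTs
    print(species_above(EST_COUNTS, 1_000_000))    # species with more than 1 million ESTs
```

Run as a script, this prints 36 followed by ['Maize', 'Rice', 'Wheat'], matching the text. The strict > comparison mirrors the wording "more than 1,000 sequences".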

GETTING AT THE GENOME: GENOME SEQUENCES

With continued advances in technology and concomitant reductions in cost over the last decade, whole-genome sequences have been generated for multiple species within the Poaceae (Table I). Rice was not only the first crop species but also the second plant species with a genome sequence (Barry, 2001; Goff et al., 2002; Yu et al., 2002; The International Rice Genome Sequencing Project, 2005; Yu et al., 2005). Currently, genome sequences are available for two subspecies of rice, indica and japonica. Perhaps most importantly for all current and future Poaceae genome sequences, a high-quality, near-complete genome sequence is available for the Nipponbare cultivar of japonica rice (The International Rice Genome Sequencing Project, 2005); it will most likely remain the only gold-standard reference genome for the Poaceae for the near future. Indeed, the Nipponbare reference sequence was used to resequence a set of 20 rice lines by hybridization-based resequencing to identify single nucleotide polymorphisms (McNally et al., 2006; http://irfgc.irri.org/index.php?option=com_content&task=view&id=14&Itemid=106; http://oryzasnp.plantbiology.msu.edu/). Draft genome sequences are also available for sorghum (Sorghum bicolor), Brachypodium distachyon, and maize, with analyses and full public release anticipated in 2009 (Table I). A genome project for foxtail millet (Setaria italica) is planned by the U.S. Department of Energy Joint Genome Institute (Table I). Physical resources in the form of bacterial artificial chromosome (BAC) clone-based physical maps have been developed for a number of Poaceae species in advance of genome sequencing (Table I). Most notably, the Oryza Map Alignment Project (OMAP; Kim et al., 2008) has generated physical maps and BAC end sequences for 12 Oryza species in support of comparative genomics within this important genus.
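Conceptually, identifying SNPs against a reference such as Nipponbare means reporting the positions where an accession's base differs from the reference base. The minimal Python sketch below assumes the accession sequence has already been aligned base-for-base to the reference and treats N as a no-call. It does not model the probe hybridization intensities from which hybridization-based resequencing actually infers bases, and the function name and sequences are hypothetical.

```python
from typing import List, Tuple

VALID_BASES = set("ACGT")


def call_snps(reference: str, accession: str) -> List[Tuple[int, str, str]]:
    """Report (1-based position, reference base, accession base) wherever an
    accession sequence, already aligned gap-free to the reference, carries a
    different nucleotide.

    Positions where either base is not A, C, G, or T (for example, an 'N'
    no-call) are skipped. Toy model only: no alignment, no quality scores,
    no hybridization-intensity modeling.
    """
    if len(reference) != len(accession):
        raise ValueError("sequences must be aligned to the same length")
    snps: List[Tuple[int, str, str]] = []
    for pos, (ref_base, acc_base) in enumerate(zip(reference.upper(), accession.upper()), start=1):
        # Compare only positions where both calls are unambiguous nucleotides.
        if ref_base in VALID_BASES and acc_base in VALID_BASES and ref_base != acc_base:
            snps.append((pos, ref_base, acc_base))
    return snps


# Hypothetical 12-bp fragments, not real Nipponbare or accession sequence.
reference = "ATGCGTACGTTA"
accession = "ATGCATACGNTA"
print(call_snps(reference, accession))
```

For the toy fragments above, it reports a single SNP, (5, 'G', 'A'); position 10 is skipped because the accession call there is N.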

In addition to transcriptome and whole-genome sequences, large sets of genomic sequences are available within the GSS (genome survey sequence), HTG (high-throughput genomic), WGS (whole-genome shotgun), and PLN (plant, fungal, and algal) divisions of GenBank. The GSS division, which includes gene enrichment as well as BAC end sequences, contains 5,072,454 Poaceae sequences (3,337 Mb). Although the maize and sorghum gene enrichment sequences in the GSS division have now been superseded by draft genome sequences, the gene enrichment approaches of methylation filtration and high-Cot selection were highly successful in generating genic sequences for maize and sorghum, thereby providing early access to their gene space (Palmer et al., 2003; Whitelaw et al., 2003; Bedell et al., 2005). Other Poaceae sequences in GenBank include 179,196 sequences (1,169 Mb) in the PLN division, 18,655 sequences (3,086 Mb) in the HTG division, and 85,280 sequences (785 Mb) in the WGS division.
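The per-division figures given here and in the preceding section can be checked against the 11,142 Mb Poaceae total quoted in the introduction. The Python sketch below assumes that these five divisions (EST, GSS, PLN, HTG, and WGS) account for the whole total; the values are those reported in this article, and the helper name summed_total is illustrative.

```python
# Poaceae sequence per GenBank division, in Mb, as reported in this article
# (assumption: these five divisions make up the overall total).
DIVISION_MB = {
    "EST": 2764,
    "GSS": 3337,
    "PLN": 1169,
    "HTG": 3086,
    "WGS": 785,
}

REPORTED_TOTAL_MB = 11142  # overall Poaceae total as of October 30, 2008


def summed_total(divisions):
    """Sum the per-division totals (Mb)."""
    return sum(divisions.values())


total = summed_total(DIVISION_MB)
print(f"summed: {total} Mb; reported: {REPORTED_TOTAL_MB} Mb")
# Share of each division, largest first.
for name, mb in sorted(DIVISION_MB.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {mb} Mb ({100 * mb / total:.1f}%)")
```

The divisions sum to 11,141 Mb, within 1 Mb of the reported 11,142 Mb; the small difference is most likely due to rounding of the individual division totals.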

Note that most of the sequence currently available for the Poaceae is derived from a few species of high agricultural importance (rice, maize, wheat, and sorghum). As shown in Figure 2, although 13 of the 47 species with genome-scale datasets, resources, or initiatives listed in Table I have >100 Mb of total sequence in GenBank, three-quarters of the sequence comes from maize or Oryza species, reflecting the heavy bias in Poaceae genome sequencing projects to date. However, with access to next-generation genome sequencing technologies (Margulies et al., 2005; Holt et al., 2008; Sarin et al., 2008), researchers can expect access to dozens of Poaceae genomes in the near future. Furthermore, applying these next-generation sequencing technologies together with techniques that enrich for subfractions of the genome (Albert et al., 2007; Hodges et al., 2007; Okou et al., 2007) will greatly facilitate resequencing of additional cultivars or accessions, providing a virtually unlimited supply of sequence resources for examining genome diversity at the species level.

Figure 2.

Genome sequence availability for 47 Poaceae species with genome projects. Sequence was downloaded for all 47 species from GenBank (October 2008) and summed across all divisions. Thirteen species are represented individually in the pie chart; sequence for the 34 species with less than 100 Mb of total sequence in GenBank was grouped into Other.
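The aggregation described in the Figure 2 legend (sum each species' sequence over all GenBank divisions, then pool species below 100 Mb into Other) can be sketched in Python as follows. The record layout and the function name pie_slices are assumptions for illustration, not the procedure actually used to build the figure, and the example inputs are placeholders rather than real GenBank totals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pie_slices(records: Iterable[Tuple[str, str, float]], threshold_mb: float = 100.0) -> Dict[str, float]:
    """Build Figure 2-style pie slices from (species, division, Mb) records.

    Step 1: sum each species' Mb over all GenBank divisions.
    Step 2: species at or above `threshold_mb` keep their own slice;
            species below it are pooled into a single "Other" slice.
    """
    per_species: Dict[str, float] = defaultdict(float)
    for species, _division, mb in records:
        per_species[species] += mb

    slices: Dict[str, float] = defaultdict(float)
    for species, mb in per_species.items():
        key = species if mb >= threshold_mb else "Other"
        slices[key] += mb
    return dict(slices)


# Placeholder records (hypothetical species and values, not real GenBank totals).
example = [
    ("Species A", "EST", 80.0),
    ("Species A", "GSS", 40.0),
    ("Species B", "EST", 30.0),
    ("Species C", "WGS", 50.0),
]
print(pie_slices(example))
```

Here Species A (120 Mb across two divisions) keeps its own slice, while Species B and C fall below 100 Mb and are pooled into an 80 Mb Other slice.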

This is an exciting time to be engaged in Poaceae research: even for researchers outside genomics, access to not just one but multiple Poaceae genome sequences provides not only a robust set of resources for biological inquiry but also a perspective on gene function in a phylogenetic context. With this deluge of genomic sequence data, the storage, handling, analysis, and use of large-scale genomic sequence and annotation data become problematic for most researchers. Consequently, resources, databases, and analysis tools need to be developed so that these genome datasets can be used in a feasible and intelligent manner, thereby maximizing the return on the investment in obtaining the genome sequences. Poaceae researchers are not alone in forging a path through the morass of genome sequence data in the early 21st century, and the tools, resources, software, and knowledge gained from other genomic research endeavors throughout the Tree of Life will be instrumental in obtaining a full understanding of the pan-Poaceae genome.

Acknowledgments

Efforts in phylogenetic tree construction by E. Kellogg are greatly appreciated. Work on rice genomics was supported by the National Science Foundation (grant nos. DBI–0321538 and DBI–0834043 to C.R.B.).

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: C. Robin Buell (buell@msu.edu).

References

  1. Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, et al (2007) Direct selection of human genomic loci by microarray hybridization. Nat Methods 4: 903–905
  2. Arumuganathan K, Earle E (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208–218
  3. Barry GF (2001) The use of the Monsanto draft rice genome sequence in research. Plant Physiol 125: 1164–1165
  4. Bedell JA, Budiman MA, Nunberg A, Citek RW, Robbins D, Jones J, Flick E, Rholfing T, Fries J, Bradford K, et al (2005) Sorghum genome sequencing by methylation filtration. PLoS Biol 3: e13
  5. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, et al (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5: 613–619
  6. Ewing RM, Ben Kahla A, Poirot O, Lopez F, Audic S, Claverie JM (1999) Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 9: 950–959
  7. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92–100
  8. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31: 5654–5666
  9. Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, et al (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148: 479–494
  10. Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ, Hannon GJ, et al (2007) Genome-wide in situ exon capture for selective resequencing. Nat Genet 39: 1522–1527
  11. Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, et al (2008) High-throughput sequencing provides insights into genome variation and evolution in Salmonella typhi. Nat Genet 40: 987–993
  12. Kim H, Hurwitz B, Yu Y, Collura K, Gill N, SanMiguel P, Mullikin JC, Maher C, Nelson W, Wissotski M, et al (2008) Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza. Genome Biol 9: R45
  13. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380
  14. McNally KL, Bruskiewich R, Mackill D, Buell CR, Leach JE, Leung H (2006) Sequencing multiple and diverse rice varieties: connecting whole-genome variation with phenotypes. Plant Physiol 141: 26–31
  15. Okou DT, Steinberg KM, Middle C, Cutler DJ, Albert TJ, Zwick ME (2007) Microarray-based genomic selection for high-throughput resequencing. Nat Methods 4: 907–909
  16. Palmer LE, Rabinowicz PD, O'Shaughnessy AL, Balija VS, Nascimento LU, Dike S, de la Bastide M, Martienssen RA, McCombie WR (2003) Maize genome sequencing by methylation filtration. Science 302: 2115–2117
  17. Rosenkranz R, Borodina T, Lehrach H, Himmelbauer H (2008) Characterizing the mouse ES cell transcriptome with Illumina sequencing. Genomics 92: 187–194
  18. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al (1996) Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765–768
  19. Sarin S, Prabhu S, O'Meara MM, Pe'er I, Hobert O (2008) Caenorhabditis elegans mutant allele identification by whole-genome sequencing. Nat Methods 5: 865–867
  20. Smith DB, Flavell RB (1975) Characterisation of the wheat genome by renaturation kinetics. Chromosoma 50: 223–242
  21. The International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800
  22. Whitelaw CA, Barbazuk WB, Pertea G, Chan AP, Cheung F, Lee Y, Zheng L, van Heeringen S, Karamycheva S, Bennetzen JL, et al (2003) Enrichment of gene-coding sequences in maize by genome filtration. Science 302: 2118–2120
  23. Yamamoto K, Sasaki T (1997) Large-scale EST sequencing in rice. Plant Mol Biol 35: 135–144
  24. Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–92
  25. Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, et al (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol 3: e38

Articles from Plant Physiology are provided here courtesy of Oxford University Press
