Plant Physiology. 2004 Jun;135(2):602–606. doi: 10.1104/pp.104.043216

Positioning Arabidopsis in Plant Biology. A Key Step Toward Unification of Plant Research1

Michael Bevan 1,*, Sean Walsh 1
PMCID: PMC514094  PMID: 15208407

Abstract

One of the major challenges in biological investigation involves developing a robust predictive framework in which biological outputs can be predicted from input data and knowledge of the state of the system. Currently, genomics-based strategies provide a strong framework for integrating biological knowledge within a species and linking knowledge between diverse organisms, as DNA sequence is a durable, accurate, and complete record of biological information. As such, it provides the best source of information upon which predictive rules can start to be built, tested, and generalized. Generalization is a key component of predictive biology because it defines the extent to which we can accurately predict from one instance to another. In plant science, several important research themes are concerned with generalization, and progress in these areas is reviewed here. The importance of developing a framework for predictive biology that includes a much wider variety of plant species is also emphasized.

THE ARABIDOPSIS SYSTEM

Establishing a weedy species with no economic value, such as Arabidopsis, as a model system for plant biology was a major step toward unification in plant biology. Prior to this, plant research was spread among a wide variety of species that had particular advantages, either technical or biological, that enabled progress. By deciding to adapt their research goals, for example in plant pathology, to the new Arabidopsis system, some plant scientists took a key step toward generalizing their work, permitting findings in diverse areas to be integrated. An illustrative example is the convergence of knowledge about signal transduction pathways in biotic and abiotic responses (Kwak et al., 2003). The complete genome sequence of Arabidopsis and its subsequent analysis and refinement have further focused attention on the Arabidopsis model because of the relative ease of gene identification. The availability of insertion alleles for a vast majority of genes (Table I) and of prefabricated microarrays has provided plant researchers with unprecedented opportunities for new discovery and for integrating cellular functions. Furthermore, the Columbia ecotype genome sequence has accelerated exploitation of natural variation in many other Arabidopsis ecotypes, thus drawing a wide range of adaptive phenotypes into the orbit of this model. It is instructive at this stage of developing the Arabidopsis model to consider the main gravitational forces that focus an ever-increasing range of investigation onto Arabidopsis and to identify the centrifugal forces working to break up the Arabidopsis system.

Table I.

Status of functional genomics resources in Arabidopsis

Feature                                                                Count
Total genes (TIGR version 5 annotation)                                29,993
Total insertion element flanking sequences mapping to genome           289,613
Total protein-coding genes                                             26,207
Total protein-coding genes with ≥1 insertion within exon or intron     22,310 (85%)
Total protein-coding genes with no insertions                          3,906 (15%)

This data was extracted from www.atidb.org.
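
The percentages in Table I can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the table; the variable and function names are illustrative. It also shows that the two insertion categories as listed sum to nine more than the stated number of protein-coding genes, a small difference the source does not explain.

```python
# Counts copied from Table I; names are illustrative, not from the source.
protein_coding = 26_207
with_insertion = 22_310
without_insertion = 3_906


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


pct_with = percent(with_insertion, protein_coding)
pct_without = percent(without_insertion, protein_coding)

# Difference between the sum of the two categories and the stated total.
discrepancy = with_insertion + without_insertion - protein_coding

print(f"genes with at least one insertion: {pct_with:.1f}%")
print(f"genes with no insertion: {pct_without:.1f}%")
print(f"category sum minus stated total: {discrepancy}")
```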

The main gravitational force at the moment is the dark matter of the Arabidopsis genome, by which we mean the many aspects of the organism revealed by the genome sequence about which we now know we are almost entirely ignorant. These include the vast number of genes with unknown functions, the fully sequenced heterochromatic regions, chromosome dynamics, the mechanisms of genome evolution, and the mysterious rules governing gene expression. The evolving web of databases, functional genomics data, modeling strategies, stock centers, and the informal exchange of material and ideas also binds the Arabidopsis system together. This combination of experimental capability directed at important questions that can currently only be addressed in Arabidopsis suggests we are witnessing just the early accretion events of the Arabidopsis system. However, several centrifugal forces are at work that may weaken the fabric of this system and reduce the potential for discovery. One concern is generating and capturing experimental data that is comparable across experiments, which is crucial for increasing the potential for integrating knowledge and making generalizations. Another concern is the extent to which knowledge gained in Arabidopsis can be unified with that generated in other organisms, particularly crop plants. How far can this go, and what do we have to do to link the Arabidopsis system to others?

DESCRIBING GENE FUNCTION

Identification of the complete set of Arabidopsis proteins provides another opportunity for generalization, by naming and classifying genes in a consistent and informative way.

The Arabidopsis Genome Initiative code was implemented (Mayer et al., 1999) to give each gene a unique identifier and define its relative position. This code is very useful for searching and tracking gene-related information but has no further information content. Most gene names have been propagated by similarity of their encoded proteins to other proteins that have names. This propagation is informative when the overall similarity is significant and the original name describes related functions in each organism. However, propagated names are generally either misleading or profoundly uninformative (e.g. predicted protein related to unknown protein). This ad hoc system of gene naming is being replaced by systematic names based on the principles of molecular function, biological process, and cellular component (Gene Ontology Consortium, 2000). These Gene Ontology classifications, which are being applied to Arabidopsis predicted proteins, are a strong unifying concept in genome analysis both within Arabidopsis and between plant species. Plant Ontology and Trait Ontology also establish controlled vocabularies describing plant anatomy and development, and plant traits and phenotypes (see www.gramene.org/plant_ontology/). Currently, a dedicated group is annotating Arabidopsis and other plant proteins centrally (ftp://ftp.arabidopsis.org/home/tair/Genes/Gene_Ontology/). But how will the plant community deal with the vast amount of data being generated in large-scale functional genomics projects and by individual investigators, both in Arabidopsis and increasingly in other species? Failure to deal with this issue now will eventually weaken the impact of post-genomics research in Arabidopsis and other species and cause the loss of many of the benefits accrued from systematic generation and distribution of genomics and functional genomics data.
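
As a small illustration of how an identifier can encode relative position, the sketch below parses locus codes of the form AT4G01230. It assumes the widely used convention of "AT", a chromosome label (1 to 5, or C and M for the organellar genomes), "G", and a five-digit number that increases along the chromosome. The text above does not spell out this format, and the example codes and helper names are hypothetical.

```python
import re
from typing import NamedTuple


class AgiLocus(NamedTuple):
    chromosome: str  # "1" to "5", or "C"/"M" for organellar genomes
    ordinal: int     # number reflecting relative order on the chromosome


# Assumed identifier layout: AT<chromosome>G<five digits>.
_AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})$", re.IGNORECASE)


def parse_agi(code: str) -> AgiLocus:
    """Split a locus code into its chromosome and ordinal number."""
    match = _AGI_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognizable locus code: {code!r}")
    return AgiLocus(match.group(1).upper(), int(match.group(2)))


# Hypothetical codes, sorted by chromosome and then by relative position.
codes = ["AT4G01230", "At1g50000", "AT4G00050"]
ordered = sorted(codes, key=parse_agi)
print(ordered)
```

Sorting on the parsed tuple orders genes by chromosome and then by position, which is the only information the code itself carries.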

There are several ways to improve the reverse flow of data out of individual laboratories once it is publicly available. One way is to deploy informatics tools that standardize and distribute information. The BioMOBY consortium (www.BioMOBY.org) creates Web services that register information in databases (even small laboratory databases) with a central register called MOBY-Central that acts as a directory for information. This registration provides seamless access to biological data and forms the foundation for several plant functional genomics databases (e.g. PlaNet, http://mips.gsf.de/proj/planet/). Another way to promote information flow and ensure systematic gene classification is to capture it during the review and publication process. For example, keywords could be replaced with less arbitrary GO, PO, and TO terms to describe gene function, the sites of gene expression, and phenotypes resulting from mutations. Information represented this way can be directly captured by central databases and associated with relevant datasets to generate up-to-date, integrated, rich datasets. We must also find ways of ensuring comparability between different data sets, for which current methods based on paper publication are generally inadequate. This strategy for data capture should appeal to journal editors who may wish to impart a greater degree of objective justification in the review process. At this stage it is not clear how receptive journals and their editors are to either considering these changes or coordinating their introduction. At a time when print media are under justifiable threat due to cost, fair access, and capacity for change, the additional charge of not actually contributing to the reliable exchange of data may promote the needed revolution in publication methods.
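
To make the idea of replacing free-text keywords with ontology terms concrete, here is a minimal sketch of a submission record that a journal or database could check automatically. The record layout, class name, and gene labels are hypothetical. The identifier pattern (a GO, PO, or TO prefix followed by seven digits) is an assumption about how such term identifiers are written, and the term identifiers shown are placeholders rather than terms chosen for their meaning.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed identifier shape: ontology prefix, colon, seven digits.
TERM_PATTERN = re.compile(r"^(GO|PO|TO):\d{7}$")


@dataclass
class GeneAnnotation:
    """Hypothetical record: ontology terms submitted in place of keywords."""
    gene_id: str
    # Maps an ontology prefix ("GO", "PO", "TO") to the submitted term IDs.
    terms: Dict[str, List[str]] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """List entries that are free text or filed under the wrong ontology."""
        issues = []
        for ontology, term_ids in self.terms.items():
            for term_id in term_ids:
                match = TERM_PATTERN.match(term_id)
                if match is None:
                    issues.append(f"{term_id!r} is not an ontology identifier")
                elif match.group(1) != ontology:
                    issues.append(f"{term_id!r} is filed under {ontology}")
        return issues


# Placeholder identifiers, used only to exercise the check.
good = GeneAnnotation("GENE_X", {"GO": ["GO:0000001"], "PO": ["PO:0000002"], "TO": ["TO:0000003"]})
bad = GeneAnnotation("GENE_Y", {"GO": ["kinase", "PO:0000002"]})

print(good.problems())
print(bad.problems())
```

A record that passes this kind of check could be loaded into a central database without manual curation of keywords, which is the benefit argued for above.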

RELATIONSHIPS BETWEEN CELLULAR PROCESSES IN ARABIDOPSIS AND OTHER ORGANISMS

The benefits of using Arabidopsis as a model system include the superb technical advantages and genomics resources that permit rapid and systematic progress. The extent to which knowledge of cellular processes in one plant can be generalized to different species, especially crops, has yet to be thoroughly tested due to insufficient information, but most evidence reveals the value of these comparisons (see below). Comparison of Arabidopsis and the complete sequence of rice (Oryza sativa) chromosome 10 indicates about 70% of rice genes have conserved (E < 10^−5) counterparts in Arabidopsis (The Rice Chromosome 10 Sequencing Consortium, 2003). Due to the evolutionary distance between the monocots and dicots, this figure probably represents a maximum overall difference in gene content between most flowering plants. Therefore, gene discovery in Arabidopsis will generate a comprehensive knowledge of most common cellular processes in plants. But some common processes that have multiple evolutionary origins, or specific processes that have evolved in different groups of plants, will require gene discovery in specific systems.
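
The conserved fraction quoted above is the kind of figure obtained by counting query genes whose best match in the other genome passes a significance threshold. The sketch below assumes the threshold is a similarity-search E-value cutoff of 1e-5, which is an interpretation of the parenthetical in the text. The gene names and E-values are invented for illustration and are not data from the cited study.

```python
from typing import Dict, Optional


def conserved_fraction(best_hit_evalue: Dict[str, Optional[float]],
                       cutoff: float = 1e-5) -> float:
    """Fraction of query genes whose best hit has an E-value below cutoff.

    A value of None means the search returned no hit for that gene.
    """
    if not best_hit_evalue:
        raise ValueError("no query genes supplied")
    conserved = sum(
        1 for evalue in best_hit_evalue.values()
        if evalue is not None and evalue < cutoff
    )
    return conserved / len(best_hit_evalue)


# Hypothetical best-hit E-values for ten query genes.
hypothetical_hits = {
    "rice_gene_01": 1e-120,
    "rice_gene_02": 3e-48,
    "rice_gene_03": 2e-30,
    "rice_gene_04": 7e-12,
    "rice_gene_05": 5e-9,
    "rice_gene_06": 4e-7,
    "rice_gene_07": 9e-6,
    "rice_gene_08": 2e-3,
    "rice_gene_09": 0.5,
    "rice_gene_10": None,
}

fraction = conserved_fraction(hypothetical_hits)
print(f"conserved fraction: {fraction:.0%}")
```

Because the result depends directly on the cutoff, figures from different studies are comparable only when the same threshold and search settings are used.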

Two strategies can therefore be taken: one to push forward with parallel gene discovery in Arabidopsis and crop species, and another that aims to understand more comprehensively the evolution of plant genes, genomes, and gene functions. To underpin the latter strategy, sequencing of key species representing important nodes in plant evolution has been proposed (Pryer et al., 2002). This broader sampling of taxa may help reconstruct a generalized ancestral genome structure and also strongly aid taxonomy, which is also needed to identify and conserve plant biodiversity (Wheeler et al., 2004). The sequencing of several groups of plants is now under way or in an advanced state of planning. These (Table II) include, for the first time, a member of the asterids, tomato (Lycopersicon esculentum), which will complement the five rosid species being sequenced. Thus, comparative genomics exerts a powerful gravitational force holding the Arabidopsis system to that of other plants.

Table II.

Status of plant genome sequencing projects

Group Organism Expressed Sequence Tag Genomic
Asterids Lycopersicon + +
Nicotiana +
Solanum +
Rosids Arabidopsis + +
Brassica +
Gossypium +
Glycine +
Lotus + +
Medicago + +
Populus + +
Monocots Hordeum +
Oryza + +
Sorghum +
Triticum +
Zea + +
Coniferidra Pinus +
Bryomorpha Physcomitrella +
Volvocales Chlamydomonas + +

Before discussing selected examples of the extent of generalization from Arabidopsis to crop species, it is worthwhile looking at the relevance of knowledge obtained in Arabidopsis to nonplant species. Two (out of many) examples suggest plant research contributes new discoveries and concepts relevant to human health research. Extensive heterochromatic regions of Arabidopsis have been completely sequenced, and analysis of the imposition and maintenance of DNA methylation in these regions contributes to related work in mammalian systems (Gendrel et al., 2002). The DET and COP gene classes were first characterized as regulators of photomorphogenesis in Arabidopsis (Pepper et al., 1994; Wei and Deng, 1999), and their discovery in plants has led to the characterization of human orthologs (somewhat alarmingly called human de-etiolated-1) that constitute a ubiquitin ligase complex that degrades c-jun (Wertz et al., 2004).

The starchy endosperm of cereal seeds is one of the most obvious differences between Arabidopsis and cereals. In cereal seeds, it is the major site of reserve deposition, while in Arabidopsis seeds it is a transient tissue. Despite these differences, the early stages of syncytial growth, cellularization, and storage deposition are regulated by similar genes in cereals and Arabidopsis, and significant parallels have been discovered between the control of early embryo formation in Drosophila and maternal control by epigenetic mechanisms in Arabidopsis and maize (Zea mays) endosperm formation (Berger, 2003).

The molecular mechanisms controlling photoperiod responses, such as flowering time, involve the transduction of environmental signals to a circadian oscillator and outputs from the oscillator that control multiple cellular processes. The flowering time of plants has evolved to adapt to different conditions; temperate plants often require extended periods of long days to induce flowering, while tropical plants generally flower in short day conditions. Molecular analysis of flowering time responses in Arabidopsis, a long day plant, and rice, a short day plant, shows remarkable conservation of the input and output control mechanisms governing flowering time, despite the different responses of these two plants (Hayama and Coupland, 2003). In rice, QTL analysis identified several heading-date genes that encode proteins similar to key Arabidopsis output regulators CO, FT, and CK2. Input regulators such as GI and phytochrome are also functionally conserved in rice and Arabidopsis (Yano et al., 2001). These similarities permit rapid progress toward understanding how similar components generate different photoperiod responses in tropical cereals and Arabidopsis and establishing how photoperiodic responses evolved in temperate cereals (Griffiths et al., 2003).

The mechanisms regulating the vernalization response are being defined in Arabidopsis and wheat (Triticum aestivum) and provide another good comparison of the evolution of regulatory networks in diverse plants. In Arabidopsis, an extended cold period promotes flowering by epigenetically mediated down-regulation of the floral repressor FLC, which encodes a MADS box transcription factor (Henderson et al., 2003). The analogous gene in wheat, VRN2, is also down-regulated by an extended cold period and has recently been cloned. It is unrelated to FLC and encodes a Zn-finger protein containing the CCT domains conserved in the CONSTANS protein family (Yan et al., 2004). A MADS box gene highly homologous to the Arabidopsis AP1 gene is a candidate gene for the wheat Vrn1 locus, which confers a vernalization requirement (Yan et al., 2003). The Arabidopsis AP1 gene has not been associated with the vernalization response in Arabidopsis. These comparisons demonstrate that unrelated genes perform related functions in wheat and Arabidopsis and suggest this may be due to an independent evolutionary origin of vernalization requirement mechanisms in wheat and Arabidopsis. This is supported by the proposed evolutionary origin of temperate cereals from tropical ancestors after the monocot-dicot divergence.

Another example of evolutionary distance leading to marked differences between Arabidopsis and other plants involves symbiotic interactions such as the formation of arbuscular mycorrhizae and N-fixing root nodules, which do not occur in members of the Brassicaceae family such as Arabidopsis. To what extent can Arabidopsis genomics contribute to understanding symbiotic interactions? Studies in the model legume species Lotus japonicus and Medicago truncatula, and in pea (Pisum sativum), are identifying genes required for different steps in the nodulation process. Early responses to nodulation factors such as Ca2+ spiking require Leu-rich repeat receptor-like kinases, as do later responses regulating nodule numbers (Limpens and Bisseling, 2003). Extensive analysis in Arabidopsis of the function of this class of protein in responses to growth regulators and pathogens provides a strong framework for dissecting their role in early nodulation events. Similarly, knowledge of other cellular processes involved in nodulation, including Ca2+-mediated signaling, polar cell growth, and trimeric G protein signaling, gained in Arabidopsis will help interpret the function of nodulation genes. The Medicago DMI3 gene acts downstream of nodulin response mutants and modifies Ca2+ spiking. It encodes a calcium and calmodulin-dependent protein kinase, a class of protein that is widespread in many plant groups but absent from Arabidopsis (Levy et al., 2004). Perhaps Arabidopsis can be used to reconstruct some of the components of nodulation, such as the early signaling events, by transformation with the missing classes of genes? Comparing the predicted proteomes of Arabidopsis, Lotus, and Medicago, coupled with gene expression studies, may reveal other classes of proteins absent from Arabidopsis that may be implicated in establishing symbiotic relationships.

Finally, the tiny spindly Arabidopsis plant appears to have very little to offer our understanding of wood formation in trees. In fact, Arabidopsis inflorescence stems undergo secondary thickening, and a bona fide cambium forms. Many genes expressed during secondary thickening and associated with cambial activity are highly conserved, consistent with the relatively close evolutionary relationship between Arabidopsis and Populus (Hertzberg et al., 2001). The identification of many genes involved in cell wall formation, lignification, and vascular cell identity in Arabidopsis suggests that Arabidopsis functional genomics, linked with appropriate studies in tree species, will create new opportunities for a key biotechnology sector.

These examples typify a wide range of work on Arabidopsis and suggest that comparative approaches linking the rapidly expanding capabilities of Arabidopsis genomics with more specific and applied research goals in other species currently provide excellent opportunities for rapid progress and the generalization of knowledge of plant biology.

NEW CHALLENGES FOR PLANT SCIENCE

Many areas of plant research have benefited from the thoroughly annotated Arabidopsis genome sequence, and the generation and rapid public release of insertion lines for nearly all genes (Table I) has further accelerated the rate and scope of plant research. These resources strengthen the central role of Arabidopsis in structuring the current research landscape. Clearly, the emerging genome sequence of a wide variety of other plants (Table II) will have a similar galvanizing effect on research associated with these plants. Scientists will soon be able to establish the most creative blend of work in different species (with Arabidopsis being involved in many studies) to address their biological process of interest. But if this strategy is to generate a consistent and integrated data set concerning plant gene function, and if it is to have the potential to merge into detailed mechanistic models of cellular processes, we have to find better ways of capturing data and linking it in meaningful ways.

Our current methods of data distribution, through peer-reviewed journals, for example, appear to be a major limitation in attempts to structure complex functional genomics data. Once these traditional information distribution systems have caught up with the current state of plant science, we can rise to meet new challenges. These include far greater use of plants and plant processes for addressing the great societal challenges of the 21st century: establishing renewable resources for manufacturing and energy production, the identification and protection of biodiversity, and establishing globally sustainable food production.

1 This work was supported by the European Commission (grant no. QLRI–CT–2001–00006 PlaNet to S.W. and M.B.).

References

  1. Berger F (2003) Endosperm: the crossroad of seed development. Curr Opin Plant Biol 6: 42–50
  2. Gendrel A-V, Lippman Z, Yordan C, Colot V, Martienssen RA (2002) Dependence of heterochromatic histone H3 methylation patterns on the Arabidopsis gene DDM1. Science 297: 1871–1873
  3. Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25: 25–28
  4. Griffiths S, Dunford RP, Coupland G, Laurie DA (2003) The evolution of CONSTANS-like gene families in barley, rice and Arabidopsis. Plant Physiol 131: 1855–1867
  5. Hayama R, Coupland G (2003) Shedding light on the circadian clock and the photoperiodic control of flowering. Curr Opin Plant Biol 6: 13–19
  6. Henderson IR, Shindo C, Dean C (2003) The need for winter in the switch to flowering. Annu Rev Genet 37: 371–392
  7. Hertzberg M, Aspeborg H, Schrader J, Andersson A, Erlandsson R, Blomqvist K, Bhalerao R, Uhlen M, Teri T, Lunderberg J, et al (2001) A transcriptional roadmap to wood formation. Proc Natl Acad Sci USA 98: 14732–14737
  8. Kwak JM, Mori IC, Pei Z-M, Leonhardt N, Torres MA, Dangl JL, Bloom RE, Bodde S, Jones JDG, Schroeder JI (2003) NADPH oxidase AtrbohD and AtrbohF genes function in ROS-dependent ABA signaling in Arabidopsis. EMBO J 22: 2323–2333
  9. Levy J, Bres C, Geurts R, Chaloub B, Kulikova O, Duc G, Journet E-P, Ane J-M, Lauber E, Bisseling T, et al (2004) A putative Ca and calmodulin-dependent protein kinase required for bacterial and fungal symbioses. Science 303: 1361–1364
  10. Limpens E, Bisseling T (2003) Signalling in symbiosis. Curr Opin Plant Biol 6: 343–350
  11. Mayer K, Schuller C, Wambutt R, Murphy G, Volckaert G, Pohl T, Dusterhoft A, Stiekema W, Entian KD, Terryn N, et al (1999) Sequence and analysis of chromosome 4 of Arabidopsis thaliana. Nature 402: 769–777
  12. Pepper A, Delaney T, Washburn T, Poole D, Chory J (1994) DET1, a negative regulator of light-mediated development and gene expression in Arabidopsis, encodes a novel nuclear-localized protein. Cell 78: 109–116
  13. Pryer KM, Schneider H, Zimmer EA, Banks JA (2002) Deciding among green plants for whole genome studies. Trends Plant Sci 7: 550–554
  14. The Rice Chromosome 10 Sequencing Consortium (2003) In-depth view of structure, activity and evolution of rice chromosome 10. Science 300: 1566–1569
  15. Wei N, Deng XW (1999) Making sense of the COP9 signalosome. A regulatory protein complex conserved from Arabidopsis to human. Trends Genet 15: 98–103
  16. Wertz IE, O'Rourke KM, Zhang Z, Dornan D, Arnott D, Deshaies RJ, Dixit VM (2004) Human De-etiolated-1 regulates c-Jun by assembling a CUL4A ubiquitin ligase. Science 303: 1371–1374
  17. Wheeler QD, Raven PH, Wilson EO (2004) Taxonomy: impediment or expedient? Science 303: 285
  18. Yan L, Loukoianov A, Blechl A, Tranquilli G, Ramakrishna W, SanMiguel P, Bennetzen JL, Echenique V, Dubcovsky J (2004) The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303: 1640–1644
  19. Yan L, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J (2003) Positional cloning of wheat vernalisation gene VRN1. Proc Natl Acad Sci USA 100: 6263–6268
  20. Yano M, Kojima S, Takahashi Y, Lin H, Sasaki T (2001) Genetic control of flowering time in rice, a short day plant. Plant Physiol 127: 1425–1429

Articles from Plant Physiology are provided here courtesy of Oxford University Press
