Abstract
The HOX Pro database contains information about the organization, function and evolution of gene ensembles, notably the homeobox-containing genes. It is now clear that a subset of genes containing the homeobox motif play key roles in the orchestration of genes which control embryonic patterning, morphogenesis, cell differentiation and malignant transformation. The HOX Pro contains a broad spectrum of information including images, diagrams and animations. Currently this amounts to approximately 700 HTML pages together with 400 images which contain information on 200 groups of genes and 90 promoters, in turn linked to maps of 13 HOX clusters and nine genetic networks. There are about 700 sequences of individual hox-genes of animals classified in approximately 200 homologous or paralogous groups. Graphical representation of HOX clusters and Hox-based networks is accomplished by means of flow and 3D diagrams, JavaScript animations and Java applets. The HOX Pro now includes sections presenting data mining and data simulation issues. The DB is located at http://www.iephb.nw.ru/hoxpro.
INTRODUCTION
The HOX Pro database contains information about the organization, function and evolution of gene ensembles, among which the key players are the homeobox containing genes. These are involved in networks of genes which control embryonic patterning, morphogenesis, cell differentiation and malignant transformation. For these reasons, homeobox containing genes are a natural choice for the subject matter of a database concerned with gene function in development at multiple levels.
The HOX Pro database is aimed at:
Analysis and classification of regulatory and coding regions in diverse homeobox and related genes.
Description of mutations and knock-outs of hox genes, as well as hereditary diseases related to these genes.
Graphical representation, comparisons and classification of the hox gene expression patterns and profiles in sea urchin blastula, Drosophila blastoderm and imaginal discs, vertebrate limbs, mammalian brain and human cell cultures.
Comparative analysis of ‘hox-based’ genetic networks in the nematode Caenorhabditis elegans, the sea urchins Strongylocentrotus purpuratus and Paracentrotus lividus, the fruit flies Drosophila melanogaster and Drosophila virilis, and certain vertebrates.
Analysis of the phylogeny and evolution of homeobox genes and clusters.
A previous publication (1) was concerned with the concept of the database, its organization and the structural and evolutionary aspects of hox clusters. In that work we discussed the division of the database into structural and functional components. Structural entries are the larger component, and we have systematically updated the structural content of the HOX Pro. Today about 700 sequences of individual animal hox genes are known, classified into approximately 200 homologous or paralogous groups. Altogether there are approximately 700 HTML pages together with 400 images which contain information on these 200 groups of genes and 90 promoters, which are in turn linked to maps of 13 HOX clusters and nine genetic networks.
In this article we shall be concerned with new materials added to the database which describe the function of hox genes and their genetic networks. For the past 2 years we have entered and processed data concerning the functional genomics of hox genes. Hence this publication deals with functional aspects of the hox ensembles. To represent patterns of gene expression, new graphic tools were developed. Our approach to graphic representation by these tools will be illustrated by data on expression of hox genes in early development of sea urchins and Drosophila.
Storing and presenting gene expression data in a unified graphic form makes comparison of expression patterns both effective and visually direct (2). However, the input, storage, processing and presentation of the data as three-dimensional (3D) graphical models demand considerable preliminary effort in building such models. In return, the models permit the early embryogenesis of the sea urchin and fruit fly to be presented in detail. Examples of such tools are presented in the HOX Pro DB.
FUNCTIONAL GENOMICS OF HOX-ENSEMBLES
In the last two decades the hox genes have been studied as the key players in ensembles of the genes which control embryonic development. Therefore, the literature contains considerable data on the expression of these genes in normal embryos, in cell cultures, during regeneration and in mutants. This level of scientific attention to these ensembles of genes has revealed their role in certain heritable abnormalities of limbs, eye and ear of humans or laboratory mammals. Mutations in the coding and regulatory sequences of hox genes have been accumulated, and data on knock-outs of these genes has been compiled. Other new results concern polymorphisms of hox genes, both in exons and enhancers. All this material is now reviewed in the HOX Pro.
Gene expression data
The first time-series data on hox gene expression in induced cell cultures were obtained in the late 1980s (3). The first detailed data on the dynamics of expression patterns in developing vertebrate limbs appeared >10 years ago (4) and have since been refined. In the late 1990s, unprecedentedly detailed data were obtained, at cellular resolution, on the pattern dynamics of the network of genes controlling segmentation of the early fruit fly embryo (5). Many of these data are presented in the HOX Pro DB in 3D graphic form.
Sea urchin embryos. The fundamental significance of the data on sea urchin hox genes and the morphological simplicity of the early sea urchin embryo prompted us to begin developing graphic tools. The simplicity of this object allows us to represent expression patterns of the controlling genes for each cell of the early embryo, from fertilization up to gastrulation. Using the Java language, it is possible to visualize stages of development from one cell up to the blastula and gastrula stages (Fig. 1). The color of each cell represents the level of gene expression in it. The user can rotate these 3D graphic objects and zoom in or out.
Figure 1.
An example of a 3D diagram in the form of a Java applet for the early sea urchin embryo. It is a model of the sea urchin gastrula consisting of ∼700 cells. It represents region-specific expression of the S.purpuratus hox gene SpHmx, which in midgastrula stage embryos is expressed throughout the archenteron (7).
Using 3D graphic applets we visualized the expression of, for example, the P.lividus homeobox gene PlHbox12. This gene is one of the earliest transcribed zygotic homeobox genes of echinids, possibly involved in the initial specification of cell fate (6). The user can compare early PlHbox12 patterns with the five known territories of determination detected by the 60-cell embryo stage. This applet also allows the user to compare the expression patterns of those sea urchin homeobox genes for which clear region-specific expression has been demonstrated (Fig. 1). One example is the S.purpuratus hox gene SpHmx, which in midgastrula stage embryos is expressed throughout the archenteron (7).
Fruit fly embryos. The cascade of regulatory genes which target the Drosophila HOM-C complex remains the most thoroughly investigated at the present time. The core of the cascade is also formed by various hox genes, located mainly outside the HOM-C complex. Therefore, this ensemble of genes that control development is the natural prototype for interpreting the organization and functions of other genetic ensembles.
The early fly embryo is a syncytium with about 5000 nuclei arranged in a roughly ellipsoidal shell. At this syncytial stage the most dramatic events in the determination of the Drosophila body plan take place. Our 3D graphic tools allow us to display a rendered view of 2000–3000 nuclei on one side of the blastoderm of an early embryo. In Figure 2, our Java applet represents the 3D reconstruction of about 2000 nuclei in the syncytial blastoderm of an embryo following the fourteenth nuclear division. The nuclei are densely packed and arranged in a nearly hexagonal grid.
Figure 2.
An example of a 3D diagram in the form of a Java applet for the early embryo of D.melanogaster. It is a 3D reconstruction of the arrangement of nuclei and a visualization of the EVEN-SKIPPED homeodomain pattern in a gray palette. The early embryo has the form of a prolate spheroid. About 2000 nuclei are visible in all (the darker the nucleus, the greater the local production of EVEN-SKIPPED).
The applet encodes the expression level of a specific gene product (in this case, that of the even-skipped gene) as an 8-bit grayscale value for each of about 2000 nuclei. As with the previous applet, the user can rotate this 3D graphic object and zoom in or out. Note that this applet can be used not only for visualization, but also for direct comparison of the expression patterns of different genes at the resolution of individual nuclei.
Vertebrate embryos. The role of hox genes in the formation of certain structures during embryogenesis of amphibians, chickens, laboratory mammals and humans has been widely studied. Accordingly, these data are reviewed in the HOX Pro.
Most of the vertebrate data concern the genetic control of limb development. A minimum of 22 hox genes are involved in the control of limb morphogenesis. In addition, the functional ensemble controlling limb development includes more than 10 other genes, which as a rule encode secreted factors. The available experimental data are insufficient for complete reconstruction of the functional organization of this ensemble of regulators of morphogenesis. However, some fragments of this functional network have already been deciphered. Representation of all these data in a convenient graphic form requires the use of common graphics tools. These are utilized in the summary panels of images, interactive maps and 3D graphic models of limb buds and the limb skeleton.
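The per-nucleus grayscale encoding and nucleus-by-nucleus comparison described here can be illustrated with a minimal Python sketch. It is not the applet's code: the function names `to_gray8` and `mean_abs_difference`, the normalization to the pattern's own maximum, and the sample expression levels are all assumptions made for illustration. The only conventions taken from the text are the 8-bit range and the rule that darker means stronger expression.

```python
def to_gray8(levels):
    """Map non-negative expression levels to 8-bit gray values.

    Assumed convention (from the figure legend): darker means stronger
    expression, so the maximum level maps to 0 (black) and zero
    expression maps to 255 (white).
    """
    peak = max(levels)
    if peak <= 0:
        return [255] * len(levels)
    return [255 - round(255 * v / peak) for v in levels]


def mean_abs_difference(gray_a, gray_b):
    """Average per-nucleus difference between two encoded patterns."""
    if len(gray_a) != len(gray_b):
        raise ValueError("patterns must cover the same nuclei")
    return sum(abs(a - b) for a, b in zip(gray_a, gray_b)) / len(gray_a)


# Hypothetical expression levels for five nuclei along one axis.
gene_a = [0.0, 2.0, 10.0, 2.0, 0.0]   # a single central stripe
gene_b = [10.0, 2.0, 0.0, 2.0, 10.0]  # the complementary pattern

gray_a = to_gray8(gene_a)
gray_b = to_gray8(gene_b)
print(gray_a, gray_b, mean_abs_difference(gray_a, gray_b))
```

Because both patterns are indexed by the same nuclei, any per-nucleus statistic (difference, overlap, correlation) can be substituted for the mean absolute difference used in this sketch.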
DISCUSSION: DATA MINING AND SIMULATION
The long-range goal of the HOX Pro is its wide use as an instrument for data mining, or knowledge discovery. For this reason we have included two specialized sections of the database entitled ‘Data Mining’ and ‘Data Simulation’. The ‘Data Mining’ section illustrates how the database content can be used to discover new knowledge, while the ‘Data Simulation’ section reviews results in the modeling and simulation of hox network behavior.
We demonstrate knowledge discovery in the HOX Pro with two examples of sequence analysis. One is the cluster analysis of the promoter regions of hox genes. In the homeobox genes, apart from homeoboxes, there are other conserved sites, including sites in non-coding regions. Cluster analysis revealed several gene groups on the basis of sequence conservation in the promoter zones (8). Two compact groups are clearly seen: genes homologous to the Drosophila Deformed gene and genes homologous to the Drosophila Sex comb reduced gene.
Another example concerns the similarity of hox RA-Responding Enhancers (RAREs) to Alu sequences. We have collected cases of sequence similarity between documented examples of hox RAREs and known Alu motifs.
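As a rough illustration of how promoter sequences can be grouped by sequence conservation, the following Python sketch clusters sequences by shared k-mers. The method actually used in (8) is not described here, so everything in the sketch is an assumption: the Jaccard distance on 3-mers, the single-linkage merging rule, the threshold, and the toy sequences and their names are hypothetical and are not database entries.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - |shared k-mers| / |all k-mers|; 0 means identical k-mer content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


def single_linkage(seqs, threshold, k=3):
    """Merge clusters while the closest pair of members is within threshold."""
    clusters = [{name} for name in seqs]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            d = min(jaccard_distance(seqs[a], seqs[b], k)
                    for a in c1 for b in c2)
            if d <= threshold:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return sorted(sorted(c) for c in clusters)


# Hypothetical promoter fragments (invented for this sketch).
promoters = {
    "dfd_like_1": "TTAATGGCTCATTA",
    "dfd_like_2": "TTAATGGCTCATTC",
    "scr_like_1": "GGCCAGTCGACCGG",
    "scr_like_2": "GGCCAGTCGACCGA",
}

groups = single_linkage(promoters, threshold=0.5)
print(groups)
```

With these toy inputs the two pairs of near-identical fragments form two compact groups, mirroring the kind of result described above; on real promoter sequences an alignment-based distance would likely be more appropriate than k-mer overlap.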
A gene expression database should ideally provide comparisons of spatial regions and patterns, in which case one can expect that it will be a useful source for knowledge discovery.
The ‘Data Simulation’ section includes two examples of computer simulation of the dynamics of hox gene expression: activation of the even-skipped gene in early Drosophila development, and activation of hox genes by retinoic acid in human embryonal carcinoma cells.
Given the explosive growth of computer power and the development of modern heuristic optimization methods, new work on simulating the dynamics of genetic activity is needed as a new approach to data mining. This is closely related to the reverse engineering approach, in which the kinetics of activation of a genetic ensemble is reconstructed from known empirical data on the expression of its genes.
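The reverse engineering idea can be sketched in Python as fitting the parameters of a kinetic model to an expression time series with a simple heuristic optimizer. The sketch is illustrative only and does not reproduce the simulations in the database: the one-gene model dx/dt = a - b·x, the random-search hill climber, the function names and the synthetic "observed" data are all assumptions.

```python
import math
import random


def model(t, a, b):
    """Closed-form solution of dx/dt = a - b*x with x(0) = 0
    (assumed toy kinetics: constant synthesis a, first-order decay b)."""
    return (a / b) * (1.0 - math.exp(-b * t))


def sse(params, times, observed):
    """Sum of squared errors between model and observed expression."""
    a, b = params
    return sum((model(t, a, b) - y) ** 2 for t, y in zip(times, observed))


def hill_climb(times, observed, start, steps=2000, sigma=0.1, seed=0):
    """Random-search hill climbing: perturb the parameters with Gaussian
    noise and keep a candidate only if it lowers the error."""
    rng = random.Random(seed)
    best, best_err = start, sse(start, times, observed)
    for _ in range(steps):
        cand = (best[0] + rng.gauss(0, sigma), best[1] + rng.gauss(0, sigma))
        if cand[0] <= 0 or cand[1] <= 0:
            continue  # rates must stay positive
        err = sse(cand, times, observed)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


# Synthetic "empirical" time series generated from known parameters.
times = list(range(9))
observed = [model(t, 2.0, 0.5) for t in times]

start = (1.0, 1.0)
start_err = sse(start, times, observed)
fitted, fitted_err = hill_climb(times, observed, start)
print(fitted, fitted_err)
```

For networks of interacting genes the same scheme applies with a larger parameter vector and a numerically integrated model, although more robust heuristics than plain hill climbing would then usually be needed.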
ACKNOWLEDGEMENTS
We thank J. Reinitz, who provided helpful comments on this manuscript. This work is supported by USA National Institutes of Health grant RO1-RR07801 and GAP awards RBO-685 and RBO-895; INTAS grant no. 97-30950; RFBR grant no. 00-04-48515.
REFERENCES
1. Spirov,A.V., Bowler,T. and Reinitz,J. (1999) HOX Pro, a specialized database for clusters and networks of homeobox genes. Nucleic Acids Res., 28, 337–340.
2. Bard,J.B., Baldock,R.A. and Davidson,D.R. (1998) Elucidating the genetic networks of development, a bioinformatics approach. Genome Res., 8, 859–863.
3. Simeone,A., Acampora,D., Arcioni,L., Andrews,P.W., Boncinelli,E. and Mavilio,F. (1990) Sequential activation of HOX2 homeobox genes by retinoic acid in human embryonal carcinoma cells. Nature, 346, 763–766.
4. Dolle,P., Izpisua-Belmonte,J.-C., Renucci,A., Falkenstein,H. and Duboule,D. (1989) Coordinate expression of the murine Hox-5 complex homoeobox-containing genes during limb pattern formation. Nature, 342, 767–772.
5. Kosman,D., Small,S. and Reinitz,J. (1998) Rapid preparation of a panel of polyclonal antibodies to Drosophila segmentation proteins. Dev. Genes Evol., 208, 290–294.
6. Di Bernardo,M., Russo,R., Oliveri,P., Melfi,R. and Spinelli,G. (1995) Homeobox-containing gene transiently expressed in a spatially restricted pattern in the early sea urchin embryo. Proc. Natl Acad. Sci. USA, 92, 8180–8184.
7. Martinez,P. and Davidson,E.H. (1997) SpHmx, a sea urchin homeobox gene expressed in embryonic pigment cells. Dev. Biol., 181, 213–222.
8. Spirov,A.V. (1996) The role of some conservative sequences in regulatory elements of Antp-like, homeobox-containing genes of vertebrates. J. Evol. Biochem. Physiol., 32, 556–568.


