Author manuscript; available in PMC: 2010 Jan 8.
Published in final edited form as: Fly (Austin). 2008 Mar 8;2(2):58–66. doi: 10.4161/fly.6060

Pipeline for acquisition of quantitative data on segmentation gene expression from confocal images

Svetlana Surkova 1, Ekaterina Myasnikova 1, Hilde Janssens 2, Konstantin N Kozlov 1, Anastasia A Samsonova 3, John Reinitz 4, Maria Samsonova 1,*
PMCID: PMC2803333  NIHMSID: NIHMS59929  PMID: 18820476

Abstract

We describe a data pipeline developed to extract quantitative data on segmentation gene expression from confocal images of gene expression patterns in Drosophila. The pipeline consists of five steps: image segmentation, background removal, temporal characterization of an embryo, data registration and data averaging. This pipeline was successfully applied to obtain quantitative gene expression data at cellular resolution in space and at 6.5-minute resolution in time, as well as to construct a spatiotemporal atlas of segmentation gene expression. Each step of the pipeline can be easily adapted to process a wide range of images of gene expression patterns.

Keywords: image processing, confocal microscopy, gene expression, image segmentation, spatial registration, background removal

Introduction

Biology is increasingly asking quantitative questions. Quantification is essential to understand the principles of organism functioning. Modern physics and engineering provide tools and strategies for accurate measurements and the acquisition of comprehensive and consistent data sets. For example, microarrays are widely used to quantify the levels of gene expression,1 and fluorescence recovery after photobleaching (FRAP) is applied to measure diffusion rate or molecular transport.2

In the last decade developmental biology has achieved tremendous progress. Due to the success of genetics and functional genomics a large number of genes controlling development have been cloned and sequenced, the products of these genes have been identified, and the function of many of these molecules has been revealed. However, despite these spectacular achievements, an integrative picture of how an organism controls the phenotype of tissues and organs is still lacking.

The morphogenetic field is a basic unit of ontogeny.3 This physically detached area is formed by complex coordinated interactions of transcripts and proteins that make up the field. Understanding the developmental processes within the morphogenetic field requires a quantitative characterization of the dynamics of each of the field components.

Currently available methods for quantification of gene expression, such as microarrays, quantitative PCR and CAT assays, have limitations for investigations of developmental processes. All of these methods are based on the preparation of homogenates of cells and are unable to capture spatial information about gene expression.

The study of development in morphogenetic fields requires methods for the acquisition of gene expression data that allow the expression time course of all of the genes to be monitored simultaneously at the resolution of a single cell. Confocal scanning microscopy of fluorescently tagged molecules constitutes the most common data acquisition strategy that preserves spatial information about the distribution of gene products. This method involves the recognition of protein or RNA in situ by target-specific primary antibodies or a labeled antisense RNA probe. The detection method is usually based on the use of secondary antibodies conjugated with a fluorophore. Confocal microscopy provides high quality digital images ready for computer processing.

Now that confocal microscopy has become an important tool to monitor gene expression, the methods applied to process confocal images come into focus. Here we describe a data pipeline developed to extract quantitative data on segmentation gene expression from confocal images of Drosophila embryos. We present an overview of each method of the pipeline in order to give the reader a general idea of how to extract quantitative data from confocal images, while detailed information about the processing algorithms may be found in our publications.4–9 We also discuss how each of the methods can be adapted to deal with images of other gene expression patterns.

Method for the Acquisition of Confocal Images

Experimental data

Approximately 1600 wild-type (OregonR) blastoderm Drosophila melanogaster embryos were collected, fixed and immunostained to detect the expression of maternal genes bicoid (bcd) and caudal (cad), gap genes Krüppel (Kr), knirps (kni), giant (gt), hunchback (hb) and tailless (tll) and pair-rule genes even-skipped (eve), fushi tarazu (ftz), hairy (h), runt (run), odd-skipped (odd), paired (prd) and sloppy-paired (slp).

The immunostaining procedure was previously described.8,10 Each embryo was stained for Eve protein and two other segmentation gene products. Approximately half of the embryos were additionally stained with an anti-histone H1–4 antibody (Chemicon) to mark the nuclei.8

Confocal microscopy

We selected only laterally oriented embryos whose developmental ages span the time interval from cleavage cycle 10 up to the beginning of gastrulation. Confocal scanning was implemented as described previously.8,9,11 Depending on the number of microscope channels, three or four different images were obtained for each stained embryo. All the images were acquired in 8-bit format, the size of each image being 1024 × 1024 pixels.

Most embryos were additionally scanned using differential interference contrast (DIC) optics to collect data on the blastoderm morphology for temporal classification.

One-dimensional expression patterns

The expression of segmentation genes is largely a function of position along the anterior-posterior (A-P) axis of the embryo body, and can be well represented in one dimension. The one-dimensional data on gene expression (Fig. 1) were extracted from the central 10% strip (45–55% D-V) along the A-P axis. Thus, each nucleus is characterized by the x-coordinate (A-P) only, while the y-coordinate (D-V) is ignored and the patterns, demonstrating the variation of gene expression along the x-axis, are presented as diagrams.
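The strip extraction described above can be sketched as follows. This is a minimal illustration, assuming that nuclear coordinates are already expressed in percent of embryo length (A-P) and width (D-V); the function name `central_strip` and its parameters are hypothetical and do not come from the pipeline's own software.

```python
import numpy as np

def central_strip(x, y, intensity, lo=45.0, hi=55.0):
    """Return A-P positions and intensities of nuclei in the central D-V strip.

    x, y      -- nuclear centroid coordinates in percent (A-P and D-V)
    intensity -- relative expression level of one gene in each nucleus
    lo, hi    -- D-V bounds of the strip in percent (45-55% in the text)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = (y >= lo) & (y <= hi)      # nuclei inside the strip
    order = np.argsort(x[keep])       # sort by A-P position for plotting
    return x[keep][order], intensity[keep][order]
```

The D-V coordinate is used only to select nuclei and is then discarded, which mirrors the one-dimensional representation described in the text.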

Figure 1.

Extraction of one-dimensional quantitative data from an image of expression patterns of the Drosophila segmentation genes. (A) Drosophila embryo from time class 6 of cycle 14A immunostained for Hb (blue), Kni (green) and Eve (red). Horizontal white lines delineate the position of the central 10% strip. (B) One-dimensional expression patterns of the same three genes.

Temporal classification of embryos

To reconstruct the temporal dynamics of gene expression from many embryos each fixed at a different stage of development, it is necessary to classify each embryo in time. As interphases of cleavage cycles 10–13 are short (only 6–14 minutes) each cleavage cycle can constitute a distinct time class defined by the total number of nuclei in an embryo. However, since cycle 14A is about 50 minutes long, we divided it into eight temporal equivalence classes based on thorough visual inspection of the expression pattern of the eve gene in cycle 14A embryos. Each class represents about 6.5 minutes of development (Fig. 2).6,12 The time classification of embryos based on the dynamics of the eve expression pattern matches the degree of membrane invagination, the morphological marker used to stage embryos in cycle 14A.13

Figure 2.

The 8 temporal classes of cycle 14A. For each temporal class we present a typical embryo. The left-hand panel displays the one-dimensional expression pattern of eve, the right-hand panel shows a high magnification DIC image of the blastoderm morphology. In the DIC images vertical black lines indicate the cortical cytoplasm, the black arrows in time classes 1 and 2 indicate the elongation of nuclei, and the white arrows in time classes 3–8 show the position of the membrane front.

Access to the data

All images and quantitative data are stored in the publicly accessible database FlyEx (http://urchin.spbcas.ru/flyex/; http://flyex.ams.sunysb.edu/FlyEx/).14

Image and Data Processing Methods

We apply image segmentation and background removal to extract quantitative data from individual images. To construct a spatiotemporal atlas of segmentation gene expression these procedures were supplemented with image registration and data averaging. As the data were acquired from fixed embryos, a new automatic method for determining embryo age was developed to reconstruct the temporal dynamics of segmentation gene expression.

Image Segmentation

The image segmentation method8 presented here aims to find objects of interest (nuclei) in an embryo image, extract quantitative information about gene expression from images obtained in different microscope channels and write it to a data table. The method is implemented in the following steps.

Rotation of images to standard orientation

The averaged images (Fig. 3A1–4) are brought to standard orientation using a whole-embryo mask (Fig. 3D). The whole-embryo mask is a binary image where the values of pixels belonging to the embryo are set to one and those outside the embryo to zero. The mask is constructed from the pixel maximum image (Fig. 3B) obtained by comparing values for each pixel in all the channel images (Fig. 3A1–4) and giving it the highest found value. The binary mask is obtained by thresholding and median filtering of the maximum image (Fig. 3C), followed by several cycles of erosion and dilation.15 To determine the rotation angle the invariant moments of the mask16 are calculated. The whole-embryo mask and averaged channel images are rotated by this angle and non-informative image areas outside the embryo boundaries are cut away. The procedure of embryo rotation may somewhat corrupt the embryo contour since structural elements which are applied to smooth the mask depend on image orientation. Therefore the whole-embryo mask is remade to exactly cover each of the averaged channel images (Fig. 3). The averaged images are thus cropped to the exact size of the corrected whole-embryo mask (Fig. 3D).8,11
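A simplified sketch of the first part of this step is given below. It builds the pixel maximum image, thresholds it, and estimates the orientation of the resulting mask. Two points are assumptions made for illustration: the threshold value is supplied by the caller, and the angle is computed from second-order central moments, which is one common way to obtain a principal axis and may differ from the invariant-moment computation used in the published method. Median filtering and the erosion and dilation cycles are omitted, and all function names are hypothetical.

```python
import numpy as np

def pixel_maximum(channels):
    """Per-pixel maximum over a list of equally sized channel images."""
    return np.max(np.stack(channels), axis=0)

def threshold_mask(image, threshold):
    """Binary mask: 1 where the image exceeds the threshold, 0 elsewhere."""
    return (image > threshold).astype(np.uint8)

def orientation_angle(mask):
    """Angle (radians) of the principal axis of a binary mask.

    Uses second-order central moments of the nonzero pixels.
    The angle is measured in image coordinates (x = column, y = row,
    with y increasing downward).
    """
    rows, cols = np.nonzero(mask)
    x = cols - cols.mean()
    y = rows - rows.mean()
    mu20 = np.mean(x * x)
    mu02 = np.mean(y * y)
    mu11 = np.mean(x * y)
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
```

Rotating the mask and the channel images by the negative of this angle would align the long axis of the embryo with the horizontal image axis.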

Figure 3.

Scheme of image segmentation and acquisition of quantitative data. (A1–4) Original images from four different microscope channels of the same embryo. (B–D) Construction of a whole-embryo mask. (E1–4) Channel images rotated and cropped using the mask (D). (F–J) Construction of a nuclear mask. (K–M) Quantitative data acquired from the channel image L using the nuclear mask (K).

Construction of nuclear mask

The nuclear mask (Fig. 3K) is a binary image in which nonzero pixels correspond exclusively to within-nucleus pixels.8 It is generated using either the image stained for histones, if available, or the pixel maximum image of all the available channel images (Fig. 3F). After contrast enhancement and denoising of this image a watershed image is created. Lines of single pixel width define the watershed domains, which bound regions occupied by single nuclei. The nucleus boundaries are found by the Shen-Castan edge detection method (Fig. 3G–J).17

Segmentation accuracy

The quality of a nuclear mask is inspected visually by superimposing it on the histone or pixel maximum image (Fig. 4A). A detailed quantitative test has been developed based on the assumption that the dispersion of pixel intensities within a nucleus is smaller than that between pixels inside and outside the nucleus. The sets of pixels inside and outside nuclei are specified as two groups (Fig. 4B and C, respectively) and then the significance of the ratio of variances between and within these groups is statistically tested using the F-test.
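The variance ratio underlying such a test can be sketched as a one-way analysis of variance with two groups. The exact form of the published test is not given here, so the sketch below is an assumption: it computes the standard F statistic (between-group mean square divided by within-group mean square) for two hypothetical arrays of pixel intensities, `inside` and `outside`.

```python
import numpy as np

def f_statistic(inside, outside):
    """One-way ANOVA F statistic for two groups of pixel intensities."""
    groups = [np.asarray(inside, dtype=float), np.asarray(outside, dtype=float)]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of pixels around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large value indicates that the intensities inside and outside the nuclei differ much more than the intensities within each group; the significance would then be read from the F distribution with the corresponding degrees of freedom.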

Figure 4.

Quality control of the image segmentation method. (A) Fragment of an overlay of the nuclear mask with the image stained for histones. (B and C) Labeled pixels represent nuclei (B) and islands of cytoplasm, which surround nuclei (C). Labeled regions in (C) were isolated by subtraction of nuclei from the watershed domains.

Acquisition of quantitative data

The nuclear mask is superimposed on each channel image to compute the coordinates of nuclei centroids and mean fluorescence intensities over each nucleus.8,11 Finally the data are presented as a table containing nucleus number, x and y coordinates of its centroid measured in percent of the embryo length and width, as well as the averaged fluorescence intensities (relative expression levels) for each gene scanned in the embryo (Fig. 3M).
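The construction of the data table can be illustrated with the sketch below. It assumes that the nuclear mask has already been converted to a labeled image in which every nucleus carries a distinct positive integer and the value 0 marks pixels outside nuclei, and that the image has been cropped to the embryo so that image width and height correspond to embryo length and width. The function name and the pixel-center convention for percent coordinates are hypothetical.

```python
import numpy as np

def nuclear_table(labels, channel):
    """List of (nucleus id, x %, y %, mean intensity) for one channel image."""
    labels = np.asarray(labels)
    channel = np.asarray(channel, dtype=float)
    height, width = labels.shape
    table = []
    for nucleus_id in np.unique(labels):
        if nucleus_id == 0:           # 0 marks pixels outside nuclei
            continue
        rows, cols = np.nonzero(labels == nucleus_id)
        # Centroid in percent of image width/height (pixel centers at +0.5)
        x_percent = 100.0 * (cols.mean() + 0.5) / width
        y_percent = 100.0 * (rows.mean() + 0.5) / height
        mean_intensity = channel[rows, cols].mean()
        table.append((int(nucleus_id), float(x_percent),
                      float(y_percent), float(mean_intensity)))
    return table
```

Calling the function once per channel with the same labeled mask yields one intensity column per gene, which corresponds to the table layout described above.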

Most of the operations described above are standard and can be applied to identify objects and extract quantitative data from images of expression patterns of other Drosophila genes.

Background Removal

It is well known that methods for immunofluorescent labeling of biological objects in situ give rise to a low level of nonspecific staining, or “background”.

Evidently, even a low background distorts the quantitative levels of gene expression. The degree of these distortions varies between embryos, from experiment to experiment and even among secondary antibodies conjugated to different fluorescent dyes (Fig. 5A and B).

Figure 5.

(A) Example of two one-dimensional non-registered expression patterns of the eve gene stained with the same primary antibodies and two different secondary antibodies, one conjugated to Cy5 (gray) and the other to Texas Red (black). (B) The same two expression patterns after the background removal. (C) The two-dimensional pattern of Eve staining in an embryo homozygous for Df(2R)eve1.27 which has no Eve protein. Individual nuclei are shown as gray circles, black drop lines parallel to the vertical axis indicate the level of fluorescence intensity in the nuclei. (D and E) Projections of the 10% strip cuts taken along the midline in the A-P and D-V directions. Dashed lines indicate the locations of the 10% strips on the 2D image. The 1D patterns are presented together with the 1D projections of the paraboloid.

The method for background removal that we have developed9 is based on our observation that in null mutant embryos stained for the absent (mutated) protein the level of fluorescence intensity is well fit by a two-dimensional quadratic paraboloid (Fig. 5D and E). The parabolic distribution of background is most likely explained by the properties of the confocal microscope and the convex shape of an embryo. The main idea of the method is to approximate the background signal by a paraboloid using the nonexpressing areas of an embryo and then apply this paraboloid to rescale the whole image. This procedure is implemented in several steps.

Assignment of nonexpressing areas

Nonexpressing areas for a given gene are those parts of the embryo in which the gene is not expressed in most nuclei. These regions are found by visual inspection of one- and two-dimensional expression patterns of the given gene in all the embryos. However, in two dimensions there is residual curvature of stripes in the D-V direction; hence, to determine the nonexpressing regions in two dimensions the curved stripes are straightened by coordinate transformation.9

Background approximation and removal

The background is approximated by a quadratic paraboloid fit to the points of support, which are extracted from the straightened nonexpressing regions of the two-dimensional pattern. The approximating paraboloid is found by an iterative optimization procedure. Finally, background is removed from the entire embryo by a linear mapping of intensity that transforms fluorescence at or below background level to zero and transforms maximum possible fluorescence (255) to itself. Examples of background removal from expression patterns of different genes and at different developmental times are presented in Figure 6.
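The two operations described above, fitting the paraboloid and rescaling the intensities, can be sketched as follows. The sketch makes two assumptions that should be read as such: the paraboloid is parametrized as a general quadratic surface in x and y, and it is fit by ordinary linear least squares rather than by the iterative optimization procedure used in the published method. All function names are hypothetical.

```python
import numpy as np

def fit_paraboloid(x, y, intensity):
    """Least-squares fit of b(x, y) = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2
    to the points of support taken from nonexpressing regions."""
    x, y, intensity = (np.asarray(v, dtype=float) for v in (x, y, intensity))
    design = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(design, intensity, rcond=None)
    return coeffs

def evaluate_paraboloid(coeffs, x, y):
    """Background level predicted by the fitted paraboloid at (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c0, c1, c2, c3, c4, c5 = coeffs
    return c0 + c1 * x + c2 * y + c3 * x * x + c4 * x * y + c5 * y * y

def remove_background(intensity, background, max_level=255.0):
    """Linear mapping: values at or below background go to 0, max_level stays fixed."""
    intensity = np.asarray(intensity, dtype=float)
    background = np.asarray(background, dtype=float)
    rescaled = max_level * (intensity - background) / (max_level - background)
    return np.clip(rescaled, 0.0, max_level)
```

In use, `fit_paraboloid` would be called on the nuclei of the nonexpressing regions only, and `remove_background` on all nuclei of the embryo with the background evaluated at each nucleus position.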

Figure 6.

Results of background removal from the representative expression patterns of several genes. All the patterns were obtained from embryos belonging to cleavage cycle 14A except those for which the cycle is specified in the figure. Patterns with background are shown by white circles, background free patterns are given by black circles, and the background parabola by a solid line.

Estimation of the background removal accuracy

The method for background removal was carefully tested against mutant embryos (or embryos from mutant mothers) bearing homozygous protein null alleles of bcd, eve, gt, kni or Kr and stained for the protein product of that gene. The expression patterns of these genes were transformed into essentially zero expression in the whole embryo (Fig. 5C–E). Visual inspection of expression patterns demonstrates that the method provides good results for most patterns in cleavage cycle 14A. Typical results for bcd, hb, Kr, eve and run at this stage are shown in Figure 6. For the expression patterns of gap and pair-rule genes at earlier stages of development (cycles 10–13) the method is less accurate due to low signal intensity and imprecise domain borders.

Determination of Developmental Age

In our previous work6,12 each cycle 14A embryo was assigned to one of the eight time classes (Fig. 2). It is assumed that embryos belonging to one and the same class do not have clear expression pattern differences for each stained gene product. However, this classification is rough and non-exhaustive, as expression patterns contain additional information that allows the developmental age of an embryo to be determined more precisely. This can be done by standardization of expression patterns against morphological data.

The examination of the blastoderm morphology and measurement of the degree of membrane invagination is a commonly used method to stage embryos in cycle 14A. From these measurements the embryo age can be evaluated by using the standard curve13 that gives membrane invagination as a function of developmental time. The method requires obtaining images of the blastoderm morphology in each embryo and is therefore time-consuming. To automate the process of embryo staging we have developed a method to predict the embryo age from its gene expression pattern.7 The method is based on the analysis of the highly dynamic expression pattern of the eve gene, which is visualized in each embryo, and standardization of these expression patterns against a small training set of embryos with a known developmental age. For prediction we use support vector regression (SVR).

The method comprises three stages: (1) design of the training set of embryos in which each embryo is characterized by a small number of characteristic features of the eve expression pattern and the precise developmental age determined by measuring the degree of membrane invagination; (2) construction of the regression function from the training set; (3) prediction of age of other embryos in the dataset on the basis of their gene expression patterns.7

Construction of the regression function

We use a training set of 103 embryos, for which the precise developmental age was determined by measuring membrane invagination. Each embryo is characterized by a multidimensional vector with components defined by the value of developmental age together with the 13 extremal values of the one-dimensional expression pattern of the eve gene. The total number of embryos in the training set is not large relative to the size of the parameter vector; therefore the prediction method may give unreliable results due to over-fitting. Moreover, the 13 extremal values of the eve gene expression pattern are strongly correlated and hence redundant. For this reason the dimension of the feature vector was reduced to three uncorrelated components by principal component analysis.
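The dimension reduction can be illustrated with a short sketch. It assumes a feature matrix with one row per embryo and one column per extremal value, and computes principal components through a singular value decomposition of the centered data; the function name is hypothetical, and details such as feature scaling are not specified in the text and are left out.

```python
import numpy as np

def pca_reduce(features, n_components=3):
    """Project feature vectors onto their first principal components.

    features -- array of shape (n_embryos, n_features), e.g. (103, 13)
    Returns an array of shape (n_embryos, n_components) whose columns
    are mutually uncorrelated.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

The resulting three-component vectors would then serve as the regression variables in place of the original 13 correlated values.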

The SVR method was developed by Vapnik,18 Smola and Schölkopf19 and in the linear case can be briefly formulated as follows. The training data are represented by observations (embryos). Each observation consists of a pair, namely a vector of characteristic features, considered as regression variables, and the associated ‘true’ embryo age, determined by measuring the degree of membrane invagination. The goal is to find a vector of regression coefficients that minimizes the regularized empirical risk functional defined as the total deviation between observed and predicted ages. In ε-SVR18 the deviation is penalized when it exceeds a given small value ε, while deviations smaller than ε are not taken into account. The problem is regularized to prevent excessively large values of the coefficients. The regression function is then used to predict the age of embryos with no membrane measurements available.
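The quantity being minimized can be written down in a few lines. The sketch below evaluates the regularized ε-insensitive risk of a linear model in its standard textbook form; the symbols `w` (regression coefficients), `b` (offset) and `C` (weight of the loss term relative to the regularizer) are conventional names assumed here, not values or notation taken from the published method. The sketch only evaluates the functional and does not perform the minimization.

```python
import numpy as np

def epsilon_insensitive_risk(w, b, X, y, epsilon, C):
    """Regularized risk of a linear model under the epsilon-insensitive loss.

    Deviations |prediction - age| up to epsilon cost nothing; larger
    deviations are penalized linearly. The term 0.5 * ||w||^2 keeps the
    coefficients small.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = np.abs(X @ w + b - y)
    loss = np.maximum(0.0, residual - epsilon)
    return 0.5 * float(w @ w) + C * float(loss.sum())
```

For example, with predictions that miss the true ages by 0, 0.5 and 2 minutes and ε = 1, only the last embryo contributes to the loss term.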

Estimation of prediction accuracy

To cross-validate the prediction accuracy we apply the leave-one-out test excluding, one by one, a single item from the training set and predicting the age for the excluded embryo. As a criterion of the quality of prediction the risk functional is used with the entries computed for the excluded items. In our dataset the prediction error was about 2 minutes. Taking into account that the ages of embryos in the training set vary over a range of 20 to 50 minutes from the onset of cycle 14A, the error is small enough to conclude that the predictions are reliable. Results presented in Figure 7 show that the predicted ages correlate well with the temporal classification of embryos.7
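The leave-one-out procedure itself is independent of the regression method and can be sketched as follows. To keep the example self-contained, an ordinary least-squares linear model stands in for SVR, and the mean absolute error is used as the quality criterion; both are substitutions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in for SVR)."""
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_linear(coeffs, X):
    """Predicted ages for feature vectors X."""
    return coeffs[0] + np.asarray(X, dtype=float) @ coeffs[1:]

def leave_one_out_error(X, y):
    """Mean absolute prediction error over all single-embryo exclusions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # exclude embryo i
        coeffs = fit_linear(X[keep], y[keep])   # train on the rest
        predicted = predict_linear(coeffs, X[i:i + 1])[0]
        errors.append(abs(predicted - y[i]))
    return float(np.mean(errors))
```

Each embryo is predicted by a model that has never seen it, so the returned error estimates how well the method generalizes to embryos outside the training set.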

Figure 7.

Determination of the developmental embryo age. Predicted ages (in minutes) of embryos that do not belong to the training set. The data are grouped according to the predefined temporal classes 3–8 within cycle 14A.

Image Registration

The confocal microscope permits scanning for the expression of only a small number of genes at once. Due to individual variability of embryo sizes, the direct superposition of gene expression patterns from different embryos does not provide information about the relative positions of expression domains. This problem can be solved by image registration, which brings individual embryos to a common coordinate system.

Extraction of ground control points

Our method for image registration5,6 is based on the extraction of “ground control points” (GCPs),20 which are a small number of characteristic features in each image, and application of a coordinate transformation to make the GCPs coincide as closely as possible on different images. We use the extrema of the eve one-dimensional expression pattern as GCPs. We applied quadratic spline approximation or fast redundant dyadic wavelet transform to extract the information about the localization of these extrema.4–6,21

Image registration

The registration of images is performed by resizing the two-dimensional expression patterns of the eve gene along the x-axis by an affine transformation. The transformation is performed to minimize the total distance between x-coordinates of all the GCPs in different patterns and the mean position of the corresponding point computed over all the registered images.5 The method is applied to register the expression patterns of segmentation genes in embryos belonging to one and the same temporal class. The accuracy of the method for each temporal class is estimated by considering the standard deviations of x-coordinates of the eve extrema. In late cycle 14A the standard deviations are less than 1% of embryo length. A single nucleus is about 1% egg length in diameter, so this represents a high level of accuracy. In early cycle 14A time classes the accuracy of registration is lower, as the pattern is not formed yet and only a few GCPs can be detected. The example shown in Figure 8 clearly illustrates that registration reduces the spatial variability of the patterns.
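A simplified, one-pass version of this registration can be sketched as follows. It assumes that the same GCPs have been detected in every embryo of a temporal class, takes the mean raw position of each GCP as a fixed reference, and fits a scale and shift for each embryo by least squares. The published method minimizes the distance to the mean over the registered images, so this sketch should be read as an approximation; the function name is hypothetical.

```python
import numpy as np

def register_to_mean(gcps):
    """Affine registration of GCP x-coordinates to their mean positions.

    gcps -- array of shape (n_embryos, n_points); row i holds the
            x-coordinates of the GCPs in embryo i, in the same order.
    Returns the registered coordinates and the (scale, shift) per embryo.
    """
    gcps = np.asarray(gcps, dtype=float)
    reference = gcps.mean(axis=0)          # mean position of each GCP
    registered = np.empty_like(gcps)
    transforms = []
    for i, points in enumerate(gcps):
        # Least-squares fit of reference ~ scale * points + shift
        scale, shift = np.polyfit(points, reference, 1)
        transforms.append((float(scale), float(shift)))
        registered[i] = scale * points + shift
    return registered, transforms
```

The same scale and shift would then be applied to the x-coordinates of all nuclei of the embryo, and the standard deviation of each registered GCP across embryos gives the accuracy estimate described above.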

Figure 8.

Example of image registration. We show the 1D expression patterns of the eve gene in five embryos from temporal class 8 before (A) and after (B) registration.

Data Averaging

The final step of the data pipeline is the construction of reference data for each segmentation gene and each time point. We call such data integrated data. The integrated data were built from registered data without background. There are two types of integrated data, namely one-dimensional integrated data and two-dimensional integrated patterns.

To construct two-dimensional integrated patterns, the averaged nuclear structure of an embryo was computed first. The number and location of nuclei in individual embryos are variable; hence we calculate the spatial density of the nuclei distribution and compute the average nucleus diameter at each point of the averaged embryo.5

In the next step each nucleus of an individual embryo was associated with the closest averaged nucleus in the averaged nuclear structure, and then the averaged fluorescence intensity was computed over all the individual nuclei associated with this averaged nucleus. Figure 9A presents the integrated two-dimensional expression patterns of gt, Kr and eve in temporal class 8. To make sure that the integrated pattern preserves the main features of individual patterns we show an individual pattern with the same combination of scanned genes in Figure 9B. The images are not identical due to individual variation of gene expression patterns among embryos, but they are very similar, and the integrated pattern clearly reproduces the shape and size of expression domains. In Figure 9C the integrated patterns of Kr, gt and kni in temporal class 8 are presented. Note that our dataset does not contain embryos stained simultaneously for these three proteins. By mapping integrated patterns of different genes on the averaged nuclear structure of an embryo it is possible to display and visualize the expression domains of segmentation genes in any desired combination.
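The association and averaging step can be sketched as below, assuming that the averaged nuclear structure is available as an array of reference centroids and that the nuclei of all registered embryos of a time class have been pooled into one array. The function name is hypothetical, and the brute-force distance computation is chosen for clarity rather than speed.

```python
import numpy as np

def integrate_on_reference(reference_xy, nuclei_xy, intensity):
    """Average intensity of individual nuclei per nearest averaged nucleus.

    reference_xy -- (n_ref, 2) centroids of the averaged nuclear structure
    nuclei_xy    -- (n, 2) centroids of nuclei pooled over embryos
    intensity    -- (n,) background-free intensities of those nuclei
    Returns an (n_ref,) array; entries with no associated nucleus are NaN.
    """
    reference_xy = np.asarray(reference_xy, dtype=float)
    nuclei_xy = np.asarray(nuclei_xy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Squared distance from every nucleus to every averaged nucleus
    d2 = ((nuclei_xy[:, None, :] - reference_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    n_ref = len(reference_xy)
    sums = np.bincount(nearest, weights=intensity, minlength=n_ref)
    counts = np.bincount(nearest, minlength=n_ref)
    averaged = np.full(n_ref, np.nan)
    filled = counts > 0
    averaged[filled] = sums[filled] / counts[filled]
    return averaged
```

Because every gene is mapped onto the same reference centroids, patterns of genes that were never stained in the same embryo can be displayed together, as in the "virtual" embryo of Figure 9C.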

Figure 9.

Two-dimensional integrated patterns of segmentation gene expression in time class 8. (A) Integrated data for Gt, Kr and Eve. (B) Expression patterns of the same genes in an individual embryo of the same time class. (C) “Virtual” embryo with 2D integrated data for Kr, Kni and Gt.

To construct the one-dimensional integrated data for a given gene the x-coordinates of nuclei in each registered one-dimensional expression pattern without background are grouped along the A-P axis into R intervals.6 Then the average fluorescence intensity of a given gene is calculated within each interval over all the embryos from the same time class. The value of R is defined by the requirement to correctly model the averaged nuclear structure of an embryo. A single nucleus is very close to 1% egg length in diameter in cycle 14A and the central part of an embryo, hence R should be taken equal to 100 to model a single row of nuclei. Figure 10 presents the one-dimensional integrated data on expression of maternal, gap and pair-rule genes in early and late time classes of cycle 14A.
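The binning described above can be sketched as follows, assuming that the registered, background-free one-dimensional patterns of all embryos of a time class have been pooled into two arrays: x-coordinates in percent of embryo length and the corresponding intensities. The function name is hypothetical.

```python
import numpy as np

def integrate_1d(x_percent, intensity, n_bins=100):
    """Average intensity within n_bins equal intervals along the A-P axis.

    x_percent -- A-P positions in percent of embryo length (0-100)
    intensity -- intensities of the corresponding nuclei
    Returns an array of length n_bins; empty intervals are NaN.
    """
    x = np.asarray(x_percent, dtype=float)
    values = np.asarray(intensity, dtype=float)
    # Interval index of each nucleus; x = 100 falls into the last interval
    bins = np.clip((x / 100.0 * n_bins).astype(int), 0, n_bins - 1)
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

With the default of 100 intervals each interval is 1% of embryo length wide, which matches the choice of R = 100 motivated in the text.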

Figure 10.

One-dimensional integrated data on the expression of segmentation genes in cycle 14A. Left column: Maternal and gap genes in time classes 1 (top) and 6 (bottom); right column: pair-rule genes in time classes 3 (top) and 8 (bottom).

Discussion

In this paper we describe the data pipeline proposed to acquire quantitative data on segmentation gene expression from confocal images of gene expression patterns. It includes image segmentation, background removal, determination of developmental age of an embryo, image registration and data averaging.4–9 These methods can be applied both sequentially and independently and joined in different combinations. The main advantages of the pipeline are its wide range of processing methods, automatic performance, flexibility in use and ease of adaptation to other problems.

The majority of image processing methods that have been developed so far solve only one specific problem (e.g., image registration), and, hence, include only one or two procedures. Among such methods the segmentation of cell and tissue images is the most widely applied (reviewed in refs. 22–25). An essential disadvantage of most of the image segmentation algorithms is the requirement of manual adjustment of parameters by a user. Our segmentation method is fully automatic, and the only stage that requires visual control is the detection of embryo orientation (Fig. 3).

Several scientific groups have also developed their own methods to segment and acquire quantitative data from confocal images of Drosophila blastoderm embryos. For example, Houchmandzadeh et al.26 extracted quantitative data on bcd and hb expression by sliding a window of the size of an average nucleus along the dorsal side of the embryo. The fluorescence intensity was averaged within each window. Later the same method was applied by Gregor et al.27 The limitations of this method are that it assumes nuclei of constant size and does not isolate the outlines of nuclei. This makes it problematic to extend the method to other biological objects.

One of the recent approaches to the segmentation of three-dimensional stacks of confocal images of the Drosophila blastoderm was developed by Luengo Hendriks et al.28 This method finds the local maxima of fluorescence intensities in nuclei stained with Sytox Green dye and then detects the so-called seeds in each nucleus. The seeds are grown to fill the nuclei using the thresholded initial image. The advantage of this method is that it was originally developed to process three-dimensional data. However, the procedure of seed detection is not very reliable because the majority of nuclei contain several local maxima, and a complicated method for boundary correction is required to improve the image segmentation quality. In addition, the seed growth is based on the mask built from the thresholded images, which cannot always provide the precise shape and size of the nuclei.

So far only a few methods exist for the removal of non-specific background from confocal images of fluorescently tagged molecules, as well as for image registration. The most common approach to remove background consists in direct subtraction of a constant level of fluorescence intensity from an image. The approach was experimentally substantiated by Gregor et al.29 The authors considered images of live Drosophila embryos, in which the endogenous bcd was replaced by a construct expressing eGFP-Bcd. The images were acquired with a two-photon confocal microscope. The level of background staining was approximately uniform over the embryo body in these images. However, such a situation is not typical and it is important to take into consideration the possibility of a non-uniform distribution of background in a confocal image. Recently a method for image registration based on a new strategy has been published.30 This method, called elastic registration, deforms the coordinates non-linearly to make the images coincide as closely as possible. After transformation all the registered images are brought to one and the same size. This method is very accurate; however, it deforms the coordinates of raw images and therefore cannot be used to determine the location of expression domains.

Knowledge of the precise embryo age is necessary to reconstruct the temporal dynamics of gene expression and to decipher the network of genetic interactions that underlies early development in Drosophila. Staging of embryos fixed at cycle 14A is a complicated problem, which is usually solved by the application of time-consuming and expensive experimental methods. We have developed a method for staging an embryo on the basis of its gene expression pattern by applying SV regression.7 The method is fast and automatic, shows good prediction accuracy and thus constitutes a significant advance in the staging of fixed Drosophila embryos.

Our pipeline was successfully applied to process about 5000 images scanned in 1580 embryos and acquire the quantitative dataset on segmentation gene expression in the Drosophila blastoderm. This dataset has cellular resolution in space and 6.5-minute resolution in time. All the images and quantitative data are stored in the FlyEx database and are widely used by the scientific community to study the mechanism of pattern formation, infer regulatory interactions in the segmentation genetic network and develop new mathematical models.12,31–45

Generally, it is hard to adapt problem-oriented methods and their software implementations to solve similar problems in different biological systems. Nevertheless, our pipeline was successfully adapted to acquire data on the expression of segmentation genes at the RNA level. In Drosophila the main difference between the expression of transcription factor genes at the protein and RNA levels is that mRNA is localized not only in the nuclei but also in the cytoplasm. To segment these images we modified the watershed procedure to construct a mask covering both the nuclei and the surrounding cytoplasm. In addition, the background removal method was modified in the way the points of support are selected. Subsequently we acquired averaged quantitative mRNA data on the expression of different reporter constructs of the eve gene, which were used to model gene expression from sequence,46 as well as averaged quantitative data on the expression of gap genes in early cleavage cycles 10–13.47 We also adapted the image segmentation method to obtain three-dimensional data and processed several stacks of embryo images scanned for rhomboid expression. The pipeline was also modified to construct integrated quantitative data on gene expression in the early development of the coral Acropora millepora and the sea anemone Nematostella vectensis.48 In this case image segmentation was likewise based on masking the whole embryo and the areas of gene expression, with a somewhat different combination of filters used to preprocess the initial images. The quantitative gene expression data were presented in polar coordinates, and the expression patterns were registered by a method based on the extraction of GCPs.6
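As an illustration of the polar representation mentioned above, the following minimal sketch converts measurements given as (x, y, intensity) into (r, theta, intensity) about a chosen center. The function names, the data layout, and the use of the centroid of the measurement positions as the center are hypothetical choices made for this example; the source does not specify how the origin was selected.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of (x, y, intensity) measurements."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return cx, cy


def to_polar(points, center=None):
    """Convert (x, y, intensity) tuples to (r, theta, intensity).

    theta is measured in radians from the positive x axis and wrapped
    into [0, 2*pi). If no center is given, the centroid of the
    measurement positions is used (an assumption of this sketch).
    """
    if center is None:
        center = centroid(points)
    cx, cy = center
    result = []
    for x, y, intensity in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        result.append((r, theta, intensity))
    return result
```

In this form an expression profile can be read along the angular coordinate at a fixed radius, which may suit roughly radially organized embryos better than a profile along a single Cartesian axis.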

Acknowledgments

This work was supported by NIH grant RR07801, GAP awards RBO-1286 and RUB1-1578 and by RFBR grant 08-04-00712-a. We thank David Kosman and Carlos E. Vanario-Alonso for the acquisition of confocal images and Alexander Samsonov for valuable discussions.

References

  • 1. Banerjee N, Zhang MQ. Functional genomics as applied to mapping transcription regulatory networks. Curr Opin Microbiol. 2002;5:313–7. doi: 10.1016/s1369-5274(02)00322-3.
  • 2. Shav-Tal Y. The living test-tube: imaging of real-time gene expression. Soft Matter. 2006;2:361–70. doi: 10.1039/b600234j.
  • 3. Gilbert S, Opitz J, Raff R. Resynthesizing evolutionary and developmental biology. Dev Biol. 1996;173:357–72. doi: 10.1006/dbio.1996.0032.
  • 4. Kozlov K, Myasnikova E, Samsonova M, Reinitz J, Kosman D. Method for spatial registration of the expression patterns of Drosophila segmentation genes using wavelets. Computational Technologies. 2000;5:112–9.
  • 5. Kozlov K, Myasnikova E, Pisarev A, Samsonova M, Reinitz J. A method for two-dimensional registration and construction of the two-dimensional atlas of gene expression patterns in situ. In Silico Biology. 2002;2:125–41.
  • 6. Myasnikova E, Samsonova A, Kozlov K, Samsonova M, Reinitz J. Registration of the expression patterns of Drosophila segmentation genes by two independent methods. Bioinformatics. 2001;17:3–12. doi: 10.1093/bioinformatics/17.1.3.
  • 7. Myasnikova E, Samsonova A, Samsonova M, Reinitz J. Support vector regression applied to the determination of the developmental age of a Drosophila embryo from its segmentation gene expression patterns. Bioinformatics. 2002;18:87–95. doi: 10.1093/bioinformatics/18.suppl_1.s87.
  • 8. Janssens H, Kosman D, Vanario-Alonso CE, Jaeger J, Samsonova M, Reinitz J. A high-throughput method for quantifying gene expression data from early Drosophila embryos. Dev Genes Evol. 2005;215:374–81. doi: 10.1007/s00427-005-0484-y.
  • 9. Myasnikova E, Samsonova M, Kosman D, Reinitz J. Removal of background signal from in situ data on the expression of segmentation genes in Drosophila. Dev Genes Evol. 2005;215:320–6. doi: 10.1007/s00427-005-0472-2.
  • 10. Kosman D, Small S, Reinitz J. Rapid preparation of a panel of polyclonal antibodies to Drosophila segmentation proteins. Dev Genes Evol. 1998;208:290–4. doi: 10.1007/s004270050184.
  • 11. Kosman D, Reinitz J, Sharp DH. Automated assay of gene expression at cellular resolution. In: Altman R, Dunker K, Hunter L, Klein T, editors. Proceedings of the 1998 Pacific Symposium on Biocomputing. Singapore: World Scientific Press; 1997. pp. 6–17.
  • 12. Surkova S, Kosman D, Kozlov K, Manu, Myasnikova E, Samsonova AA, Spirov A, Vanario-Alonso CE, Samsonova M, Reinitz J. Characterization of the Drosophila segment determination morphome. Dev Biol. 2008;313:844–62. doi: 10.1016/j.ydbio.2007.10.037.
  • 13. Merrill P, Sweeton D, Wieschaus E. Requirements for autosomal gene activity during precellular stages of Drosophila melanogaster. Development. 1988;104:495–509. doi: 10.1242/dev.104.3.495.
  • 14. Poustelnikova E, Pisarev A, Blagov M, Samsonova M, Reinitz J. A database for management of gene expression data in situ. Bioinformatics. 2004;20:2212–21. doi: 10.1093/bioinformatics/bth222.
  • 15. Gonzalez RC, Woods RE. Digital image processing. Upper Saddle River, NJ: Prentice Hall; 2002.
  • 16. Hu MK. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory. 1962;8:179–87.
  • 17. Shen J, Castan S. An optimal linear operator for edge detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Miami. 1986. pp. 109–14.
  • 18. Vapnik V. The Nature of Statistical Learning Theory. New York: Springer; 1995.
  • 19. Schölkopf B, Smola A. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press; 2002.
  • 20. Brown LG. A survey of image registration techniques. ACM Computing Surveys. 1992;24:325–76.
  • 21. Myasnikova E, Kosman D, Reinitz J, Samsonova M. Spatiotemporal registration of the expression patterns of Drosophila segmentation genes. In: Lengauer T, Schneider R, Bork P, Brutlag D, Glasgow J, Mewes HW, Zimmer R, editors. Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1999. pp. 195–201.
  • 22. Umesh Adiga PS, Chaudhuri BB. Efficient cell segmentation tool for confocal microscopy tissue images and quantitative evaluation of FISH signal. Microsc Res Tech. 1999;44:49–68. doi: 10.1002/(SICI)1097-0029(19990101)44:1<49::AID-JEMT6>3.0.CO;2-6.
  • 23. Ortiz de Solórzano C, Garcia Rodriguez E, Jones A, Pinkel D, Gray JW, Sudar D, Lockett SJ. Segmentation of confocal microscope images of cell nuclei in thick tissue sections. J Microsc. 1999;193:212–26. doi: 10.1046/j.1365-2818.1999.00463.x.
  • 24. Ortiz de Solórzano C, Lelievre S, Lockett SJ, Malladi R. Segmentation of cell and nuclei using membrane related proteins. J Microsc. 2001;201:1–13. doi: 10.1046/j.1365-2818.2001.00854.x.
  • 25. Chawla MK, Lin G, Olson K, Vazdarjanova A, Burke SN, McNaughton BL, Worley PF, Guzowski JF, Roysam B, Barnes CA. 3D-catFISH: a system for automated quantitative three-dimensional compartmental analysis of temporal gene transcription activity imaged by fluorescence in situ hybridization. J Neurosci Methods. 2004;139:13–24. doi: 10.1016/j.jneumeth.2004.04.017.
  • 26. Houchmandzadeh B, Wieschaus E, Leibler S. Establishment of developmental precision and proportions in the early Drosophila embryo. Nature. 2002;415:798–802. doi: 10.1038/415798a.
  • 27. Gregor T, Bialek W, de Ruyter van Steveninck RR, Tank DW, Wieschaus EF. Diffusion and scaling during early embryonic pattern formation. Proc Natl Acad Sci USA. 2005;102:18403–7. doi: 10.1073/pnas.0509483102.
  • 28. Luengo Hendriks CL, Keränen SV, Fowlkes CC, Simirenko L, Weber GH, DePace AH, Henriquez C, Kaszuba DW, Hamann B, Eisen MB, Malik J, Sudar D, Biggin MD, Knowles DW. Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution I: data acquisition pipeline. Genome Biology. 2006;7:R123. doi: 10.1186/gb-2006-7-12-r123.
  • 29. Gregor T, Tank DW, Wieschaus EF, Bialek W. Probing the limits to positional information. Cell. 2007;130:153–64. doi: 10.1016/j.cell.2007.05.025.
  • 30. Sorzano COS, Blagov M, Thevenaz P, Myasnikova E, Samsonova M, Unser M. Algorithm for spline-based elastic registration in application to confocal images of gene expression. Pattern Recognition and Image Analysis. 2006;16:93–6.
  • 31. Holloway DM, Harrison LG, Spirov AV. Noise in the segmentation gene network of Drosophila with implications for mechanisms of body axis specification. In: Bezrukov SM, editor. Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems. Proceedings of the SPIE. Vol. 5110. 2003. pp. 180–91.
  • 32. Pereanu W, Hartenstein V. Digital three-dimensional models of Drosophila development. Curr Opin Genet Dev. 2004;14:382–91. doi: 10.1016/j.gde.2004.06.010.
  • 33. Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov K, Manu, Myasnikova E, Vanario-Alonso CE, Samsonova M, Sharp D, Reinitz J. Dynamic control of positional information in the early Drosophila embryo. Nature. 2004;430:368–71. doi: 10.1038/nature02678.
  • 34. Jaeger J, Blagov M, Kosman D, Kozlov K, Manu, Myasnikova E, Surkova S, Vanario-Alonso CE, Samsonova M, Sharp DH, Reinitz J. Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics. 2004;167:1721–37. doi: 10.1534/genetics.104.027334.
  • 35. Diambra L, Da Costa F. Complex networks approach to gene expression driven phenotype imaging. Bioinformatics. 2005;21:3846–51. doi: 10.1093/bioinformatics/bti625.
  • 36. Aegerter-Wilmsen T, Aegerter CM, Bisseling T. Model for the robust establishment of precise proportions in the early Drosophila embryo. J Theor Biol. 2005;234:13–9. doi: 10.1016/j.jtbi.2004.11.002.
  • 37. Isalan M, Lemerle C, Serrano L. Engineering gene networks to emulate Drosophila embryonic pattern formation. PLoS Biol. 2005;3:e64. doi: 10.1371/journal.pbio.0030064.
  • 38. Ludwig MZ, Palsson A, Alekseeva E, Bergman CE, Nathan J, Kreitman M. Functional evolution of a cis-regulatory module. PLoS Biol. 2005;3:588–98. doi: 10.1371/journal.pbio.0030093.
  • 39. Holloway DM, Harrison LG, Kosman D, Vanario-Alonso CE, Spirov AV. Analysis of pattern precision shows that Drosophila segmentation develops substantial independence from gradients of maternal gene products. Dev Dyn. 2006;235:2949–60. doi: 10.1002/dvdy.20940.
  • 40. Krishna S, Banerjee B, Ramakrishnan TV, Shivashankar GV. Stochastic simulations of the origins and implications of long-tailed distributions in gene expression. Proc Natl Acad Sci USA. 2005;102:4771–6. doi: 10.1073/pnas.0406415102.
  • 41. Ochoa-Espinosa A, Yucel G, Kaplan L, Pare A, Pura N, Oberstein A, Papatsenko D, Small S. The role of binding site cluster strength in bicoid-dependent patterning in Drosophila. Proc Natl Acad Sci USA. 2005;102:4960–5. doi: 10.1073/pnas.0500373102.
  • 42. Perkins TJ, Jaeger J, Reinitz J, Glass L. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol. 2006;2:e51. doi: 10.1371/journal.pcbi.0020051.
  • 43. Yucel G, Small S. Morphogens: Precise outputs from a variable gradient. Curr Biol. 2006;16:29–31. doi: 10.1016/j.cub.2005.12.005.
  • 44. Zinzen R, Papatsenko D. Enhancer responses to similarly distributed antagonistic gradients in development. PLoS Comput Biol. 2007;3:e84. doi: 10.1371/journal.pcbi.0030084.
  • 45. Bergmann S, Sandler O, Sberro H, Shnider S, Schejter E, Shilo BZ, Barkai N. Pre-steady-state decoding of the Bicoid morphogen gradient. PLoS Biol. 2007;5:e46. doi: 10.1371/journal.pbio.0050046.
  • 46. Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even-skipped gene. Nature Genetics. 2006;38:1159–65. doi: 10.1038/ng1886.
  • 47. Jaeger J, Sharp D, Reinitz J. Known maternal gradients are not sufficient for the establishment of gap domains in Drosophila melanogaster. Mechanisms of Development. 2007;124:108–28. doi: 10.1016/j.mod.2006.11.001.
  • 48. Kozlov K, Pisarev A, Matveeva A, Kaandorp J, Samsonova M. Image Processing Package ProStack for Quantification of Biological Images. Proceedings of the 4th International Symposium on Networks in Bioinformatics (ISNB); 2007.
