Abstract
Mosaic analysis provides a means to probe developmental processes in situ by generating loss-of-function mutants within otherwise wildtype tissues. Combining these techniques with quantitative microscopy enables researchers to rigorously compare RNA or protein expression across the resultant clones. However, visual inspection of mosaic tissues remains common in the literature because quantification demands considerable labor and computational expertise. Practitioners must segment cell membranes or cell nuclei from a tissue and annotate the clones before their data are suitable for analysis. Here, we introduce Fly-QMA, a computational framework that automates each of these tasks for confocal microscopy images of Drosophila imaginal discs. The framework includes an unsupervised annotation algorithm that incorporates spatial context to inform the genetic identity of each cell. We use a combination of real and synthetic validation data to survey the performance of the annotation algorithm across a broad range of conditions. By contributing our framework to the open-source software ecosystem, we aim to support the current move toward automated quantitative analysis among developmental biologists.
Author summary
Biologists use mosaic tissues to compare the behavior of genetically distinct cells within an otherwise equivalent context. The ensuing analysis is often limited to qualitative insight. However, it is becoming clear that quantitative models are needed to unravel the complexities of many biological systems. In this manuscript we introduce a computational framework that automates the quantification of mosaic analysis for Drosophila imaginal discs, a common setting for studies of developmental processes. The software extracts quantitative measurements from confocal images of mosaic tissues, rectifies any cross-talk between fluorescent reporters, and identifies clonally-related subpopulations of cells. Together, these functions allow users to rigorously ascribe changes in gene expression to the presence or absence of particular genes. We validate the performance of our framework using both real and synthetic data. We invite interested readers to apply these methods using our freely available software.
Introduction
Quantification will be essential as biologists study increasingly complex facets of organismal development [1]. Unfortunately, qualitative analysis remains common because it is often difficult to measure cellular processes in their native context. Modern fluorescent probes and microscopy techniques make such measurements possible [2–4], but the ensuing image analysis demands specialized skills that fall beyond the expertise of most experimentalists. Automated analysis strategies have addressed similar challenges in cytometry [5–7], genomics and transcriptomics [8–11], and other subdisciplines of biology [12, 13]. Image analysis has proven particularly amenable to automation, with several computer vision tools having gained traction among biologists [14–17]. These platforms are popular because they increase productivity, improve the consistency and sensitivity of measurements, and obviate the need for specialized computational proficiency [18–20]. Designing similar tools to help biologists probe and measure developmental processes in vivo will further transform studies of embryogenesis and development into quantitative endeavors.
Developmental biologists study how the expression and function of individual genes coordinate the emergence of adult phenotypes. They often ask how cells respond when a specific gene, RNA, or protein is perturbed during a particular stage of development. Cell response may be characterized by changes in morphology, or by changes in the expression of other genes (Fig 1A). Experimental efforts to answer this question were historically stifled by the difficulty of isolating perturbations to a single developmental context, as the most interesting perturbation targets often confer pleiotropic function across several stages of development and can trigger early embryonic lethality [21–23].
Mosaic analysis addressed this challenge in Drosophila by limiting perturbations to a subset of cells within the imaginal discs of the larva [24, 25]. The technique yields a heterogeneous tissue comprised of genetically distinct patches of cells that are clonally related. Aside from rare de novo mutations, cells within each clone are genetically identical. Clone formation may be restricted to specific developing organs by using disc-specific gene promoters to drive trans-chromosomal recombination events in the corresponding imaginal discs [26, 27]. The timing of these events determines the number and size of the resultant clones [28]. Perturbations are applied by engineering the dosage of a target gene to differ across clones (Fig 1B), resulting in clones whose cells are either homozygous mutant (−/−), heterozygous wildtype (+/−), or homozygous wildtype (+/+) for the particular gene. Labeling these clones with the presence or absence of fluorescent markers enables direct comparison of cells subject to control or perturbation conditions, while maintaining otherwise equivalent developmental and physiological histories between the two cell populations (Fig 2A). Additional reporters may be used to monitor differences in RNA or protein expression, morphology, or cell fate choice across clones (Fig 2B). Variants of this strategy led to seminal discoveries in both neural patterning [29–31] and morphogenesis [32, 33], and remain popular today [34–36].
Quantitative microscopy techniques are well suited to measuring differences in cell behavior across clones. One reporter (a clonal marker) labels the clones, while others quantitatively report properties of their constituent cells, such as the expression level of a gene product of interest (Fig 2C). The former then defines the stratification under which the latter are compared. We call this strategy Quantitative Mosaic Analysis (QMA) because it replaces subjective visual comparison with a rigorous statistical alternative. Although a few recent studies have deployed this approach [37–40], qualitative visual comparison remains pervasive in the literature.
We suspect the adoption of QMA has been hindered by demand for specialized computational skills or, in their stead, extensive manual labor. Researchers must first draw or detect boundaries around individual nuclei in a procedure known as segmentation (Fig 2D). Averaging the pixel intensities within each boundary then yields a fluorescence intensity measurement for each reporter in each identified nucleus (Fig 2E). The measurements should then be corrected to account for any fluorescence bleedthrough between reporter channels (Fig 2F). Correction often requires single-reporter calibration experiments to quantify any potential crosstalk between different fluorophores, followed by complex calculations to remedy the data [41, 42]. Researchers must then label, or annotate, each identified nucleus as mutant, heterozygous, or homozygous for the clonal marker. Annotation is typically achieved through visual inspection (Fig 2G). Cells carrying zero, one, or two copies of the clonal marker should exhibit low, medium, or high average levels of fluorescence, respectively. However, both measurement and biological noise introduce the possibility that some cells’ measured fluorescence levels may not reliably reflect their genetic identity. Annotation must therefore also consider the spatial context surrounding each nucleus. For instance, a nucleus whose neighbors express high levels of the clonal marker is likely to be homozygous for the clonal marker, even if its individual fluorescence level is comparable to that of heterozygous cells (Fig 2G, white arrows). Spatial context is particularly informative in developing tissues where cell migration is minimal, such as the fly imaginal discs. With many biological replicates containing thousands of cells each, annotation can quickly become insurmountably tedious. 
The corrected and labeled measurements are then curated for statistical comparison by excluding those on the border of each clone, and limiting their scope to particular regions of the image field (Fig 2H). Combined, all of these tasks ultimately burden researchers and raise the barrier for adoption of QMA.
Automation promises to alleviate this bottleneck, yet the literature bears surprisingly few computational resources designed to support QMA. The ClonalTools plugin for ImageJ deploys an image-based approach to measure macroscopic features of clone morphology, but is limited to binary classification of mutant versus non-mutant tissue and offers no functionality for comparing reporter expression across clones [43]. Alternatively, the MosaicSuite plugin for ImageJ deploys an array of image processing, segmentation, and analysis capabilities to automatically detect spatial interactions between objects found in separate fluorescence channels [44, 45]. While useful in many other settings, neither of these tools supports automated labeling of individual cells or explicit comparison of clones with single-cell resolution. Most modern studies employing a quantitative mosaic analysis instead report using some form of ad hoc semi-automated pipeline built upon ImageJ [37, 39, 40]. We are therefore unaware of any platforms that offer comprehensive support for an automated QMA workflow.
Here, we introduce Fly-QMA, a computational framework for automated QMA of Drosophila imaginal discs. Fly-QMA supports segmentation, bleedthrough correction, and annotation of confocal microscopy data (Fig 2D–2H). We demonstrate each of these functions by applying them to real confocal images of clones in the eye imaginal disc, and find that our automated approach yields results consistent with manual analysis by a human expert. We then generate and use synthetic data to survey the performance of our framework across a broad range of biologically plausible conditions. Fly-QMA is freely available online (see Data and software availability), along with an interactive coding tutorial designed to acquaint users with the core software features by applying them to example data.
Results
Quantification of nuclear fluorescence levels
We implemented a segmentation strategy based upon a standard watershed approach [52]. Briefly, we construct a foreground mask by Otsu thresholding the nuclear stain or nuclear label image following a series of smoothing and contrast-limited adaptive histogram equalization operations [52, 53]. We then apply a Euclidean distance transform to the foreground mask, identify the local maxima, and use them as seeds for watershed segmentation. When applied to the microscopy data, few visible spots in the nuclear stain were neglected, and the vast majority of segments outlined individual nuclei (S1C Fig).
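The foreground mask hinges on Otsu thresholding, which picks the intensity cutoff that maximizes the between-class variance of the two resulting pixel populations. The following is a minimal, dependency-free sketch of that single step and is not the Fly-QMA implementation. The function name `otsu_threshold` and the `n_bins` parameter are illustrative assumptions, the input is assumed to contain at least two distinct intensities, and the smoothing, histogram equalization, distance transform, and watershed stages are omitted.

```python
def otsu_threshold(pixels, n_bins=256):
    """Return the cutoff that maximizes between-class variance (Otsu's method).

    `pixels` is a flat sequence of intensities with at least two distinct values.
    """
    lo, hi = min(pixels), max(pixels)
    width = (hi - lo) / n_bins

    # Histogram the intensities into equally spaced bins.
    counts = [0] * n_bins
    for p in pixels:
        counts[min(int((p - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]

    total = len(pixels)
    total_sum = sum(c * x for c, x in zip(counts, centers))

    best_threshold, best_variance = lo, -1.0
    below, below_sum = 0, 0.0
    for i in range(n_bins - 1):
        # Candidate split: bins 0..i are background, the rest are foreground.
        below += counts[i]
        below_sum += counts[i] * centers[i]
        above = total - below
        if below == 0 or above == 0:
            continue
        mean_below = below_sum / below
        mean_above = (total_sum - below_sum) / above
        variance = below * above * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = lo + (i + 1) * width  # upper edge of bin i
    return best_threshold
```

Pixels brighter than the returned cutoff would form the foreground mask that seeds the subsequent distance transform.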
This approach is flexible and should perform adequately in many scenarios. However, we acknowledge that no individual strategy can address all microscopy data because segmentation is strongly context dependent. All subsequent stages of analysis were therefore designed to be compatible with any data that conform to our standardized file structure. This modular arrangement grants users the freedom to use one of the many other available segmentation platforms [54], including FlyEye Silhouette [55], before applying the remaining functionalities of our framework. Regardless of how nuclear contours are identified, averaging the pixel intensities within them yields fluorescence intensity measurements for each reporter in each identified nucleus. We next sought to ensure that these measurements were suitable for comparison across clones.
Bleedthrough correction
Despite efforts to select non-overlapping reporter bandwidths and excite them sequentially, it is not uncommon for reporters excited at one wavelength to emit some fluorescence in the spectrum collected for another channel (Fig 2B, yellow lines) [41, 56]. The end result is a positive correlation, or crosstalk, between the measured fluorescence intensities of two or more reporters. Exogenous correlations between the measured fluorescence intensities of the clonal marker and the reporter of interest are problematic given that the purpose of the experiment is to detect changes in reporter levels with respect to the clonal marker.
In our microscopy data, individual clones were distinguished by their low, medium, or high expression levels of an RFP-tagged clonal marker (Fig 3A). These images should not have shown any detectable difference in GFP levels across clones because all cells carried an equivalent dosage of the control reporter (S1A Fig). However, the images visibly suffered from bleedthrough between the RFP and GFP channels (Fig 3A and 3B). Bleedthrough was similarly evident when we compared measured GFP levels across labeled clones. Nuclei labeled mutant, heterozygous, or homozygous for the clonal marker had low, medium, and high expression levels of the control reporter, respectively (Fig 3C, black boxes). The data were therefore ripe for systematic correction.
Spectral bleedthrough correction is common practice in other forms of cross-correlation and co-localization microscopy [41, 56]. These methods typically entail characterizing the extent of crosstalk between fluorophores globally [57, 58], on a pixel-by-pixel basis [42], or by experimental calibration [41], then detrending all images or measurements prior to subsequent analysis. Our framework adopts the global approach, using the background pixels in each image to infer the extent of fluorescence bleedthrough across spectral channels.
Specifically, we assume the fluorescence intensity Fij for channel i at pixel j is a superposition of a background intensity Bij and some function of the expression level Eij that we seek to compare across cells [59]:
$$F_{ij} = B_{ij} + f(E_{ij}) \tag{1}$$
We further assume that the background intensity of a channel includes linear contributions from the fluorescence intensity of each of the other channels:
$$B_{ij} = \sum_{k=1}^{K} \alpha_k F_{kj} + \beta \tag{2}$$
where k is indexed over K anticipated sources of bleedthrough. Given estimates for each {α1, α2, …αK} and β we can then estimate the background intensity of each measurement:
$$\langle B_i \rangle = \sum_{k=1}^{K} \alpha_k \langle F_k \rangle + \beta \tag{3}$$
where the angled braces denote the average across all pixels within a single nucleus. The corrected signal value is obtained by subtracting the background intensity from the measured fluorescence level:
$$\langle F_i \rangle_{\mathrm{corrected}} = \langle F_i \rangle - \langle B_i \rangle \tag{4}$$
Repeating this procedure for each nucleus facilitates comparison of relative expression levels across nuclei in the absence of bleedthrough effects. Bleedthrough correction performance is therefore strongly dependent upon accurate estimation of the bleedthrough contribution strengths, {α1, α2, …αK}.
We estimate these parameters by characterizing their impact on background pixels (see Methods). When applied to the microscopy data, bleedthrough correction successfully eliminated any detectable difference in GFP expression across clones (Fig 3C, red boxes; p > 0.05, two-sided Mann-Whitney U test).
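Once the bleedthrough coefficients are known, Eqs 3 and 4 amount to a per-nucleus subtraction. The following is a minimal sketch, assuming each nucleus is stored as a dictionary of channel-averaged intensities. The function name, the dictionary layout, and the coefficient values in the usage example are hypothetical and do not reflect the Fly-QMA API.

```python
def correct_bleedthrough(nuclei, target, alphas, beta):
    """Return bleedthrough-corrected levels of the `target` channel.

    nuclei: list of dicts mapping channel name to mean nuclear intensity
    alphas: dict mapping each bleedthrough source channel to its coefficient
    beta:   constant background offset
    """
    corrected = []
    for nucleus in nuclei:
        # Eq 3: background estimated from the other channels' mean intensities.
        background = beta + sum(a * nucleus[ch] for ch, a in alphas.items())
        # Eq 4: subtract the background from the measured level.
        corrected.append(nucleus[target] - background)
    return corrected


# Two nuclei whose apparent GFP difference is entirely due to RFP bleedthrough.
nuclei = [{"gfp": 130.0, "rfp": 100.0}, {"gfp": 230.0, "rfp": 300.0}]
print(correct_bleedthrough(nuclei, "gfp", alphas={"rfp": 0.5}, beta=30.0))
```

In this toy case both nuclei end up with the same corrected GFP level, which mirrors the control behavior expected when the reporter dosage is equal across clones.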
Automated annotation of clones
Our annotation strategy seeks to label each identified cell as homozygous mutant, heterozygous wildtype, or homozygous wildtype for the clonal marker. Variation within each clone precludes accurate classification of a cell’s genotype solely on the basis of its individual expression level. However, in tissues where cell migration is minimal, clonal lineages are unlikely to exist in isolation because recombination events are typically timed to generate large clones. Our strategy therefore integrates both clonal marker expression and spatial context to identify clusters of cells with locally homogeneous expression behavior, then maps each cluster to one of the possible labels. This unsupervised approach lends itself to automated annotation because the clusters are inferred directly from the data without any guidance from the user.
We first train a statistical model to estimate the probability that a given measurement came from a cell carrying zero, one, or two copies of the clonal marker (S3A Fig). This entails fitting a weighted mixture of three or more bivariate lognormal distributions (components) to a two-dimensional set of observations (S3B and S3C Fig). The first dimension corresponds to the clonal marker fluorescence level measured within each cell. The second dimension describes the local average expression level within the region surrounding each cell. We evaluate the latter by estimating a neighborhood radius from the decay of the radial correlation of the expression levels, then averaging the expression levels of all cells within that radius (S3D Fig). The second dimension therefore measures the spatial context in which a cell resides. We balance model fidelity against overfitting by using the Bayesian information criterion to determine the optimal number of model components (S3E Fig). We then cluster the components into three groups on the basis of their mean values (S3F Fig), effectively mapping each component to one of the three possible gene dosages. The model may be trained using observations derived from a single image, or with a collection of observations derived from multiple images. Once trained, the model is able to predict the conditional probability that an individual observation belongs to one of the model’s components, given its measured expression level.
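The interplay between mixture fitting and the Bayesian information criterion can be illustrated with a deliberately simplified sketch. It assumes a univariate mixture fitted to log-transformed fluorescence levels, whereas the model described above is bivariate, and it uses a bare-bones expectation maximization loop written in plain Python. All function names and defaults are hypothetical; a real analysis would more likely rely on an established mixture-model library.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def fit_mixture(values, k, iterations=100):
    """Fit a k-component normal mixture by EM and return its log-likelihood."""
    n = len(values)
    ordered = sorted(values)
    # Start the means at evenly spaced quantiles and share the overall variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    center = sum(values) / n
    variances = [sum((v - center) ** 2 for v in values) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances (with a variance floor).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(spread, 1e-6)
    return sum(
        math.log(sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances)))
        for v in values
    )


def select_by_bic(values, max_k=4):
    """Return the component count with the smallest BIC, plus all BIC scores."""
    n = len(values)
    scores = {}
    for k in range(1, max_k + 1):
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        scores[k] = n_params * math.log(n) - 2 * fit_mixture(values, k)
    return min(scores, key=scores.get), scores
```

Passing the logarithm of the measured levels to `select_by_bic` keeps the sketch consistent with the lognormal assumption, since a lognormal variable is normal after log transformation.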
We then use the learned conditional probabilities to detect entire clones, thus assigning a label to each cell. Rather than using the trained model to classify each observation, we compile a new set of observations by limiting each estimate of spatial context to spatially collocated communities with similar expression behavior (S4A Fig). We identify these communities by applying a community detection algorithm to an undirected graph connecting adjacent cells (S4B Fig). Edges in this graph are weighted by the similarity of clonal marker expression between neighbors, resulting in communities with similar expression levels (S4E Fig, Steps I and II). The graph-based approach increases spatial resolution by limiting the information shared by dissimilar neighbors. Applying the mixture model yields an initial estimate of the probability that an observation belongs to one of the model’s components (S4E Fig, Step III). We further refine these estimates by allowing the probabilities estimated for each cell to diffuse throughout the graph (S4E Fig, Step IV). The rate of diffusion between neighbors is determined by the weight of the edge that connects them, with more similar neighbors exerting stronger influence on each other. We then use the diffused probabilities to identify the most probable source component and label each observation (S4E Fig, Step V). These probabilities also provide a measure of confidence in the assigned labels. We replace any low-confidence labels with alternate labels assigned using a marginal classifier that neglects spatial context (S4F and S4G Fig), resulting in a fully labeled image (S4H Fig).
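The diffusion step can be pictured as repeated local averaging on the cell adjacency graph. The sketch below shows one plausible update rule, assumed for illustration rather than taken from the Fly-QMA source: at each step a cell blends its own label probabilities with the edge-weighted mean of its neighbors' probabilities. The function names, the `rate` and `steps` parameters, and the edge dictionary layout are all hypothetical.

```python
def diffuse_probabilities(probabilities, edges, rate=0.5, steps=10):
    """Blend each cell's label probabilities with those of its graph neighbors.

    probabilities: list of per-cell lists, one probability per label
    edges:         dict mapping (i, j) cell index pairs to a similarity weight
    """
    neighbors = {i: [] for i in range(len(probabilities))}
    for (i, j), weight in edges.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))

    current = [list(p) for p in probabilities]
    for _ in range(steps):
        updated = []
        for i, own in enumerate(current):
            total = sum(w for _, w in neighbors[i])
            if total == 0:  # isolated cells keep their own estimate
                updated.append(list(own))
                continue
            blended = []
            for label in range(len(own)):
                # Heavier edges (more similar neighbors) contribute more.
                incoming = sum(w * current[j][label] for j, w in neighbors[i]) / total
                blended.append((1 - rate) * own[label] + rate * incoming)
            updated.append(blended)
        current = updated
    return current


def assign_labels(probabilities):
    """Label each cell with its most probable component."""
    return [max(range(len(p)), key=p.__getitem__) for p in probabilities]
```

Because every update is a convex combination of probability vectors, each cell's probabilities continue to sum to one, so the diffused values can still serve as a confidence measure for the assigned label.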
The algorithm leverages the collective wisdom of neighboring measurements to override spatially isolated fluctuations in clonal marker expression, and thereby enforces consistent annotation within contiguous regions of the image field. The size of these regions depends upon the granularity of estimates for the spatial context surrounding each cell. We used an unsupervised approach to choose an appropriate spatial resolution in a principled manner. In short, the resolution is matched to the approximate length scale over which expression levels remain correlated among cells. Both the training and application stages of our annotation algorithm use this automated approach (S3D and S4D Figs), thus averting any need for user input.
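The spatial context of a cell can be sketched as the mean expression level of all cells within a fixed radius of it. In the snippet below the radius is supplied by the caller, which is a simplifying assumption: the framework described here infers the length scale from the decay of spatial correlation instead. The function name and argument layout are hypothetical.

```python
import math


def spatial_context(positions, levels, radius):
    """Average the expression level over all cells within `radius` of each cell.

    positions: list of (x, y) centroids
    levels:    list of clonal marker levels, one per cell
    """
    context = []
    for p in positions:
        # The cell itself lies at distance zero and is always included.
        nearby = [lvl for q, lvl in zip(positions, levels) if math.dist(p, q) <= radius]
        context.append(sum(nearby) / len(nearby))
    return context


# Two adjacent cells share context; the distant third cell keeps its own level.
print(spatial_context([(0, 0), (1, 0), (10, 0)], [2.0, 4.0, 9.0], radius=1.5))
```

A larger radius smooths over more neighbors and so trades spatial resolution for precision, which is why matching it to the correlation length scale matters.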
Manual assessment of annotation performance
We sought to validate the performance of the annotation algorithm by assessing its ability to accurately reproduce human-assigned labels. We manually labeled nuclei in each eye imaginal disc as homozygous mutant, heterozygous wildtype, or homozygous wildtype for the clonal marker, then automatically labeled the same cells (Fig 4A). The two sets of labels showed strong overall agreement (Fig 4B and S5A Fig). Excluding cells on the border of each clone revealed greater than 96% agreement in seven of the nine annotated images (see Table 1). Upon secondary inspection of the sole instance of substantial disagreement (S5B Fig), we are unable to confidently discern which set of labels is more accurate. While manual labeling required more than one hour of labor per image, the annotation algorithm achieved comparable accuracy in a matter of seconds. This performance advantage would continue to grow if the analysis were extended to multiple image layers, tissue samples, and experimental conditions.
Table 1. Automated vs. manual annotation.
Disc | Layer | Agreement* |
---|---|---|
1 | 1 | 93.1% (97.3%) |
1 | 2 | 95.3% (97.3%) |
2 | 1 | 91.3% (99.1%) |
2 | 2 | 95.2% (96.4%) |
3 | 1 | 67.2% (75.6%) |
4 | 1 | 82.5% (89.2%) |
5 | 1 | 96.2% (100%) |
6 | 1 | 99.1% (99.3%) |
6 | 2 | 95.2% (97.5%) |
* Values in parentheses denote agreement when clone borders are excluded.
While it is common practice to use human-labeled data as the gold standard, manually assigned labels do not represent a reliable and reproducible ground truth. Furthermore, we contend that validation with manually-labeled data entrains implicit human biases in the selection of performant algorithms. These biases are particularly pronounced in biological image data where intrinsic variation, measurement noise, and transient processes can make cell-type annotation a highly subjective, and thus irreproducible, task.
Synthetic benchmarking of annotation performance
Synthetic benchmarking provides a powerful alternative to validation against manually labeled data. The idea is simple: measure how accurately an algorithm is able to label synthetic data for which the labels are known. The synthetic data generation procedure may be modeled after the process underlying formation of the real data, providing a means to assess the performance of an algorithm across the range of conditions that it is likely to encounter. The strategy therefore makes it possible to survey the breadth of biologically plausible conditions under which the algorithm provides adequate performance. Synthetic benchmarking also facilitates unbiased comparison of competing algorithms, resulting in a reliable standard that may be called upon at any time.
We used synthetic microscopy data to benchmark the performance of our annotation strategy. Each synthetic dataset depicts a simulated culture of cells distributed roughly uniformly in space (S6A Fig). Cells in this culture contain zero, one, or two copies of a gene encoding an RFP-tagged clonal marker (S6B Fig). Our simulation procedure ensures that cells tend to remain proximal to their clonal siblings (S6C Fig), thus forming synthetic clones with tunable size and spatial heterogeneity (S6D and S6E Fig). We generated synthetic measurements by randomly sampling fluorescence levels in a dosage-dependent manner (S7A–S7C Fig). We varied the similarity of fluorescence levels across clones using an ambiguity parameter, σα, that modulates the spread of the distributions used to generate fluorescence levels (S7D–S7F Fig).
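Dosage-dependent sampling with a tunable ambiguity can be sketched as follows. The parameterization is an assumption made purely for illustration: levels are drawn from a lognormal distribution whose median doubles with each copy of the clonal marker and whose log-scale spread equals the ambiguity parameter. The distributions actually used to generate the benchmark data may differ, and the function name is hypothetical.

```python
import math
import random


def sample_fluorescence(dosages, ambiguity, seed=0):
    """Draw one synthetic fluorescence level per cell.

    dosages:   list of marker copy numbers (0, 1, or 2), one per cell
    ambiguity: log-scale standard deviation; larger values blur the clones
    """
    rng = random.Random(seed)
    levels = []
    for dosage in dosages:
        log_mean = dosage * math.log(2)  # hypothetical: median doubles per copy
        levels.append(rng.lognormvariate(log_mean, ambiguity))
    return levels
```

Raising `ambiguity` widens each distribution until the three dosages overlap, which is what makes classification from individual fluorescence levels progressively harder.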
Using this schema as a template, we generated a large synthetic dataset, annotated each set of measurements, and compared the assigned labels with their true values. We used the mean absolute error as a comparison metric because it provides a stable measure of accuracy for multiclass classification problems in which the labels are intrinsically ordered [60]. In other words, it penalizes egregious misclassifications more severely than mild ones.
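A small worked example shows why mean absolute error suits ordered labels. Encoding the genotypes as 0, 1, and 2 copies of the clonal marker (an encoding assumed here for illustration), two annotations with identical accuracy can differ in mean absolute error depending on how far off the mistakes are. The function names are hypothetical.

```python
def mean_absolute_error(true_labels, assigned_labels):
    """Average distance between true and assigned ordinal labels."""
    return sum(abs(t - a) for t, a in zip(true_labels, assigned_labels)) / len(true_labels)


def accuracy(true_labels, assigned_labels):
    """Fraction of labels that match exactly."""
    return sum(t == a for t, a in zip(true_labels, assigned_labels)) / len(true_labels)


truth = [0, 1, 2, 2]
mild = [0, 1, 1, 1]    # two homozygous cells mistaken for heterozygous
severe = [0, 1, 0, 0]  # the same two cells mistaken for mutant

print(accuracy(truth, mild), accuracy(truth, severe))                        # 0.5 0.5
print(mean_absolute_error(truth, mild), mean_absolute_error(truth, severe))  # 0.5 1.0
```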
Annotation performance is very strong for all cases in which σα ≤ 0.3 (Fig 5). Unsurprisingly, performance suffers as the difficulty of the classification problem is increased. The same trends are evident when performance is graded strictly on accuracy (S8 Fig). As cells on the periphery of each clone were not excluded from these analyses, the observed metrics provide a lower bound on the performance that may be anticipated in practice.
Performance improved with increasing clone size. We suspected this was caused by larger clones offering additional spatial context to inform the identity of each cell. We verified our assertion by re-evaluating performance relative to a variant of our annotation algorithm that neglects spatial context (S4G Fig). As expected, the variant’s performance exhibited no dependence on clone size (S9A Fig). Comparing the two strategies confirmed that spatial context confers the most benefit when clones are large (S9B Fig). Inclusion of spatial context also becomes increasingly advantageous as the fluorescence ambiguity is increased, even for smaller clones. Thus, spatial context adds progressively more value as the classification task becomes more difficult.
This observation may be rationalized from a statistical perspective. Each cell is classified by maximizing the probability that the assigned label is correct. We compute these probabilities using the estimated expression level of each cell. Neglecting spatial context, this estimate is limited to a single sample and is therefore highly sensitive to both measurement and biological noise. Incorporating spatial context expands the sample size and thereby reduces the standard error of the estimated fluorescence level. The strategy is thus generally well suited to scenarios in which fluorescence intensities correlate across large clones, and closely parallels computer vision methods that exploit spatial contiguity to segment image features with ill-defined borders [61]. Because increased measurement precision comes at the expense of spatial resolution, we expect strong performance when measurements are aggregated across relatively large clones, but failure to detect small, heterogeneous clones. These expectations are consistent with the observed results. They are also conveniently aligned with the anticipated properties of real data, as experiments typically attempt to mitigate edge effects by driving early recombination events to generate large clones.
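The statistical argument can be made concrete under a simplifying assumption: if the n cells pooled within a neighborhood share the same genotype and contribute approximately independent measurements with a common standard deviation σ, then the standard error of their mean is

$$\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

Under that assumption, averaging over 25 neighbors would reduce the uncertainty fivefold relative to a single-cell measurement. Correlated noise among neighbors, or neighborhoods that straddle a clone boundary, would diminish this gain.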
Discussion
We used synthetic data to survey the performance of our annotation strategy across a much broader range of conditions than would have otherwise been possible with manually labeled data. This included conditions well beyond those of practical use. In particular, experiments designed to compare gene expression levels across clones would likely seek to avoid generating small clones with ambiguous clonal marker expression. Beyond complicating the annotation task, small clones are also exposed to diffusion-mediated signals from adjacent clones that can mask the effect of mutations. Cells located near the clone boundaries are often excluded for the same reason, as quantification is typically most reliable in cells surrounded by similar neighbors. Synthetic data provided a means to survey these edge cases and establish a lower bound on annotation performance. The strong performance observed across the remaining conditions bolsters our confidence that our annotation strategy is well suited to the images it is likely to encounter.
In each of our examples, clones were distinguished by ternary segregation of nuclear clonal marker fluorescence levels. Modern mosaic analysis techniques continue to deploy ternary labeling [62, 63], but also frequently opt for binary labeling of mutant versus non-mutant clones [64–66] and dichromic labeling of twin-spots [67, 68]. Our annotation scheme readily adapts to each of these scenarios provided that the number of anticipated labels is adjusted accordingly. In the case of dichromic labeling, binary classification would be performed separately for each color channel before merging the assigned labels. Extending the same logic to combinatorial pairs of colors suggests that our framework may also be compatible with multicolor labeling schemes used to simultaneously trace many clonal lineages over time [69–71]. A notable limitation of our approach is its reliance upon reporter fluorescence levels within distinct cells or nuclei. This requirement for discrete measurements precludes analysis of contiguous clones in which cytoplasmic fluorescence signals are indistinguishable between adjacent cells. Our framework is thus well suited to many different mosaic analysis platforms deployed in imaginal discs, so long as reporter fluorescence levels are measured on a discrete basis.
In principle, the framework described here should also be applicable to a wide variety of other tissues [72, 73] and model organisms [74–76] in which mosaics are studied. In practice, application to alternate contexts would require modifying some stages of the analysis. Most notably, image segmentation is strongly context dependent and any attempts to develop a universally successful strategy are likely to prove futile [77]. For this reason, we implemented a modular design in which each stage of analysis may be applied separately. For example, a user could perform their own segmentation before using our bleedthrough correction and clone annotation tools. By offering modular functionalities we hope to extend the utility of our software to the wider community of developmental biologists. Furthermore, the open-source nature of our framework supports continued development of more advanced features as various demands arise. Our synthetic benchmarking platform could then be used to objectively confirm the benefit conferred by any future developments.
Materials and methods
Genetics and microscopy of Drosophila eye imaginal discs
We borrowed an experimental dataset from a separate study of neuronal fate commitment during eye disc development [38]. The data consist of six eye imaginal discs dissected and fixed during the third larval instar of Drosophila development. Within each disc, ey>FLP and FRT40A were used to generate clones. The chromosome arm (2L) targeted for recombination was marked with a Ubi-mRFPnls transgene (S1A Fig), enabling automated detection of clones marked by distinct levels of mRFP fluorescence (S1B Fig). The discs also carried a pnt-GFP reporter transgene located on a different chromosome that was not subject to mitotic recombination. The Pnt-GFP reporter is predominantly expressed in two narrow stripes of progenitor cells during eye disc development [38]. The first stripe occurs immediately posterior to a wave of developmental signaling that traverses the eye disc. Progenitor cells located in this region are suitable for comparison because they are of approximately equivalent developmental age. We applied the Fly-QMA framework to a total of nine images of these cells.
Genetics, fly lines, immunohistochemistry, and imaging conditions related to this dataset have already been published [38]. All discs were dissected in PBS, fixed in 4% paraformaldehyde for 30 min at room temperature, and permeabilized with PBS-Triton X-100 0.1% for 20 min at room temperature to allow DAPI penetration without perturbing the fluorescence of the Pnt-GFP protein. Discs were subsequently stained with a 4’,6-diamidino-2-phenylindole (DAPI) nuclear marker, rinsed twice with PBS-Tween 0.5%, and mounted in Vectashield (Vector Labs). Images were acquired using a Leica SP5 confocal equipped with a tunable detector. The 405, 488, and 561 nm lasers were used to excite DAPI, Pnt-GFP, and Ubi-mRFPnls, while photons were collected in the 437–481, 491–555, and 570–644 nm intervals for DAPI, GFP, and mRFP, respectively. Images were recorded with 16-bit resolution using a 40X oil objective. Discs were oriented with the dorso-ventral equator parallel to the horizontal axis, and all images captured at least six rows of ommatidia on either side of the equator. All discs were fixed, mounted, and imaged in parallel in order to reduce measurement error.
Characterization of fluorescence bleedthrough
For each image, we morphologically dilate the foreground until no features remain visible (S2A Fig). We then extract the background pixels and resample them such that the distribution of pixel intensities is approximately uniform (S2B Fig). Resampling helps mitigate the skewed distribution of pixel intensities found in the background. We then estimate values for each {α1, α2, …αK} and β by fitting a generalized linear model to the fluorescence intensities of the resampled pixels (S2C Fig). Each model is a variant of Eq 3 in which angled braces instead denote averages across all background pixels. We formulate these models with identity link functions under the assumption that residuals are gamma distributed. Their coefficients provide an estimate of the bleedthrough contribution strengths that may then be used to estimate the background fluorescence intensity of each nucleus in the corresponding image (S2D Fig). The measurements may then be corrected through application of Eq 4.
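The model-fitting step can be illustrated with a minimal sketch. It assumes a gamma-family generalized linear model with an identity link, fitted by iteratively reweighted least squares in plain NumPy. The function name `fit_gamma_identity_glm`, the synthetic pixel values, and the coefficient values are hypothetical and are not taken from the Fly-QMA code base.

```python
import numpy as np

def fit_gamma_identity_glm(X, y, n_iter=100, tol=1e-10):
    """Fit E[y] = beta + X @ alpha with gamma-distributed residuals (identity link)."""
    A = np.column_stack([np.ones(len(y)), X])       # intercept column + K reporters
    coef = np.linalg.lstsq(A, y, rcond=None)[0]     # least-squares starting point
    for _ in range(n_iter):
        mu = np.clip(A @ coef, 1e-8, None)          # keep fitted means positive
        w = 1.0 / mu**2                             # gamma variance scales with mu^2
        Aw = A * w[:, None]
        updated = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # weighted least-squares update
        converged = np.max(np.abs(updated - coef)) < tol
        coef = updated
        if converged:
            break
    return coef[0], coef[1:]                        # beta, (alpha_1, ..., alpha_K)

# Hypothetical background pixels: two reporter channels bleeding into a third.
rng = np.random.default_rng(0)
reporters = rng.uniform(0.1, 1.0, size=(5000, 2))   # roughly uniform after resampling
true_mean = 0.05 + reporters @ np.array([0.2, 0.1])
target = rng.gamma(shape=50.0, scale=true_mean / 50.0)

beta, alpha = fit_gamma_identity_glm(reporters, target)
```

With an identity link the working response equals the observed intensity, so each iteration reduces to a weighted least-squares solve with weights 1/μ².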
Clone annotation algorithm
We assume the measured fluorescence level xi for cell i is sampled from an underlying distribution pm(x) for cells carrying m copies of the gene encoding the clonal marker:
(5) $x_i \sim p_m(x)$
We further assume that pm(x) is comprised of a mixture of one or more lognormal distributions:
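(6) $p_m(\ln x) = \sum_{n} \lambda_n \, p(\ln x \mid \theta_n)$
(7) $p(\ln x \mid \theta_n) = \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(\ln x - \mu_n)^2}{2\sigma_n^2}\right)$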
where 0 ≤ λn ≤ 1 are the mixing proportions and θn = {μn, σn²} contains the mean and variance of the nth distribution. This assumption is supported by both empirical observations and theoretical insights [46, 47]. By superposition, the global distribution of measured fluorescence levels, p(lnx), across all values of m is also a mixture of K components:
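(8) $p(\ln x) = \sum_{m} \alpha_m \, p_m(\ln x)$
(9) $p(\ln x) = \sum_{k=1}^{K} \lambda_k \, p(\ln x \mid \theta_k)$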
where αm denotes the overall fraction of cells with m copies of the gene encoding the clonal marker. For brevity, we substitute X = lnx yielding:
(10) $p(X) = \sum_{k=1}^{K} \lambda_k \, p(X \mid \theta_k)$
Given a collection of sampled fluorescence levels, {Xi}i = 1…N, we use expectation maximization to find values of θk and λk for each of the model’s K components that maximize the log-likelihood of the observed sample. We repeat this procedure for a range of sequential values of K, resulting in multiple models of increasing size. We then balance model resolution against overfitting by selecting the model that yields the smallest value of the Bayesian Information Criterion (BIC):
(11) $\mathrm{BIC}_K = q_K \ln N - 2 \hat{L}_K$
(12) $q_K = 3K - 1$
where N is the sample size, L̂K is the maximum value of the log-likelihood, the subscript K denotes the number of mixture components in the model, and qK is the total number of parameters (i.e. K − 1 values of λk and 2K values of μk and σk²).
Applying Bayes’ rule to the selected model infers the posterior probabilities that each sample Xi belongs to the kth component:
(13) $p(k \mid X_i) = \frac{\lambda_k \, p(X_i \mid k)}{p(X_i)}$
where p(Xi∣k) is evaluated using the model’s likelihood function and p(Xi) is evaluated by marginalizing across each of the model’s K components. The end result is a mixture model that allows us to predict the probability that a given measurement of clonal marker expression belongs to a particular one of its component distributions.
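The fitting, model selection, and posterior steps can be sketched as follows. This is a minimal NumPy illustration and not the Fly-QMA implementation: it assumes univariate normal components on X = ln x, the standard BIC definition, a quantile-based initialization, and a variance floor, all of which are choices made here for the example. The names `log_joint` and `fit_mixture` and the synthetic data are hypothetical.

```python
import numpy as np

def log_joint(X, lam, mu, var):
    """ln[lambda_k * N(X_i | mu_k, var_k)] for every sample i and component k."""
    return (np.log(lam) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (X[:, None] - mu) ** 2 / var)

def fit_mixture(X, K, n_iter=300, var_floor=1e-4):
    """Expectation maximization for a K-component normal mixture on X = ln x."""
    mu = np.quantile(X, (np.arange(K) + 0.5) / K)    # spread initial means over the data
    var = np.full(K, X.var())
    lam = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        lj = log_joint(X, lam, mu, var)
        post = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])  # E-step
        nk = post.sum(axis=0) + 1e-12                                 # M-step
        lam = nk / nk.sum()
        mu = post.T @ X / nk
        var = np.maximum((post * (X[:, None] - mu) ** 2).sum(axis=0) / nk, var_floor)
    lj = log_joint(X, lam, mu, var)
    log_px = np.logaddexp.reduce(lj, axis=1)         # ln p(X_i), marginal over components
    posterior = np.exp(lj - log_px[:, None])         # p(k | X_i) by Bayes' rule
    q = 3 * K - 1                                    # K-1 proportions, K means, K variances
    bic = q * np.log(len(X)) - 2 * log_px.sum()
    return dict(lam=lam, mu=mu, var=var, posterior=posterior, bic=bic)

# Hypothetical fluorescence levels drawn from three well-separated lognormals.
rng = np.random.default_rng(0)
x = np.concatenate([rng.lognormal(m, 0.3, 300) for m in (0.0, 2.0, 4.0)])
X = np.log(x)

fits = {K: fit_mixture(X, K) for K in range(1, 7)}   # sequential values of K
best_K = min(fits, key=lambda K: fits[K]["bic"])     # smallest BIC wins
posterior = fits[best_K]["posterior"]
```

Each row of `posterior` sums to one, so it can be read directly as the probability that a measurement belongs to each component of the selected model.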
We then define a many-to-one mapping, f, from each of the K components of the mixture to each of the three possible values of m:
(14) $f : \{1, \ldots, K\} \rightarrow \{0, 1, 2\}$
We determine the mapping by k-means clustering the K component distributions into three groups on the basis of their mean values, μk. We may then assign a genotype label m to each measurement Xi by predicting the component k from which it was sampled.
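A minimal sketch of this mapping is given below. It assumes a plain one-dimensional k-means (Lloyd's algorithm) with quantile-based starting centers, and it orders the three groups so that the lowest mean corresponds to zero gene copies. The function name and the example values are hypothetical.

```python
import numpy as np

def map_components_to_genotypes(means, n_genotypes=3, n_iter=100):
    """Cluster component means into n_genotypes groups; return f as an array f[k] = m."""
    means = np.asarray(means, dtype=float)
    centers = np.quantile(means, (np.arange(n_genotypes) + 0.5) / n_genotypes)
    for _ in range(n_iter):
        # assign every component to its nearest center
        groups = np.argmin(np.abs(means[:, None] - centers), axis=1)
        updated = np.array([means[groups == g].mean() if np.any(groups == g)
                            else centers[g] for g in range(n_genotypes)])
        if np.allclose(updated, centers):
            break
        centers = updated
    # relabel groups by increasing center so that labels follow gene dosage
    rank = np.empty(n_genotypes, dtype=int)
    rank[np.argsort(centers)] = np.arange(n_genotypes)
    return rank[groups]

# Hypothetical mixture with five components and three samples.
component_means = [0.1, 0.3, 2.0, 2.2, 4.1]
f = map_components_to_genotypes(component_means)

posterior = np.array([[0.70, 0.20, 0.05, 0.03, 0.02],
                      [0.00, 0.10, 0.20, 0.60, 0.10],
                      [0.00, 0.00, 0.00, 0.10, 0.90]])
genotype = f[posterior.argmax(axis=1)]   # apply f to the most probable component
```

Storing f as an integer array makes the many-to-one mapping a single indexing operation.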
The accuracy of these labels depends upon how closely the fitted mixture model reflects the true partitioning of gene copies among clones. While finite mixtures are always identifiable given a sufficiently large sample [48], the algorithm used to fit the mixture tends toward local maxima of the likelihood function when the true components are similar (Wu, 1983). An approach based on a univariate mixture is thus inherently prone to failure when expression levels extensively overlap across clones, as variation within each clone precludes accurate classification of a cell’s genotype solely on the basis of its individual expression level. However, clonal lineages are unlikely to exist in isolation because recombination events are usually timed to generate large clones. Our strategy therefore integrates both clonal marker expression and spatial context to identify clusters of cells with locally homogeneous expression behavior.
We incorporate spatial context by introducing a second jointly-distributed variable Yi:
(15) $Y_i = \frac{1}{M_i} \sum_{j=1}^{M_i} X_j$
where the subscript j indexes all Mi neighbors of cell i. The new variable reflects the average expression level among the neighbors surrounding each cell. We define neighbors as pairs of cells located within a critical distance of each other. This distance, or sampling radius, is derived from the length scale over which cells retain similar clonal marker expression levels. Specifically, we determine the exponential decay constant of the spatial correlation function, ψ(δ):
(16) $\psi(\delta) = \frac{\langle (X_i - \mu_X)(X_j - \mu_X) \rangle_{\delta}}{\sigma_X^2}$
where μX and σX are the global mean and standard deviation, and angled brackets denote the mean across all pairs of cells separated by distance δ. We efficiently implement this procedure by fitting an exponential decay function to the down-sampled moving average of ψ(δ) as a function of increasing separation distance.
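The estimate of the sampling radius can be sketched as below. For simplicity this example replaces the down-sampled moving average with fixed-width distance bins and fits the decay by linear regression on ln ψ over the leading positive bins, so it is a simplification rather than the procedure used by the authors. The function names, the striped test pattern, and the `max_dist` and `n_bins` parameters are hypothetical.

```python
import numpy as np

def spatial_correlation(xy, X, max_dist, n_bins=20):
    """Mean of standardized products z_i * z_j, binned by pairwise distance."""
    i, j = np.triu_indices(len(X), k=1)              # all unordered pairs of cells
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    z = (X - X.mean()) / X.std()
    product = z[i] * z[j]
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    bin_index = np.digitize(dist, edges) - 1         # pairs beyond max_dist fall outside
    centers = 0.5 * (edges[:-1] + edges[1:])
    psi = np.array([product[bin_index == b].mean() for b in range(n_bins)])
    return centers, psi

def decay_length(centers, psi):
    """Fit psi ~ exp(-delta / length) to the leading run of positive bins."""
    positive = psi > 0
    stop = len(psi) if positive.all() else int(np.argmin(positive))
    slope, _ = np.polyfit(centers[:stop], np.log(psi[:stop]), 1)
    return -1.0 / slope

# Hypothetical tissue: alternating vertical stripes of low and high marker expression.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(300, 2))
stripe = (xy[:, 0] // 25).astype(int) % 2
X = stripe * np.log(2) + rng.normal(0, 0.05, 300)

centers, psi = spatial_correlation(xy, X, max_dist=50)
radius = decay_length(centers, psi)
```

The all-pairs distance computation scales quadratically with the number of cells, which is acceptable for a few hundred nuclei but would need subsampling for larger images.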
Following the introduction of spatial context, the mixture model becomes:
(17) $p(X, Y) = \sum_{k=1}^{K} \lambda_k \, p(X, Y \mid \theta_k)$
where θk contains the mean and variance of each component, each given by a vector of length two. This formulation constrains each component’s covariance matrix to be diagonal. The posterior is now:
(18) $p(k \mid X_i, Y_i) = \frac{\lambda_k \, p(X_i, Y_i \mid k)}{p(X_i, Y_i)}$
We can recover the univariate model by marginalizing the posterior over all values of Y:
(19) $p(k \mid X_i) = \int p(k \mid X_i, Y) \, p(Y \mid X_i) \, dY$
When neglecting spatial context, we use this expression to classify each sample by applying the mapping f to the value of k that maximizes p(k∣Xi):
(20) $m_i = f\left(\arg\max_{k} \, p(k \mid X_i)\right)$
In all other cases, we deploy a graph-based approach to refine the estimate of p(k∣Xi, Yi). This first entails constructing an undirected graph connecting adjacent cells within each image. We obtain the graph’s edges through Delaunay triangulation of the measured cell positions, then exclude distant neighbors by thresholding the edge lengths. Each edge is assigned a weight wij reflecting the similarity of clonal marker expression between adjacent cells i and j:
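(21) $w_{ij} = \exp\left(-\frac{E_{ij}}{\langle E_{ij} \rangle}\right)$
(22) $E_{ij} = \left| \ln \frac{x_i}{x_j} \right|$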
where Eij is the absolute log fold-change in measured expression level and angled brackets denote the mean across all edges. We chose an exponential formulation because it yields an approximately uniform distribution of edge weights. We then detect communities within the graph using the Infomap algorithm [49]. The algorithm provides a hierarchical partitioning of nodes into non-overlapping clusters. We aggregate all clusters below a critical level that is again chosen by estimating the spatial correlation decay constant. We then evaluate the posterior for each cell, taking as its spatial context the average expression level among all neighbors in the same community as cell i.
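The graph construction can be sketched with SciPy's Delaunay triangulation. The example assumes that each weight decays exponentially with the absolute log fold-change divided by its mean across edges, which is one reading of the description above. The `max_length` threshold and all data values are hypothetical, and community detection with Infomap is not shown.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_weighted_graph(xy, x, max_length):
    """Return unique Delaunay edges shorter than max_length and their weights."""
    s = Delaunay(xy).simplices
    # each triangle contributes three edges; sort endpoints and drop duplicates
    edges = np.vstack([s[:, [0, 1]], s[:, [1, 2]], s[:, [0, 2]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    length = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1)
    edges = edges[length <= max_length]              # exclude distant neighbors
    # absolute log fold-change in expression between the two cells of each edge
    E = np.abs(np.log(x[edges[:, 0]]) - np.log(x[edges[:, 1]]))
    w = np.exp(-E / E.mean())                        # similar cells get weights near 1
    return edges, w

# Hypothetical nuclei positions and clonal marker levels.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, size=(50, 2))
x = rng.lognormal(0.0, 0.5, 50)
edges, w = build_weighted_graph(xy, x, max_length=0.3)
```

Because E is divided by its own mean, the choice of logarithm base does not affect the weights.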
We further incorporate spatial context by allowing the posterior probabilities to diffuse among adjacent cells. We define the modified posterior probability through a recursive relation analogous to the Katz centrality [50], initialized by the community-informed posterior described above:
(23) |
(24) |
where α is the attenuation factor and wij are the edge weights. Expressed in matrix form, the solution for the modified posterior is given by:
(25) |
where I denotes the identity matrix and W is the matrix of edge weights wij. We then assign a label to each measurement Xi by applying f to the value of k that maximizes the modified posterior:
(26) |
Finally, we assess the total posterior probability of each assigned label:
(27) |
This measure reflects the overall confidence that mi is the appropriate label. Labels whose confidence falls below 80% are replaced by their counterparts estimated using the marginal classifier. This substitution helps preserve classification accuracy in situations where spatial context is not informative, and is particularly useful when the annotated clones are relatively small.
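The diffusion and fallback steps can be illustrated with the following sketch. It assumes the standard closed form of the Katz centrality, (I − αW)⁻¹ applied to the initial posteriors, followed by a per-cell renormalization. The exact recursion used by the authors may differ, so this should be read as one plausible form. The function name, the `damping` parameter that sets α relative to the largest eigenvalue of W, and the five-cell example are hypothetical.

```python
import numpy as np

def diffuse_posteriors(P0, W, f, P_marginal, damping=0.9, threshold=0.8):
    """Diffuse posteriors over a weighted graph, then fall back where confidence is low."""
    n = len(P0)
    alpha = damping / np.linalg.eigvalsh(W).max()       # keeps (I - alpha W) invertible
    P_hat = np.linalg.solve(np.eye(n) - alpha * W, P0)  # closed-form Katz-style solution
    P_hat /= P_hat.sum(axis=1, keepdims=True)           # renormalize each cell's row
    labels = f[P_hat.argmax(axis=1)]
    # confidence: total diffused probability of all components mapped to the chosen label
    confidence = np.array([P_hat[i, f == labels[i]].sum() for i in range(n)])
    fallback = f[P_marginal.argmax(axis=1)]             # labels from the marginal classifier
    labels = np.where(confidence < threshold, fallback, labels)
    return labels, confidence, P_hat

# Hypothetical chain of five adjacent cells with unit edge weights.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
P0 = np.array([[0.98, 0.02]] * 5)
P0[2] = [0.45, 0.55]                 # the middle cell is ambiguous on its own
f = np.array([0, 2])                 # component 0 -> zero copies, component 1 -> two copies
labels, confidence, P_hat = diffuse_posteriors(P0, W, f, P_marginal=P0)
```

In this example the ambiguous middle cell adopts the label of its neighbors because its diffused confidence exceeds the 80% threshold. Here `P_marginal` simply reuses `P0` as a stand-in for the marginal posterior.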
Statistical comparison of fluorescence levels
To mitigate edge effects, cells residing on the periphery of each clone were excluded from all comparisons (S2E Fig). Border cells were identified by using a Delaunay triangulation to find all cells connected to a neighbor within a different clone. Our framework includes a simple graphical user interface that permits manual curation of which regions of the image field are included in subsequent analyses. We used this tool to limit our analysis to the region of elevated GFP expression near the morphogenetic furrow (S2F Fig). Comparisons were further restricted to cells undergoing similar stages of development (S2G Fig). These restrictions served to buffer against differences in developmental context and ensured that all compared cells were of similar developmental age. The remaining fluorescence measurements were then aggregated across all eye discs and compared between pairs of clones by two-sided Mann-Whitney U test.
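Border exclusion and the subsequent comparison can be sketched as follows, using SciPy's Delaunay triangulation and Mann-Whitney U test. The function name `border_cells`, the two-clone layout, and the simulated reporter levels are hypothetical, and manual region selection is not represented.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.stats import mannwhitneyu

def border_cells(xy, clone):
    """Flag cells that share a Delaunay edge with a cell from a different clone."""
    indptr, indices = Delaunay(xy).vertex_neighbor_vertices
    return np.array([np.any(clone[indices[indptr[i]:indptr[i + 1]]] != clone[i])
                     for i in range(len(xy))])

# Hypothetical tissue with two clones separated at x = 0.5.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, size=(400, 2))
clone = np.where(xy[:, 0] < 0.5, 0, 2)
gfp = rng.lognormal(mean=np.where(clone == 0, 0.0, 0.5), sigma=0.2)

interior = ~border_cells(xy, clone)                  # drop cells on clone peripheries
stat, pvalue = mannwhitneyu(gfp[interior & (clone == 0)],
                            gfp[interior & (clone == 2)],
                            alternative="two-sided")
```

Long triangulation edges along the convex hull can flag a few additional cells as border cells, which makes the exclusion slightly conservative.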
Simulated cell growth and recombination
We simulated the two-dimensional growth of a cell culture seeded with a single cell. Growth proceeds through sequential division of cells (S6A Fig). Because cell division is a stochastic process, not all cells divide at each time-step. Instead, each cell divides with a probability controlled by a global growth rate parameter.
Cells in this culture carry a gene encoding a clonal marker (S6B Fig). During growth, the gene is subject to mitotic recombination (S6C Fig). Each time a cell divides, its genes are duplicated and equally partitioned between the two daughter cells. However, in some instances a heterozygous parent may instead partition its two duplicate genes unequally, with one daughter receiving both and the other receiving none. These mitotic recombination events occur stochastically with a frequency defined by a global recombination rate parameter.
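The division and recombination rules can be sketched without any spatial component. This minimal example tracks only gene dosage, assumes a heterozygous founder cell, and allows recombination throughout growth, so it omits both repositioning and the timing parameters of the full simulation. The function name and default values are hypothetical.

```python
import numpy as np

def simulate_dosages(max_cells=2048, p_divide=0.2, p_recombine=0.2, seed=0):
    """Grow a population from one heterozygous cell and return each cell's gene dosage."""
    rng = np.random.default_rng(seed)
    dosage = np.array([1])                              # founder carries one copy
    while len(dosage) <= max_cells:
        dividing = rng.random(len(dosage)) < p_divide   # stochastic division
        parents = dosage[dividing]
        # only heterozygous parents may partition their duplicated genes unequally
        recombining = (parents == 1) & (rng.random(len(parents)) < p_recombine)
        first = np.where(recombining, 2, parents)       # one daughter receives both copies
        second = np.where(recombining, 0, parents)      # the other receives none
        dosage = np.concatenate([dosage[~dividing], first, second])
    return dosage

dosage = simulate_dosages()
```

Once a lineage reaches zero or two copies it can no longer recombine, so the fraction of heterozygous cells can only decline over time.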
After each round of cell division, all cells are repositioned in order to preserve approximately uniform spatial density (S6C Fig). Repositioning is achieved by equilibrating a network of springs connecting each cell with its neighbors. This undirected network is constructed through Delaunay triangulation of all cells' spatial positions. Edges on the periphery of the culture are systematically excluded by establishing a maximum polar angle between neighbors. This filtration removes spurious edges between distant pairs of cells. Edges connecting pairs of cells with the same clonal marker dosage are assigned a 10% higher spring constant than edges that connect dissimilar cells. This modest bias ensures that cells tend to remain proximal to their clonal lineages. Cell positions are then updated using a force-directed graph drawing algorithm [51]. Alternating cell division and repositioning steps are then repeated until a predefined population size is reached.
The timing and duration of recombination events affect the number and size of the resultant clones. In real experiments, recombination events are restricted to a particular stage of the developmental program through localized exogenous expression of the recombination machinery. We incorporated this feature into our cell growth simulations via two adjustable parameters. The first determines the minimum population size at which recombination may begin, while the second determines the number of generations over which recombination may continue to occur. These two parameters provide a means to tune the average number and size of clonal subpopulations in the synthetic data (S6D Fig). Early recombination events generally entail larger clones, while shorter recombination periods limit the extent of clone formation (S6E Fig).
Generation of synthetic microscopy data
Each simulation yields a list of spatial coordinates and gene dosages for each nucleus (S6B Fig). Synthetic measurements for each nucleus were generated by randomly sampling fluorescence levels {xi}i = 1…N from a lognormal distribution conditioned upon the corresponding gene dosage (S7A–S7C Fig):
(28) |
where the subscript n denotes the gene copy number and the two parameters are the mean and variance of the corresponding distribution. We define μn such that the mean fluorescence level doubles for each additional copy of the gene:
(29) |
We refer to σα as the fluorescence ambiguity because it modulates the similarity of fluorescence levels across gene dosages. Increasing σα increases the overlap among the fluorescence distributions for the three gene dosages (S7D and S7E Fig), and consequently increases the difficulty of the annotation task (S7F Fig).
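The sampling step can be sketched as below. The example assumes that all dosages share the same log-scale standard deviation σα and that the log-scale mean increases by ln 2 per gene copy from a baseline `mu0`, which makes the mean fluorescence level double with each copy. The baseline and the function name are hypothetical and are not taken from the published code.

```python
import numpy as np

def sample_fluorescence(dosage, ambiguity, mu0=0.0, seed=0):
    """Draw one lognormal fluorescence level per nucleus, conditioned on gene dosage."""
    rng = np.random.default_rng(seed)
    mu = mu0 + np.asarray(dosage) * np.log(2)     # mean level doubles per extra copy
    return rng.lognormal(mean=mu, sigma=ambiguity)

# Hypothetical population with equal numbers of 0, 1, and 2 copy cells.
dosage = np.repeat([0, 1, 2], 20000)
x = sample_fluorescence(dosage, ambiguity=0.3)
means = [x[dosage == n].mean() for n in (0, 1, 2)]
```

Raising `ambiguity` widens all three distributions equally, which increases their overlap without changing the two-fold spacing of their means.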
Synthetic benchmarking of annotation performance
We generated a large synthetic dataset spanning a broad range of sixteen different clone sizes and fluorescence ambiguities (S6D and S7F Figs, only half are shown). We performed 50 replicate simulations for each condition. All simulations were terminated when the total population exceeded 2048 cells. We assigned each cell a 20% probability of division upon each iteration, and each cell division event was accompanied by a 20% chance of mitotic recombination. Parent cells containing zero or two copies of the recombined genes were ineligible for recombination, effectively sealing the genetic fates of their respective lineages.
To annotate each set of measurements, the mixture model given by Eq 17 was independently trained and applied to each replicate. Training a single model on all replicates yields modestly stronger performance on average, but also yields more variable results across the parameter space because all labels are dependent upon the outcome of a single expectation maximization routine.
Data and software availability
We have distributed the automated mosaic analysis framework as an open-source Python package available at https://sebastianbernasek.github.io/flyqma. The associated code repository contains resources designed to help users analyze their own microscope images. These include code documentation, a guide to getting started with Fly-QMA, and an interactive tutorial that uses example data to demonstrate the core features of the software. We also intend to incorporate Fly-QMA into future versions of FlyEye Silhouette, our open-source desktop application for quantitative analysis of the larval eye. The code used to generate synthetic microscopy data is also freely available at https://github.com/sebastianbernasek/growth. All segmented and annotated eye discs are accessible via our data repository (https://doi.org/10.21985/N2F207).
Supporting information
Data Availability
The data underlying the results presented in the study are available in a public data repository hosted by Northwestern University. DOI: https://doi.org/10.21985/N2F207.
Funding Statement
SMB and LANA were supported by the John and Leslie McQuown Gift. RWC was supported by NIH R35GM118144 (https://www.nih.gov). LANA, NB, and RWC were supported by NSF 1764421 (https://www.nsf.gov). LANA, NB, and RWC were supported by Simons Foundation 597491 (https://www.simonsfoundation.org). NP was supported by the HHMI Hanna H. Gray Fellowship (https://www.hhmi.org/programs/hanna-h-gray-fellows-program). In all cases, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Oates AC, Gorfinkiel N, González-Gaitán M, Heisenberg CP. Quantitative approaches in developmental biology; 2009. Available from: http://www.nature.com/articles/nrg2548.
- 2. Muzzey D, van Oudenaarden A. Quantitative time-lapse fluorescence microscopy in single cells. Annual Review of Cell and Developmental Biology. 2009;25(1):301–327. 10.1146/annurev.cellbio.042308.113408
- 3. Stelzer EHK. Light-sheet fluorescence microscopy for quantitative biology. Nature Methods. 2014;12(1):23–26. 10.1038/nmeth.3219
- 4. Truong TV, Supatto W. Toward high-content/high-throughput imaging and analysis of embryonic morphogenesis. Genesis. 2011;49(7):555–569. 10.1002/dvg.20760
- 5. Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R, et al. Critical assessment of automated flow cytometry data analysis techniques. Nature Methods. 2013;10(3):228–238. 10.1038/nmeth.2365
- 6. Chen X, Hasan M, Libri V, Urrutia A, Beitz B, Rouilly V, et al. Automated flow cytometric analysis across large numbers of samples and cell types. Clinical Immunology. 2015;157(2):249–260. 10.1016/j.clim.2014.12.009
- 7. Pyne S, Maier LM, Lin TI, Wang K, Rossin E, Hu X, et al. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences. 2009;106(21):8519–8524. 10.1073/pnas.0903028106
- 8. Bernstein BE, Brown M, Johnson DS, Liu XS, Nussbaum C, Myers RM, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology. 2008;9(9):R137 10.1186/gb-2008-9-9-r137
- 9. Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biology. 2007;8(2). 10.1186/gb-2007-8-2-r19
- 10. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–9. 10.1038/nmeth.1923
- 11. Trapnell C, Pachter L, Salzberg SL. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–1111. 10.1093/bioinformatics/btp120
- 12. Costes SV, Daelemans D, Cho EH, Dobbin Z, Pavlakis G, Lockett S. Automatic and quantitative measurement of protein-protein colocalization in live cells. Biophysical Journal. 2004;86(6):3993–4003. 10.1529/biophysj.103.038422
- 13. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols. 2015;10(6):845–858. 10.1038/nprot.2015.053
- 14. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, et al. CellProfiler: Image analysis software for identifying and quantifying cell phenotypes. Genome Biology. 2006;7(10):R100 10.1186/gb-2006-7-10-r100
- 15. Paintdakhi A, Parry B, Campos M, Irnov I, Elf J, Surovtsev I, et al. Oufti: An integrated software package for high-accuracy, high-throughput quantitative microscopy analysis. Molecular Microbiology. 2016;99(4):767–777. 10.1111/mmi.13264
- 16. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: An open-source platform for biological-image analysis; 2012. Available from: http://www.nature.com/articles/nmeth.2019.
- 17. Sommer C, Straehle C, Kothe U, Hamprecht FA. Ilastik: Interactive learning and segmentation toolkit. In: Proceedings—IEEE International Symposium on Biomedical Imaging. 2011. p. 230–233. Available from: http://ieeexplore.ieee.org/document/5872394/.
- 18. Jug F, Pietzsch T, Preibisch S, Tomancak P. Bioimage informatics in the context of Drosophila research. Methods. 2014;68(1):60–73. 10.1016/j.ymeth.2014.04.004
- 19. Sbalzarini IF. Seeing is believing: Quantifying is convincing: Computational image analysis in biology. Advances in Anatomy, Embryology, and Cell Biology. 2016;219:1–39. 10.1007/978-3-319-28549-8_1
- 20. Schindelin J, Rueden CT, Hiner MC, Eliceiri KW. The ImageJ ecosystem: An open platform for biomedical image analysis; 2015. Available from: http://doi.wiley.com/10.1002/mrd.22489.
- 21. Simpson IT, Price DJ. Pax6; a pleiotropic player in development; 2002. Available from: http://doi.wiley.com/10.1002/bies.10174.
- 22. Parody TR, Muskavitch MAT. The pleiotropic function of Delta during postembryonic development of Drosophila melanogaster. Genetics. 1993;135(2):527–539.
- 23. Shilo BZ, Raz E. Developmental control by the Drosophila EGF receptor homolog DER; 1991. Available from: https://www.sciencedirect.com/science/article/pii/016895259190261N.
- 24. Xu T, Rubin GM. Analysis of genetic mosaics in developing and adult Drosophila tissues. Development. 1993;117(4):1223–37.
- 25. Xu T, Rubin GM. The effort to make mosaic analysis a household tool. Development. 2012;139(24):4501–4503. 10.1242/dev.085183
- 26. Newsome TP, Asling B, Dickson BJ. Analysis of Drosophila photoreceptor axon guidance in eye-specific mosaics. Development. 2000;127(4):851–60.
- 27. Theodosiou NA, Xu T. Use of FLP/FRT system to study Drosophila development. Methods. 1998;14(4):355–365. 10.1006/meth.1998.0591
- 28. Struhl G, Basler K. Organizing activity of wingless protein in Drosophila. Cell. 1993;. 10.1016/0092-8674(93)90072-X
- 29. Halfar K, Rommel C, Stocker H, Hafen E. Ras controls growth, survival and differentiation in the Drosophila eye by different thresholds of MAP kinase activity. Development. 2001;128(9):1687–96.
- 30. Tomlinson A, Struhl G. Delta/Notch and Boss/Sevenless signals act combinatorially to specify the Drosophila R7 photoreceptor. Molecular Cell. 2001;7(3):487–95. 10.1016/s1097-2765(01)00196-4
- 31. Yang L, Baker NE. Role of the EGFR/Ras/Raf pathway in specification of photoreceptor cells in the Drosophila retina. Development. 2001;128(7):1183–91.
- 32. Huang J, Wu S, Barrera J, Matthews K, Pan D. The Hippo signaling pathway coordinately regulates cell proliferation and apoptosis by inactivating Yorkie, the Drosophila homolog of YAP. Cell. 2005;122(3):421–434. 10.1016/j.cell.2005.06.007
- 33. Thompson BJ, Cohen SM. The Hippo pathway regulates the bantam microRNA to control cell proliferation and apoptosis in Drosophila. Cell. 2006;126(4):767–774. 10.1016/j.cell.2006.07.013
- 34. Atkins M. Drosophila genetics: The power of genetic mosaic approaches In: Methods Mol. Biol. vol. 1893 Humana Press, New York, NY; 2019. p. 27–42. Available from: http://link.springer.com/10.1007/978-1-4939-8910-2_2.
- 35. Enomoto M, Siow C, Igaki T. Drosophila as a cancer model In: Advances in Experimental Medicine and Biology. vol. 1076 Springer, Singapore; 2018. p. 173–194. Available from: http://link.springer.com/10.1007/978-981-13-0529-0_10.
- 36. Germani F, Bergantinos C, Johnston LA. Mosaic analysis in Drosophila. Genetics. 2018;208(2):473–490. 10.1534/genetics.117.300256
- 37. Dai W, Peterson A, Kenney T, Burrous H, Montell DJ. Quantitative microscopy of the Drosophila ovary shows multiple niche signals specify progenitor cell fate. Nature Communications. 2017;8(1):1244 10.1038/s41467-017-01322-9
- 38. Bernasek SM, Lachance JFB, Peláez N, Bakker R, Navarro HT, Amaral LAN, et al. Ratio-based sensing of two transcription factors regulates the transit to differentiation. bioRxiv. 2018; p. 430744.
- 39. Ghiglione C, Jouandin P, Cérézo D, Noselli S. The Drosophila insulin pathway controls Profilin expression and dynamic actin-rich protrusions during collective cell migration. Development. 2018;145(14):dev161117 10.1242/dev.161117
- 40. Li K, Baker NE. Regulation of the Drosophila ID protein Extra macrochaetae by proneural dimerization partners. Elife. 2018;7 10.7554/eLife.33967
- 41. Bacia K, Petrášek Z, Schwille P. Correcting for spectral cross-talk in dual-color fluorescence cross-correlation spectroscopy. ChemPhysChem. 2012;13(5):1221–1231. 10.1002/cphc.201100801
- 42. Elangovan M, Wallrabe H, Chen Y, Day RN, Barroso M, Periasamy A. Characterization of one- and two-photon excitation fluorescence resonance energy transfer microscopy. Methods. 2003;29(1):58–73. 10.1016/s1046-2023(02)00283-9
- 43. Mort RL. Quantitative analysis of patch patterns in mosaic tissues with ClonalTools software. Journal of Anatomy. 2009;215(6):698–704. 10.1111/j.1469-7580.2009.01150.x
- 44. Helmuth JA, Paul G, Sbalzarini IF. Beyond co-localization: Inferring spatial interactions between sub-cellular structures from microscopy images. BMC Bioinformatics. 2010;11(1):372 10.1186/1471-2105-11-372
- 45. Shivanandan A, Radenovic A, Sbalzarini IF. MosaicIA: An ImageJ/Fiji plugin for spatial pattern and interaction analysis. BMC Bioinformatics. 2013;14(1):349 10.1186/1471-2105-14-349
- 46. Furusawa C, Suzuki T, Kashiwagi A, Yomo T, Kaneko K. Ubiquity of log-normal distributions in intra-cellular reaction dynamics. Biophysics. 2005;1:25–31. 10.2142/biophysics.1.25
- 47. Beal J. Biochemical complexity drives log-normal variation in genetic expression. Engineering Biology. 2017;1(1):55–60. 10.1049/enb.2017.0004
- 48. Teicher H. Identifiability of finite mixtures. The Annals of Mathematical Statistics. 1963;34(4):1265–1269. 10.1214/aoms/1177703862
- 49. Rosvall M, Axelsson D, Bergstrom CT. The map equation. European Physical Journal. 2009;.
- 50. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;. 10.1007/BF02289026
- 51. Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Information Processing Letters. 1989;31(1):7–15. 10.1016/0020-0190(89)90102-6
- 52. van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;. 10.7717/peerj.453
- 53. Otsu N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;.
- 54. Bugarski M, Mansouri M, Niemann A, Rizk A, Berger P, Ziegler U, et al. Segmentation and quantification of subcellular structures in fluorescence microscopy images using Squassh. Nature Protocols. 2014;9(3):586–596. 10.1038/nprot.2014.037
- 55. Peláez N, Gavalda-Miralles A, Wang B, Navarro HT, Gudjonson H, Rebay I, et al. Dynamics and heterogeneity of a fate determinant during transition towards cell differentiation. Elife. 2015;4 10.7554/eLife.08924
- 56. Zinchuk V, Zinchuk O, Okada T. Quantitative colocalization analysis of multicolor confocal immunofluorescence microscopy images: Pushing pixels to explore biological phenomena. Acta Histochemica et Cytochemica. 2007;40(4):101–111. 10.1267/ahc.07002
- 57. Arsenovic PT, Mayer CR, Conway DE. SensorFRET: A standardless approach to measuring pixel-based spectral bleed-through and FRET efficiency using spectral imaging. Scientific Reports. 2017;7(1). 10.1038/s41598-017-15411-8
- 58. Kim D, Curthoys NM, Parent MT, Hess ST. Bleed-through correction for rendering and correlation analysis in multi-colour localization microscopy. Journal of Optics. 2013;15(9). 10.1088/2040-8978/15/9/094011
- 59. McMullen PD, Morimoto RI, Amaral LAN. Physically grounded approach for estimating gene expression from microarray data. Proceedings of the National Academy of Sciences. 2010;107(31):13690–13695. 10.1073/pnas.1000938107
- 60. Gaudette L, Japkowicz N. Evaluation methods for ordinal classification In: Lecture Notes in Computer Science. vol. 5549 LNAI. Springer, Berlin, Heidelberg; 2009. p. 207–210. Available from: http://link.springer.com/10.1007/978-3-642-01818-3_25.
- 61. Nguyen TM, Wu QMJ. Gaussian mixture-model-based spatial neighborhood relationships for pixel labeling problems. IEEE Transactions on Systems, Man, and Cybernetics. 2012;42(1):193–202. 10.1109/TSMCB.2011.2161284
- 62. Gambis A, Dourlen P, Steller H, Mollereau B. Two-color in vivo imaging of photoreceptor apoptosis and development in Drosophila. Developmental Biology. 2011;351(1):128–134. 10.1016/j.ydbio.2010.12.040
- 63. Dourlen P, Levet C, Mejat A, Gambis A, Mollereau B. The Tomato/GFP-FLP/FRT method for live imaging of mosaic adult Drosophila photoreceptor cells. Journal of Visualized Experiments. 2013;79.
- 64. Fisher YE, Yang HH, Isaacman-Beck J, Xie M, Gohl DM, Clandinin TR. FlpStop, a tool for conditional gene control in Drosophila. Elife. 2017;6 10.7554/eLife.22279
- 65. Wu JS, Luo L. A protocol for mosaic analysis with a repressible cell marker (MARCM) in Drosophila. Nature Protocols. 2007;1(6):2583–2589. 10.1038/nprot.2006.320
- 66. Zhou Q, Neal SJ, Pignoni F. Mutant analysis by rescue gene excision: New tools for mosaic studies in Drosophila. Genesis. 2016;54(11):589–592. 10.1002/dvg.22984
- 67. Heffern E, Perrimon N, Hohl AM, del Valle Rodriguez A, Bakal C, Bonvin M, et al. The twin spot generator for differential Drosophila lineage analysis. Nat Methods. 2009;6(8):600–602. 10.1038/nmeth.1349
- 68. Yu HH, Kao CF, He Y, Ding P, Kao JC, Lee T. A complete developmental sequence of a Drosophila neuronal lineage as revealed by twin-spot MARCM. PLoS Biology. 2010;8(8):39–40. 10.1371/journal.pbio.1000461
- 69. Denes AS, Caussinus E, Affolter M, Kanca O, Percival-Smith A. Raeppli: a whole-tissue labeling tool for live imaging of Drosophila development. Development. 2013;141(2):472–480. 10.1242/dev.102913
- 70. Hadjieconomou D, Rotkopf S, Alexandre C, Bell DM, Dickson BJ, Salecker I. Flybow: Genetic multicolor cell labeling for neural circuit analysis in Drosophila melanogaster. Nature Methods. 2011;8(3):260–266. 10.1038/nmeth.1567
- 71. Hampel S, Chung P, McKellar CE, Hall D, Looger LL, Simpson JH. Drosophila Brainbow: a recombinase-based fluorescence labeling technique to subdivide neural expression patterns. Nature Methods. 2011;8(3):253–259. 10.1038/nmeth.1566
- 72. Neufeld TP, De La Cruz AFA, Johnston LA, Edgar BA. Coordination of growth and cell division in the Drosophila wing. Cell. 1998;93(7):1183–1193. 10.1016/s0092-8674(00)81462-2
- 73. Tworoger M, Larkin MK, Bryant Z, Ruohola-Baker H. Mosaic analysis in the Drosophila ovary reveals a common Hedgehog-inducible precursor stage for stalk and polar cells. Genetics. 1999;.
- 74. Collins RT, Linker C, Lewis J. MAZe: A tool for mosaic analysis of gene function in zebrafish. Nature Methods. 2010;7(3):219–223. 10.1038/nmeth.1423
- 75. Muñoz-Jiménez C, Ayuso C, Dobrzynska A, Torres-Mendéz A, Ruiz PdlC, Askjaer P. An efficient FLP-based toolkit for spatiotemporal control of gene expression in Caenorhabditis elegans. Genetics. 2017;206(4):1763–1778. 10.1534/genetics.117.201012
- 76. Wang W, Warren M, Bradley A. Induced mitotic recombination of p53 in vivo. Proceedings of the National Academy of Sciences. 2007;104(11):4501–4505. 10.1073/pnas.0607953104
- 77. Meijering E. Cell segmentation: 50 years down the road. IEEE Signal Processing Magazine. 2012;29(5):140–145. 10.1109/MSP.2012.2204190