PLOS Computational Biology. 2020 Mar 3;16(3):e1007406. doi: 10.1371/journal.pcbi.1007406

Fly-QMA: Automated analysis of mosaic imaginal discs in Drosophila

Sebastian M Bernasek 1,2, Nicolás Peláez 3,¤, Richard W Carthew 2,3,4, Neda Bagheri 1,2,5,6,7,*, Luís A N Amaral 1,2,7,8,*
Editor: Pedro Mendes 9
PMCID: PMC7100978  PMID: 32126077

Abstract

Mosaic analysis provides a means to probe developmental processes in situ by generating loss-of-function mutants within otherwise wildtype tissues. Combining these techniques with quantitative microscopy enables researchers to rigorously compare RNA or protein expression across the resultant clones. However, visual inspection of mosaic tissues remains common in the literature because quantification demands considerable labor and computational expertise. Practitioners must segment cell membranes or cell nuclei from a tissue and annotate the clones before their data are suitable for analysis. Here, we introduce Fly-QMA, a computational framework that automates each of these tasks for confocal microscopy images of Drosophila imaginal discs. The framework includes an unsupervised annotation algorithm that incorporates spatial context to inform the genetic identity of each cell. We use a combination of real and synthetic validation data to survey the performance of the annotation algorithm across a broad range of conditions. By contributing our framework to the open-source software ecosystem, we aim to support the current move toward automated quantitative analysis among developmental biologists.

Author summary

Biologists use mosaic tissues to compare the behavior of genetically distinct cells within an otherwise equivalent context. The ensuing analysis is often limited to qualitative insight. However, it is becoming clear that quantitative models are needed to unravel the complexities of many biological systems. In this manuscript we introduce a computational framework that automates the quantification of mosaic analysis for Drosophila imaginal discs, a common setting for studies of developmental processes. The software extracts quantitative measurements from confocal images of mosaic tissues, rectifies any cross-talk between fluorescent reporters, and identifies clonally-related subpopulations of cells. Together, these functions allow users to rigorously ascribe changes in gene expression to the presence or absence of particular genes. We validate the performance of our framework using both real and synthetic data. We invite interested readers to apply these methods using our freely available software.

Introduction

Quantification will be essential as biologists study increasingly complex facets of organismal development [1]. Unfortunately, qualitative analysis remains common because it is often difficult to measure cellular processes in their native context. Modern fluorescent probes and microscopy techniques make such measurements possible [2–4], but the ensuing image analysis demands specialized skills that fall beyond the expertise of most experimentalists. Automated analysis strategies have addressed similar challenges in cytometry [5–7], genomics and transcriptomics [8–11], and other subdisciplines of biology [12, 13]. Image analysis has proven particularly amenable to automation, with several computer vision tools having gained traction among biologists [14–17]. These platforms are popular because they increase productivity, improve the consistency and sensitivity of measurements, and obviate the need for specialized computational proficiency [18–20]. Designing similar tools to help biologists probe and measure developmental processes in vivo will further transform studies of embryogenesis and development into quantitative endeavors.

Developmental biologists study how the expression and function of individual genes coordinate the emergence of adult phenotypes. They often ask how cells respond when a specific gene, RNA, or protein is perturbed during a particular stage of development. Cell response may be characterized by changes in morphology, or by changes in the expression of other genes (Fig 1A). Experimental efforts to answer this question were historically stifled by the difficulty of isolating perturbations to a single developmental context, as the most interesting perturbation targets often confer pleiotropic function across several stages of development and can trigger early embryonic lethality [21–23].

Fig 1. Perturbing gene expression via mitotic recombination.


Experimental framework using mitotic clones to test whether or not regulatory interactions occur between a perturbation target and reporter of interest. Blue and green markers represent the respective genes encoding the perturbation target and the reporter. (A) A perturbation-induced decrease in reporter levels would confirm that regulation occurs. (B) Mitotic recombination generates clonal subpopulations carrying zero, one, or two copies of the gene encoding a perturbation target. Black lines depict a genetic locus. Only genes downstream of the recombination site are subject to recombination. Red markers represent a gene encoding a clonal marker used to identify the resultant clones. Red shading of large oval reflects relative clonal marker fluorescence level.

Mosaic analysis addressed this challenge in Drosophila by limiting perturbations to a subset of cells within the imaginal discs of the larva [24, 25]. The technique yields a heterogeneous tissue composed of genetically distinct patches of cells that are clonally related. Aside from rare de novo mutations, cells within each clone are genetically identical. Clone formation may be restricted to specific developing organs by using disc-specific gene promoters to drive trans-chromosomal recombination events in the corresponding imaginal discs [26, 27]. The timing of these events determines the number and size of the resultant clones [28]. Perturbations are applied by engineering the dosage of a target gene to differ across clones (Fig 1B), resulting in clones whose cells are either homozygous mutant (−/−), heterozygous wildtype (+/−), or homozygous wildtype (+/+) for the particular gene. Labeling these clones with the presence or absence of fluorescent markers enables direct comparison of cells subject to control or perturbation conditions, while maintaining otherwise equivalent developmental and physiological histories between the two cell populations (Fig 2A). Additional reporters may be used to monitor differences in RNA or protein expression, morphology, or cell fate choice across clones (Fig 2B). Variants of this strategy led to seminal discoveries in both neural patterning [29–31] and morphogenesis [32, 33], and remain popular today [34–36].

Fig 2. Conventional versus quantitative mosaic analysis.


(A,B) Conventional analysis of a mosaic eye imaginal disc. (A) Clones are identified by visual comparison of clonal marker fluorescence among nuclei. (B) Regions labeled homozygous mutant (−/−) or homozygous wildtype (+/+) for the clonal marker are compared with those labeled heterozygous wildtype (+/−) to assess whether reporter expression differs across clones. Fluorescence bleedthrough is arbitrarily diagnosed. (C-H) Quantitative mosaic analysis. Panels depict a magnified view of the region enclosed by red rectangles in panels A and B. (C) Raw confocal image of the nuclear stain, clonal marker, and reporter of interest. (D) Segmentation identifies distinct nuclei. (E) Reporter expression is quantified by averaging the pixel intensities within each segment. Numbers reflect measured values. (F) Measurements may be corrected to mitigate fluorescence bleedthrough. (G) Individual nuclei are labeled homozygous mutant, heterozygous, or homozygous wildtype for the clonal marker. White arrows mark nuclei with ambiguous fluorescence levels. (H) Reporter levels are compared across clones to determine whether the perturbation affects reporter expression. Comparison may exclude clone borders (yellow regions) and focus on a particular region of the image field (black arrows). In the eye imaginal disc, comparison is often limited to a narrow window near the morphogenetic furrow (MF, orange arrow).

Quantitative microscopy techniques are well suited to measuring differences in cell behavior across clones. One reporter (a clonal marker) labels the clones, while others quantitatively report properties of their constituent cells, such as the expression level of a gene product of interest (Fig 2C). The former then defines the stratification under which the latter are compared. We call this strategy Quantitative Mosaic Analysis (QMA) because it replaces subjective visual comparison with a rigorous statistical alternative. Although a few recent studies have deployed this approach [37–40], qualitative visual comparison remains pervasive in the literature.

We suspect the adoption of QMA has been hindered by demand for specialized computational skills or, in their stead, extensive manual labor. Researchers must first draw or detect boundaries around individual nuclei in a procedure known as segmentation (Fig 2D). Averaging the pixel intensities within each boundary then yields a fluorescence intensity measurement for each reporter in each identified nucleus (Fig 2E). The measurements should then be corrected to account for any fluorescence bleedthrough between reporter channels (Fig 2F). Correction often requires single-reporter calibration experiments to quantify any potential crosstalk between different fluorophores, followed by complex calculations to remedy the data [41, 42]. Researchers must then label, or annotate, each identified nucleus as mutant, heterozygous, or homozygous for the clonal marker. Annotation is typically achieved through visual inspection (Fig 2G). Cells carrying zero, one, or two copies of the clonal marker should exhibit low, medium, or high average levels of fluorescence, respectively. However, both measurement and biological noise introduce the possibility that some cells’ measured fluorescence levels may not reliably reflect their genetic identity. Annotation must therefore also consider the spatial context surrounding each nucleus. For instance, a nucleus whose neighbors express high levels of the clonal marker is likely to be homozygous for the clonal marker, even if its individual fluorescence level is comparable to that of heterozygous cells (Fig 2G, white arrows). Spatial context is particularly informative in developing tissues where cell migration is minimal, such as the fly imaginal discs. With many biological replicates containing thousands of cells each, annotation can quickly become insurmountably tedious. The corrected and labeled measurements are then curated for statistical comparison by excluding those on the border of each clone, and limiting their scope to particular regions of the image field (Fig 2H). Combined, all of these tasks ultimately burden researchers and raise the barrier for adoption of QMA.

Automation promises to alleviate this bottleneck, yet the literature bears surprisingly few computational resources designed to support QMA. The ClonalTools plugin for ImageJ deploys an image-based approach to measure macroscopic features of clone morphology, but is limited to binary classification of mutant versus non-mutant tissue and offers no functionality for comparing reporter expression across clones [43]. Alternatively, the MosaicSuite plugin for ImageJ deploys an array of image processing, segmentation, and analysis capabilities to automatically detect spatial interactions between objects found in separate fluorescence channels [44, 45]. While useful in many other settings, neither of these tools supports automated labeling of individual cells or explicit comparison of clones with single-cell resolution. Most modern studies employing a quantitative mosaic analysis instead report using some form of ad hoc semi-automated pipeline built upon ImageJ [37, 39, 40]. We are therefore unaware of any platforms that offer comprehensive support for an automated QMA workflow.

Here, we introduce Fly-QMA, a computational framework for automated QMA of Drosophila imaginal discs. Fly-QMA supports segmentation, bleedthrough correction, and annotation of confocal microscopy data (Fig 2D–2H). We demonstrate each of these functions by applying them to real confocal images of clones in the eye imaginal disc, and find that our automated approach yields results consistent with manual analysis by a human expert. We then generate and use synthetic data to survey the performance of our framework across a broad range of biologically plausible conditions. Fly-QMA is freely available online (see Data and software availability), along with an interactive coding tutorial designed to acquaint users with the core software features by applying them to example data.

Results

Quantification of nuclear fluorescence levels

We implemented a segmentation strategy based upon a standard watershed approach [52]. Briefly, we construct a foreground mask by Otsu thresholding the nuclear stain or nuclear label image following a series of smoothing and contrast-limited adaptive histogram equalization operations [52, 53]. We then apply a Euclidean distance transform to the foreground mask, identify the local maxima, and use them as seeds for watershed segmentation. When applied to the microscopy data, few visible spots in the nuclear stain were neglected, and the vast majority of segments outlined individual nuclei (S1C Fig).
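The steps in this paragraph can be sketched with scikit-image and SciPy. This is a minimal illustration rather than the Fly-QMA implementation: the function name `segment_nuclei` and the parameters `sigma` and `min_distance` are hypothetical, and the input is assumed to be a single 2D nuclear-stain image scaled to [0, 1].

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import exposure, feature, filters, segmentation


def segment_nuclei(stain, sigma=1.0, min_distance=5):
    """Label nuclei in a 2D nuclear-stain image scaled to [0, 1]."""
    # Smooth, then boost local contrast (contrast-limited adaptive
    # histogram equalization).
    smoothed = filters.gaussian(stain, sigma=sigma)
    equalized = exposure.equalize_adapthist(smoothed)

    # Otsu thresholding yields the foreground mask.
    mask = equalized > filters.threshold_otsu(equalized)

    # The Euclidean distance to the background peaks near nuclear centers.
    distance = ndi.distance_transform_edt(mask)

    # Local maxima of the distance map become watershed seeds.
    peaks = feature.peak_local_max(
        distance, min_distance=min_distance, exclude_border=False
    )
    markers = np.zeros(stain.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

    # Flood the inverted distance map from the seeds, within the mask only.
    return segmentation.watershed(-distance, markers, mask=mask)
```

The returned array assigns 0 to background and a positive integer to each segment. Real images would likely need `sigma` and `min_distance` tuned to the nuclear size in pixels.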

This approach is flexible and should perform adequately in many scenarios. However, we acknowledge that no individual strategy can address all microscopy data because segmentation is strongly context dependent. All subsequent stages of analysis were therefore designed to be compatible with any data that conform to our standardized file structure. This modular arrangement grants users the freedom to use one of the many other available segmentation platforms [54], including FlyEye Silhouette [55], before applying the remaining functionalities of our framework. Regardless of how nuclear contours are identified, averaging the pixel intensities within them yields fluorescence intensity measurements for each reporter in each identified nucleus. We next sought to ensure that these measurements were suitable for comparison across clones.
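Whatever tool produces the segmentation, the measurement step reduces to a per-label average. The sketch below assumes the segmentation is available as an integer label image in which 0 marks background; the function name is hypothetical and is not part of the Fly-QMA API.

```python
import numpy as np


def mean_intensity_per_segment(labels, channel):
    """Return {segment id: mean pixel intensity}, ignoring background (0)."""
    flat_labels = labels.ravel()
    # Sum of intensities and pixel count for every label value.
    sums = np.bincount(flat_labels, weights=channel.ravel())
    counts = np.bincount(flat_labels)
    # Keep labels that actually occur, excluding the background label.
    ids = np.flatnonzero(counts)
    ids = ids[ids != 0]
    return {int(i): sums[i] / counts[i] for i in ids}


labels = np.array([[0, 1, 1],
                   [2, 2, 0]])
channel = np.array([[9.0, 1.0, 3.0],
                    [4.0, 6.0, 9.0]])
print(mean_intensity_per_segment(labels, channel))
```

Calling the function once per fluorescence channel gives one measurement per reporter per nucleus.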

Bleedthrough correction

Despite efforts to select non-overlapping reporter bandwidths and excite them sequentially, it is not uncommon for reporters excited at one wavelength to emit some fluorescence in the spectrum collected for another channel (Fig 2B, yellow lines) [41, 56]. The end result is a positive correlation, or crosstalk, between the measured fluorescence intensities of two or more reporters. Exogenous correlations between the measured fluorescence intensities of the clonal marker and the reporter of interest are problematic given that the purpose of the experiment is to detect changes in reporter levels with respect to the clonal marker.

In our microscopy data, individual clones were distinguished by their low, medium, or high expression levels of an RFP-tagged clonal marker (Fig 3A). These images should not have shown any detectable difference in GFP levels across clones because all cells carried an equivalent dosage of the control reporter (S1A Fig). However, the images visibly suffered from bleedthrough between the RFP and GFP channels (Fig 3A and 3B). Bleedthrough was similarly evident when we compared measured GFP levels across labeled clones. Nuclei labeled mutant, heterozygous, or homozygous for the clonal marker had low, medium, and high expression levels of the control reporter, respectively (Fig 3C, black boxes). The data were therefore ripe for systematic correction.

Fig 3. Automated correction of fluorescence bleedthrough in the larval eye.


(A) Low, medium, and high expression levels of the RFP-tagged clonal marker. (B) GFP-tagged control reporter expression. RFP fluorescence bleedthrough is visually apparent upon comparison with A. (C) Comparison of control reporter expression between clones. Includes data aggregated across nine images taken from six separate eye discs. Data were limited to cells within the region of elevated GFP expression that were of approximately comparable developmental age (see S2E–S2G Fig). Measurements are stratified by their assigned labels. Before correction, expression differs between clones (black boxes, p < 10⁻⁵). No difference is detected after correction (red boxes, p > 0.05).

Spectral bleedthrough correction is common practice in other forms of cross-correlation and co-localization microscopy [41, 56]. These methods typically entail characterizing the extent of crosstalk between fluorophores globally [57, 58], on a pixel-by-pixel basis [42], or by experimental calibration [41], then detrending all images or measurements prior to subsequent analysis. Our framework adopts the global approach, using the background pixels in each image to infer the extent of fluorescence bleedthrough across spectral channels.

Specifically, we assume the fluorescence intensity $F_{ij}$ for channel $i$ at pixel $j$ is a superposition of a background intensity $B_{ij}$ and some function of the expression level $E_{ij}$ that we seek to compare across cells [59]:

$$F_{ij} = B_{ij} + f(E_{ij}) \tag{1}$$

We further assume that the background intensity of a channel includes linear contributions from the fluorescence intensity of each of the other channels:

$$B_{ij} = \sum_{k \neq i} \alpha_k F_{kj} + \beta \tag{2}$$

where $k$ is indexed over $K$ anticipated sources of bleedthrough. Given estimates for each $\{\alpha_1, \alpha_2, \ldots, \alpha_K\}$ and $\beta$ we can then estimate the background intensity of each measurement:

$$\langle B_{ij} \rangle = \sum_{k \neq i} \alpha_k \langle F_{kj} \rangle + \beta \tag{3}$$

where the angle brackets denote the average across all pixels within a single nucleus. The corrected signal value is obtained by subtracting the background intensity from the measured fluorescence level:

$$\langle f(E_{ij}) \rangle = \langle F_{ij} \rangle - \langle B_{ij} \rangle \tag{4}$$

Repeating this procedure for each nucleus facilitates comparison of relative expression levels across nuclei in the absence of bleedthrough effects. Bleedthrough correction performance is therefore strongly dependent upon accurate estimation of the bleedthrough contribution strengths, $\{\alpha_1, \alpha_2, \ldots, \alpha_K\}$.

We estimate these parameters by characterizing their impact on background pixels (see Methods). When applied to the microscopy data, bleedthrough correction successfully eliminated any detectable difference in GFP expression across clones (Fig 3C, red boxes, p > 0.05 two-sided Mann-Whitney U test).
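To make Eqs 2 to 4 concrete, the sketch below fits the bleedthrough coefficients on background pixels and then corrects per-nucleus means. It substitutes ordinary least squares for the estimation procedure described in Methods, which is an assumption made for brevity, and the function names are hypothetical.

```python
import numpy as np


def fit_bleedthrough(background_sources, background_target):
    """Least-squares fit of B = sum_k alpha_k * F_k + beta (Eq 2).

    background_sources: (n_pixels, K) intensities of the K source channels.
    background_target:  (n_pixels,) intensity of the channel to correct.
    """
    ones = np.ones(len(background_target))
    design = np.column_stack([background_sources, ones])
    coef, *_ = np.linalg.lstsq(design, background_target, rcond=None)
    return coef[:-1], coef[-1]  # (alpha_1 .. alpha_K), beta


def correct_nuclei(mean_target, mean_sources, alphas, beta):
    """Subtract the estimated background from per-nucleus means."""
    background = mean_sources @ alphas + beta  # Eq 3
    return mean_target - background            # Eq 4


# Toy data: GFP background picks up 30% of the RFP signal plus an offset.
rng = np.random.default_rng(0)
rfp_bg = rng.uniform(0.0, 1.0, size=500)
gfp_bg = 0.3 * rfp_bg + 0.05
alphas, beta = fit_bleedthrough(rfp_bg[:, None], gfp_bg)

# Three nuclei with equal true GFP signal but different RFP levels.
rfp_nuclei = np.array([[0.1], [0.5], [0.9]])
gfp_nuclei = 0.2 + 0.3 * rfp_nuclei[:, 0] + 0.05
corrected = correct_nuclei(gfp_nuclei, rfp_nuclei, alphas, beta)
```

In this noise-free toy case the corrected GFP values coincide across the three nuclei, mirroring the removal of the clone-dependent trend reported for the control reporter.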

Automated annotation of clones

Our annotation strategy seeks to label each identified cell as homozygous mutant, heterozygous wildtype, or homozygous wildtype for the clonal marker. Variation within each clone precludes accurate classification of a cell’s genotype solely on the basis of its individual expression level. However, in tissues where cell migration is minimal, clonal lineages are unlikely to exist in isolation because recombination events are typically timed to generate large clones. Our strategy therefore integrates both clonal marker expression and spatial context to identify clusters of cells with locally homogeneous expression behavior, then maps each cluster to one of the possible labels. This unsupervised approach lends itself to automated annotation because the clusters are inferred directly from the data without any guidance from the user.

We first train a statistical model to estimate the probability that a given measurement came from a cell carrying zero, one, or two copies of the clonal marker (S3A Fig). This entails fitting a weighted mixture of three or more bivariate lognormal distributions (components) to a two dimensional set of observations (S3B and S3C Fig). The first dimension corresponds to the clonal marker fluorescence level measured within each cell. The second dimension describes the local average expression level within the region surrounding each cell. We evaluate the latter by estimating a neighborhood radius from the decay of the radial correlation of the expression levels, then averaging the expression levels of all cells within that radius (S3D Fig). The second dimension therefore measures the spatial context in which a cell resides. We balance model fidelity against overfitting by using the Bayesian information criterion to determine the optimal number of model components (S3E Fig). We then cluster the components into three groups on the basis of their mean values (S3F Fig), effectively mapping each component to one of the three possible gene dosages. The model may be trained using observations derived from a single image, or with a collection of observations derived from multiple images. Once trained, the model is able to predict the conditional probability that an individual observation belongs to one of the model’s components, given its measured expression level.
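A rough sketch of this training stage follows, using scikit-learn. Fitting Gaussian mixtures to log-transformed values is equivalent to fitting lognormal mixtures to the raw values. Everything else is an assumption made for illustration: the function names, the fixed `radius` (the text estimates it from the radial correlation decay), the upper limit of six components, and the use of k-means to group component means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def spatial_context(xy, levels, radius):
    """Mean level of all cells within `radius` of each cell (itself included)."""
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    neighbors = dists <= radius
    return (neighbors @ levels) / neighbors.sum(axis=1)


def fit_dosage_model(features, max_components=6, seed=0):
    """Fit mixtures with 3+ components to log-features; keep the lowest BIC."""
    log_x = np.log(features)
    models = [GaussianMixture(n, random_state=seed).fit(log_x)
              for n in range(3, max_components + 1)]
    best = min(models, key=lambda m: m.bic(log_x))

    # Group the component means into three clusters, ranked low to high.
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(best.means_)
    order = np.argsort(km.cluster_centers_[:, 0])
    rank = np.empty(3, dtype=int)
    rank[order] = np.arange(3)
    return best, rank[km.labels_]  # model, component -> dosage (0, 1, 2)


def predict_dosage(model, component_to_dosage, features):
    """Most probable dosage per observation, summing over components."""
    posterior = model.predict_proba(np.log(features))
    per_dosage = np.stack(
        [posterior[:, component_to_dosage == d].sum(axis=1) for d in range(3)],
        axis=1,
    )
    return per_dosage.argmax(axis=1)
```

Here `features` is an array with one row per cell and two strictly positive columns: the cell's own clonal marker level and the output of `spatial_context` for that cell.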

We then use the learned conditional probabilities to detect entire clones, thus assigning a label to each cell. Rather than using the trained model to classify each observation, we compile a new set of observations by limiting each estimate of spatial context to spatially collocated communities with similar expression behavior (S4A Fig). We identify these communities by applying a community detection algorithm to an undirected graph connecting adjacent cells (S4B Fig). Edges in this graph are weighted by the similarity of clonal marker expression between neighbors, resulting in communities with similar expression levels (S4E Fig, Steps I and II). The graph-based approach increases spatial resolution by limiting the information shared by dissimilar neighbors. Applying the mixture model yields an initial estimate of the probability that an observation belongs to one of the model’s components (S4E Fig, Step III). We further refine these estimates by allowing the probabilities estimated for each cell to diffuse throughout the graph (S4E Fig, Step IV). The rate of diffusion between neighbors is determined by the weight of the edge that connects them, with more similar neighbors exerting stronger influence on each other. We then use the diffused probabilities to identify the most probable source component and label each observation (S4E Fig, Step V). These probabilities also provide a measure of confidence in the assigned labels. We replace any low-confidence labels with alternate labels assigned using a marginal classifier that neglects spatial context (S4F and S4G Fig), resulting in a fully labeled image (S4H Fig).
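The diffusion step (Step IV) can be illustrated in isolation. The sketch below is one simple way to let probabilities spread along weighted edges, not necessarily the scheme used by Fly-QMA: each cell repeatedly blends its own initial estimate with the weighted average of its neighbors. The mixing fraction `alpha` and the iteration count are hypothetical parameters.

```python
import numpy as np


def diffuse_probabilities(prior, edges, weights, alpha=0.8, n_iter=100):
    """Smooth per-cell label probabilities over a weighted undirected graph.

    prior:   (n_cells, n_labels) initial probabilities, rows sum to one.
    edges:   (n_edges, 2) integer index pairs of adjacent cells.
    weights: (n_edges,) non-negative similarity of each adjacent pair.
    alpha:   share of each update taken from neighbors (assumed knob).
    """
    n = prior.shape[0]
    adjacency = np.zeros((n, n))
    adjacency[edges[:, 0], edges[:, 1]] = weights
    adjacency[edges[:, 1], edges[:, 0]] = weights

    # Row-normalize so that more similar neighbors exert more influence.
    degree = adjacency.sum(axis=1, keepdims=True)
    transition = np.divide(adjacency, degree,
                           out=np.zeros_like(adjacency), where=degree > 0)

    posterior = prior.copy()
    for _ in range(n_iter):
        posterior = (1 - alpha) * prior + alpha * (transition @ posterior)
    # Cells without neighbors lose the alpha share, so renormalize rows.
    return posterior / posterior.sum(axis=1, keepdims=True)


# Five cells in a row; the middle one has an ambiguous initial estimate.
prior = np.array([[0.9, 0.1]] * 5)
prior[2] = [0.4, 0.6]
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
posterior = diffuse_probabilities(prior, edges, np.ones(len(edges)))
labels = posterior.argmax(axis=1)
```

In this toy example the middle cell is relabeled to match its neighbors, and the row maxima of `posterior` serve as the confidence measure mentioned above.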

The algorithm leverages the collective wisdom of neighboring measurements to override spatially isolated fluctuations in clonal marker expression, and thereby enforces consistent annotation within contiguous regions of the image field. The size of these regions depends upon the granularity of estimates for the spatial context surrounding each cell. We used an unsupervised approach to choose an appropriate spatial resolution in a principled manner. In short, the resolution is matched to the approximate length scale over which expression levels remain correlated among cells. Both the training and application stages of our annotation algorithm use this automated approach (S3D and S4D Figs), thus averting any need for user input.

Manual assessment of annotation performance

We sought to validate the performance of the annotation algorithm by assessing its ability to accurately reproduce human-assigned labels. We manually labeled nuclei in each eye imaginal disc as homozygous mutant, heterozygous wildtype, or homozygous wildtype for the clonal marker, then automatically labeled the same cells (Fig 4A). The two sets of labels showed strong overall agreement (Fig 4B and S5A Fig). Excluding cells on the border of each clone revealed greater than 96% agreement in seven of the nine annotated images (see Table 1). Upon secondary inspection of the sole instance of substantial disagreement (S5B Fig), we are unable to confidently discern which set of labels is more accurate. While manual labeling required more than one hour of labor per image, the annotation algorithm achieved comparable accuracy in a matter of seconds. This performance advantage would continue to grow if the analysis were extended to multiple image layers, tissue samples, and experimental conditions.

Fig 4. Automated unsupervised annotation of clones in the larval eye.


(A) Labels assigned by automated annotation. Yellow, cyan, and magenta denote the label assigned to each contour. Labels are overlaid on the RFP channel of the image shown in S1B Fig. Cells on the periphery of each clone are excluded. (B) Comparison of automated annotation with manually-assigned labels. Confusion matrix includes data aggregated across nine images taken from six separate eye discs. Cells on the periphery of each clone are excluded. Columns sum to one.

Table 1. Automated vs. manual annotation.

Disc    Layer    Agreement*
1       1        93.1% (97.3%)
1       2        95.3% (97.3%)
2       1        91.3% (99.1%)
2       2        95.2% (96.4%)
3       1        67.2% (75.6%)
4       1        82.5% (89.2%)
5       1        96.2% (100%)
6       1        99.1% (99.3%)
6       2        95.2% (97.5%)

* Values in parentheses denote agreement when clone borders are excluded.

While it is common practice to use human-labeled data as the gold standard, manually assigned labels do not represent a reliable and reproducible ground truth. Furthermore, we contend that validation with manually-labeled data entrains implicit human biases in the selection of performant algorithms. These biases are particularly pronounced in biological image data where intrinsic variation, measurement noise, and transient processes can make cell-type annotation a highly subjective, and thus irreproducible, task.

Synthetic benchmarking of annotation performance

Synthetic benchmarking provides a powerful alternative to validation against manually labeled data. The idea is simple: measure how accurately an algorithm labels synthetic data for which the true labels are known. The synthetic data generation procedure may be modeled after the process underlying formation of the real data, making it possible to survey the breadth of biologically plausible conditions under which the algorithm performs adequately. Synthetic benchmarking also facilitates unbiased comparison of competing algorithms, resulting in a reliable standard that may be called upon at any time.

We used synthetic microscopy data to benchmark the performance of our annotation strategy. Each synthetic dataset depicts a simulated culture of cells distributed roughly uniformly in space (S6A Fig). Cells in this culture contain zero, one, or two copies of a gene encoding an RFP-tagged clonal marker (S6B Fig). Our simulation procedure ensures that cells tend to remain proximal to their clonal siblings (S6C Fig), thus forming synthetic clones with tunable size and spatial heterogeneity (S6D and S6E Fig). We generated synthetic measurements by randomly sampling fluorescence levels in a dosage-dependent manner (S7A–S7C Fig). We varied the similarity of fluorescence levels across clones using an ambiguity parameter, σα, that modulates the spread of the distributions used to generate fluorescence levels (S7D–S7F Fig).
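The dosage-dependent sampling step might look like the sketch below. The lognormal form and the doubling of the median level with each gene copy are assumptions chosen for illustration, since the exact distributions are specified in the supporting figures rather than in this text. The `ambiguity` argument plays the role of the ambiguity parameter by widening each distribution on the log scale.

```python
import numpy as np


def sample_fluorescence(dosage, ambiguity, rng):
    """Draw one lognormal fluorescence level per cell.

    dosage:    integer array of clonal marker copy numbers (0, 1 or 2).
    ambiguity: log-scale standard deviation; larger values blur the classes.
    rng:       a numpy.random.Generator.
    """
    # Assumed for illustration: the median doubles with each extra copy.
    log_median = np.log(2.0) * np.asarray(dosage)
    return rng.lognormal(mean=log_median, sigma=ambiguity)


rng = np.random.default_rng(0)
dosage = np.repeat([0, 1, 2], 1000)
levels = sample_fluorescence(dosage, ambiguity=0.1, rng=rng)
medians = [np.median(levels[dosage == d]) for d in (0, 1, 2)]
```

Raising `ambiguity` increases the overlap between the three distributions, which makes classification from individual fluorescence levels progressively harder.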

Using this schema as a template, we generated a large synthetic dataset, annotated each set of measurements, and compared the assigned labels with their true values. We used the mean absolute error as a comparison metric because it provides a stable measure of accuracy for multiclass classification problems in which the labels are intrinsically ordered [60]. In other words, it penalizes egregious misclassifications more severely than mild ones.
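A small example shows why the mean absolute error separates mild from egregious mistakes when labels are encoded by gene dosage (0, 1, or 2). The encoding and the function name are illustrative assumptions.

```python
import numpy as np


def mean_absolute_error(true_labels, assigned_labels):
    """Average absolute difference between ordered integer labels."""
    true_labels = np.asarray(true_labels)
    assigned_labels = np.asarray(assigned_labels)
    return np.abs(true_labels - assigned_labels).mean()


truth = [0, 1, 2, 2]
mild = [0, 1, 2, 1]        # one cell off by a single dosage level
egregious = [0, 1, 2, 0]   # one cell off by two dosage levels

print(mean_absolute_error(truth, mild))       # 0.25
print(mean_absolute_error(truth, egregious))  # 0.5
```

Both sets of assigned labels have the same accuracy (three of four cells correct), yet the second is penalized twice as heavily.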

Annotation performance is very strong for all cases in which σα ≤ 0.3 (Fig 5). Unsurprisingly, performance suffers as the difficulty of the classification problem is increased. The same trends are evident when performance is graded strictly on accuracy (S8 Fig). As cells on the periphery of each clone were not excluded from these analyses, the observed metrics provide a lower bound on the performance that may be anticipated in practice.

Fig 5. Synthetic benchmarking of automated annotation performance.


Each pixel reflects the mean absolute error (MAE) averaged across 50 replicates. Clone size reflects the mean number of cells per clone. Performance improves with increasing clone size and worsens with increasing fluorescence ambiguity.

Performance improved with increasing clone size. We suspected this was caused by larger clones offering additional spatial context to inform the identity of each cell. We tested this explanation by re-evaluating performance relative to a variant of our annotation algorithm that neglects spatial context (S4G Fig). As expected, the variant’s performance exhibited no dependence on clone size (S9A Fig). Comparing the two strategies confirmed that spatial context confers the most benefit when clones are large (S9B Fig). Inclusion of spatial context also becomes increasingly advantageous as the fluorescence ambiguity is increased, even for smaller clones. Thus, spatial context adds progressively more value as the classification task becomes more difficult.

This observation may be rationalized from a statistical perspective. Each cell is classified by maximizing the probability that the assigned label is correct. We compute these probabilities using the estimated expression level of each cell. Neglecting spatial context, this estimate is limited to a single sample and is therefore highly sensitive to both measurement and biological noise. Incorporating spatial context expands the sample size and thereby reduces the standard error of the estimated fluorescence level. The strategy is thus generally well suited to scenarios in which fluorescence intensities correlate across large clones, and closely parallels computer vision methods that exploit spatial contiguity to segment image features with ill-defined borders [61]. Because increased measurement precision comes at the expense of spatial resolution, we expect strong performance when measurements are aggregated across relatively large clones, but failure to detect small, heterogeneous clones. These expectations are consistent with the observed results. They are also conveniently aligned with the anticipated properties of real data, as experiments typically attempt to mitigate edge effects by driving early recombination events to generate large clones.
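The sample-size argument can be written out under a simplifying assumption that is not stated in the text: the $n$ cells contributing to an estimate share the same true expression level and carry independent noise with standard deviation $\sigma$.

```latex
\bar{x}_n = \frac{1}{n} \sum_{c=1}^{n} x_c ,
\qquad
\mathrm{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}}
```

Under that assumption, averaging over 25 cells reduces the standard error five-fold relative to a single-cell measurement. The benefit shrinks when the averaged neighborhood spans more than one clone, which is the loss of spatial resolution described above.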

Discussion

We used synthetic data to survey the performance of our annotation strategy across a much broader range of conditions than would have otherwise been possible with manually labeled data. This included conditions well beyond those of practical use. In particular, experiments designed to compare gene expression levels across clones would likely seek to avoid generating small clones with ambiguous clonal marker expression. Beyond complicating the annotation task, small clones are also exposed to diffusion-mediated signals from adjacent clones that can mask the effect of mutations. Cells located near the clone boundaries are often excluded for the same reason, as quantification is typically most reliable in cells surrounded by similar neighbors. Synthetic data provided a means to survey these edge cases and establish a lower bound on annotation performance. The strong performance observed across the remaining conditions bolsters our confidence that our annotation strategy is well suited to the images it is likely to encounter.

In each of our examples, clones were distinguished by ternary segregation of nuclear clonal marker fluorescence levels. Modern mosaic analysis techniques continue to deploy ternary labeling [62, 63], but also frequently opt for binary labeling of mutant versus non-mutant clones [64–66] and dichromic labeling of twin-spots [67, 68]. Our annotation scheme readily adapts to each of these scenarios provided that the number of anticipated labels is adjusted accordingly. In the case of dichromic labeling, binary classification would be performed separately for each color channel before merging the assigned labels. Extending the same logic to combinatorial pairs of colors suggests that our framework may also be compatible with multicolor labeling schemes used to simultaneously trace many clonal lineages over time [69–71]. A notable limitation of our approach is its reliance upon reporter fluorescence levels within distinct cells or nuclei. This requirement for discrete measurements precludes analysis of contiguous clones in which cytoplasmic fluorescence signals are indistinguishable between adjacent cells. Our framework is thus well suited to many different mosaic analysis platforms deployed in imaginal discs, so long as reporter fluorescence levels are measured on a discrete basis.

In principle, the framework described here should also be applicable to a wide variety of other tissues [72, 73] and model organisms [74–76] in which mosaics are studied. In practice, application to alternate contexts would require modifying some stages of the analysis. Most notably, image segmentation is strongly context dependent and any attempts to develop a universally successful strategy are likely to prove futile [77]. For this reason, we implemented a modular design in which each stage of analysis may be applied separately. For example, a user could perform their own segmentation before using our bleedthrough correction and clone annotation tools. By offering modular functionalities we hope to extend the utility of our software to the wider community of developmental biologists. Furthermore, the open-source nature of our framework supports continued development of more advanced features as various demands arise. Our synthetic benchmarking platform could then be used to objectively confirm the benefit conferred by any future developments.

Materials and methods

Genetics and microscopy of Drosophila eye imaginal discs

We borrowed an experimental dataset from a separate study of neuronal fate commitment during eye disc development [38]. The data consist of six eye imaginal discs dissected and fixed during the third larval instar of Drosophila development. Within each disc, ey>FLP and FRT40A were used to generate clones. The chromosome arm (2L) targeted for recombination was marked with a Ubi-mRFPnls transgene (S1A Fig), enabling automated detection of clones marked by distinct levels of mRFP fluorescence (S1B Fig). The discs also carried a pnt-GFP reporter transgene located on a different chromosome that was not subject to mitotic recombination. The Pnt-GFP reporter is predominantly expressed in two narrow stripes of progenitor cells during eye disc development [38]. The first stripe occurs immediately posterior to a wave of developmental signaling that traverses the eye disc. Progenitor cells located in this region are suitable for comparison because they are of approximately equivalent developmental age. We applied the Fly-QMA framework to a total of nine images of these cells.

Genetics, fly lines, immunohistochemistry, and imaging conditions related to this dataset have already been published [38]. All discs were dissected in PBS, fixed in 4% paraformaldehyde for 30 min at room temperature, and permeabilized with PBS-Triton X-100 0.1% for 20 min at room temperature to allow DAPI penetration without perturbing the fluorescence of the Pnt-GFP protein. Discs were subsequently stained with a 4’,6-diamidino-2-phenylindole (DAPI) nuclear marker, rinsed twice with PBS-Tween 0.5%, and mounted on Vecta Shield (Vector labs). Images were acquired using a Leica SP5 confocal equipped with a tunable detector. The 405, 488, and 561 nm lasers were used to excite DAPI, Pnt-GFP, and Ubi-mRFPnls, while photons were collected in the 437–481, 491–555, and 570–644 nm intervals for DAPI, GFP, and mRFP, respectively. Images were recorded with 16-bit resolution using a 40X oil objective. Discs were oriented with the dorso-ventral equator parallel to the horizontal axis, and all images captured at least six rows of ommatidia on either side of the equator. All discs were fixed, mounted, and imaged in parallel in order to reduce measurement error.

Characterization of fluorescence bleedthrough

For each image, we morphologically dilate the foreground until no features remain visible (S2A Fig). We then extract the background pixels and resample them such that the distribution of pixel intensities is approximately uniform (S2B Fig). Resampling helps mitigate the skewed distribution of pixel intensities found in the background. We then estimate values for each {α1, α2, …αK} and β by fitting a generalized linear model to the fluorescence intensities of the resampled pixels (S2C Fig). Each model is a variant of Eq 3 in which angled braces instead denote averages across all background pixels. We formulate these models with identity link functions under the assumption that residuals are gamma distributed. Their coefficients provide an estimate of the bleedthrough contribution strengths that may then be used to estimate the background fluorescence intensity of each nucleus in the corresponding image (S2D Fig). The measurements may then be corrected through application of Eq 4.
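The resampling and fitting steps can be illustrated for a single bleedthrough source (K = 1). The following is a minimal NumPy sketch under stated assumptions, not the Fly-QMA implementation: the function names `resample_uniform` and `fit_gamma_identity_glm`, the bin counts, and the synthetic pixel values are all hypothetical. It fits a gamma GLM with an identity link by iteratively reweighted least squares, for which the working response is the observation itself and the working weights reduce to 1/μ².

```python
import numpy as np

def resample_uniform(x, y, n_bins=20, per_bin=200, rng=None):
    """Resample (x, y) pairs so that x is approximately uniformly distributed."""
    rng = np.random.default_rng(rng)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    idx = []
    for b in range(n_bins):
        members = np.flatnonzero(bins == b)
        if members.size:  # skip empty bins
            idx.append(rng.choice(members, size=per_bin, replace=True))
    idx = np.concatenate(idx)
    return x[idx], y[idx]

def fit_gamma_identity_glm(x, y, n_iter=50, tol=1e-10):
    """Fit E[y] = alpha * x + beta with gamma-distributed residuals (IRLS)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        mu = np.clip(A @ coef, 1e-9, None)       # fitted means must stay positive
        w = 1.0 / mu**2                          # gamma variance function V(mu) = mu^2
        Aw = A * w[:, None]
        new = np.linalg.solve(A.T @ Aw, Aw.T @ y)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef  # (alpha, beta)

# Synthetic background pixels (hypothetical values): skewed RFP intensities
# that bleed into the GFP channel with strength 0.1 on top of an offset of 20.
rng = np.random.default_rng(0)
rfp = rng.gamma(3.0, 40.0, size=20000) + 1.0
gfp = rng.gamma(100.0, (0.1 * rfp + 20.0) / 100.0)

rfp_u, gfp_u = resample_uniform(rfp, gfp, rng=1)
alpha, beta = fit_gamma_identity_glm(rfp_u, gfp_u)
```

The fitted `alpha` and `beta` play the role of the bleedthrough coefficients described above. Resampling changes only how often each intensity range is represented, so it leaves the conditional relationship between the two channels intact.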

Clone annotation algorithm

We assume the measured fluorescence level $x_i$ for cell $i$ is sampled from an underlying distribution $p_m(x)$ for cells carrying $m$ copies of the gene encoding the clonal marker:

$$x_i \sim p_m(x) \tag{5}$$

We further assume that $p_m(x)$ comprises a mixture of one or more lognormal distributions:

$$p_m(\ln x) = \sum_{n=1}^{N} \lambda_n \mathcal{N}(\ln x \mid \theta_n) \tag{6}$$
$$\sum_{n=1}^{N} \lambda_n = 1 \tag{7}$$

where $0 \leq \lambda_n \leq 1$ are the mixing proportions and $\theta_n = (\mu_n, \sigma_n^2)$ are the mean and variance of the $n$th distribution. This assumption is supported by both empirical observations and theoretical insights [46, 47]. By superposition, the global distribution of measured fluorescence levels $p(\ln x)$ across all values of $m$ is also a mixture, with $K$ components:

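$$p(\ln x) = \sum_{m=0}^{2} \alpha_m p_m(\ln x) = \sum_{m=0}^{2} \alpha_m \sum_{n=1}^{N} \lambda_n \mathcal{N}(\ln x \mid \theta_n) = \sum_{k=1}^{K} \lambda_k \mathcal{N}(\ln x \mid \theta_k) \tag{8}$$
$$\sum_{k=1}^{K} \lambda_k = 1 \tag{9}$$

where $\alpha_m$ denotes the overall fraction of cells with $m$ copies of the gene encoding the clonal marker. For brevity, we substitute $X = \ln x$, yielding:

$$p(X) = \sum_{k=1}^{K} \lambda_k \mathcal{N}(X \mid \theta_k) \tag{10}$$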

Given a collection of sampled fluorescence levels, $\{X_i\}_{i=1 \ldots N}$, we use expectation maximization to find values of $\theta_k$ and $\lambda_k$ for each of the model’s $K$ components that maximize the log-likelihood of the observed sample. We repeat this procedure for a range of sequential values of $K$, resulting in multiple models of increasing size. We then balance model resolution against overfitting by selecting the model that yields the smallest value of the Bayesian Information Criterion (BIC):

$$\mathrm{BIC}(K) = \ln(N)\, q_K - 2 \ln(\hat{L}_K) \tag{11}$$
$$q_K = K - 1 + 2K \tag{12}$$

where $N$ is the sample size, $\ln(\hat{L}_K)$ is the maximum value of the log-likelihood, the subscript $K$ denotes the number of mixture components in the model, and $q_K$ is the total number of parameters (i.e. $K - 1$ values of $\lambda_k$ and $2K$ values of $\mu_k$ and $\sigma_k^2$).
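The model selection procedure can be sketched as follows. This is an illustrative NumPy implementation of expectation maximization for a univariate Gaussian mixture on log-transformed levels, followed by BIC-based selection as in Eqs 11 and 12. The function names, the quantile-based initialization, the variance floor, and the synthetic sample are assumptions made for the example and are not taken from the Fly-QMA source.

```python
import numpy as np

def log_joint(X, lam, mu, var):
    """Return ln[lambda_k * N(X_i | mu_k, var_k)] as an (N, K) array."""
    return (np.log(lam) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (X[:, None] - mu) ** 2 / var)

def fit_mixture(X, K, n_iter=300):
    """Fit a K-component univariate Gaussian mixture by expectation maximization."""
    mu = np.quantile(X, (np.arange(K) + 0.5) / K)   # spread the initial means
    var = np.full(K, X.var() / K)
    lam = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(k | X_i), evaluated in log space
        lj = log_joint(X, lam, mu, var)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update mixing proportions, means, and variances
        Nk = resp.sum(axis=0) + 1e-12
        lam = Nk / Nk.sum()
        mu = resp.T @ X / Nk
        var = (resp * (X[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-6
    loglik = np.logaddexp.reduce(log_joint(X, lam, mu, var), axis=1).sum()
    return lam, mu, var, loglik

def bic(loglik, K, N):
    """Eq 11 with q_K = K - 1 + 2K free parameters (Eq 12)."""
    return np.log(N) * (K - 1 + 2 * K) - 2 * loglik

def select_model(X, K_values=range(1, 6)):
    """Fit one model per K and return the one with the smallest BIC."""
    fits = {K: fit_mixture(X, K) for K in K_values}
    scores = {K: bic(fits[K][3], K, X.size) for K in K_values}
    best = min(scores, key=scores.get)
    return best, fits[best], scores

# Synthetic log-levels from three well-separated gene dosages (hypothetical data)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.15, 500) for m in np.log([0.5, 1.0, 2.0])])
best_K, (lam, mu, var, loglik), scores = select_model(X)
```

Each additional component adds three parameters to `q_K`, so a larger model is only preferred when it raises the log-likelihood by more than 1.5 ln(N).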

Applying Bayes’ rule to the selected model infers the posterior probabilities that each sample $X_i$ belongs to the $k$th component:

$$p(k \mid X_i) = \frac{p(X_i \mid k)\, p(k)}{p(X_i)} = \frac{p(X_i \mid k)\, \lambda_k}{p(X_i)} \tag{13}$$

where $p(X_i \mid k)$ is evaluated using the model’s likelihood function and $p(X_i)$ is evaluated by marginalizing across each of the model’s $K$ components. The end result is a mixture model that allows us to predict the probability that a given measurement of clonal marker expression belongs to a particular one of its component distributions.

We then define a many-to-one mapping, $f$, from each of the $K$ components of the mixture to each of the three possible values of $m$:

$$f : \{0, 1, \ldots, K\} \rightarrow \{0, 1, 2\} \tag{14}$$

We determine the mapping by k-means clustering the $K$ component distributions into three groups on the basis of their mean values, $e^{\mu_k}$. We may then assign a genotype label $m$ to each measurement $X_i$ by predicting the component $k$ from which it was sampled.
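As an illustration of this mapping, the sketch below clusters a set of component means into three ordered groups with a small one-dimensional k-means routine. The routine `kmeans_1d`, its quantile-based initialization, and the seven example means are assumptions made for this example; the text does not specify these details.

```python
import numpy as np

def kmeans_1d(values, n_clusters=3, n_iter=100):
    """Lloyd's algorithm in one dimension; labels are ordered by centroid value."""
    values = np.asarray(values, dtype=float)
    centroids = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(values[:, None] - centroids), axis=1)
        new = np.array([values[assign == c].mean() if np.any(assign == c)
                        else centroids[c] for c in range(n_clusters)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # relabel clusters so that 0, 1, 2 follow increasing mean fluorescence
    rank = np.empty(n_clusters, dtype=int)
    rank[np.argsort(centroids)] = np.arange(n_clusters)
    return rank[assign]

# Log-scale means of seven fitted components (hypothetical values)
mu_k = np.array([-0.75, -0.65, -0.05, 0.05, 0.60, 0.70, 0.80])
f = kmeans_1d(np.exp(mu_k))       # many-to-one mapping from component to label

# A measurement assigned to component k receives the genotype label f[k]
components = np.array([0, 3, 6])
genotypes = f[components]
```

Several components may share one label, which lets the mixture capture non-lognormal structure within a single genotype.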

The accuracy of these labels depends upon how closely the fitted mixture model reflects the true partitioning of gene copies among clones. While finite mixtures are always identifiable given a sufficiently large sample [48], the algorithm used to fit the mixture tends toward local maxima of the likelihood function when the true components are similar (Wu, 1983). An approach based on a univariate mixture is thus inherently prone to failure when expression levels extensively overlap across clones, as variation within each clone precludes accurate classification of a cell’s genotype solely on the basis of its individual expression level. However, clonal lineages are unlikely to exist in isolation because recombination events are usually timed to generate large clones. Our strategy therefore integrates both clonal marker expression and spatial context to identify clusters of cells with locally homogeneous expression behavior.

We incorporate spatial context by introducing a second jointly-distributed variable $Y_i$:

$$Y_i = \frac{1}{M_i} \sum_{j=0}^{M_i} X_j \tag{15}$$

where the subscript $j$ indexes all $M_i$ neighbors of cell $i$. The new variable reflects the average expression level among the neighbors surrounding each cell. We define neighbors as pairs of cells located within a critical distance of each other. This distance, or sampling radius, is derived from the length scale over which cells retain approximately similar clonal marker expression levels. Specifically, we determine the exponential decay constant of the spatial correlation function, $\psi(\delta)$:

$$\psi(\delta) = \frac{\langle (X_i - \mu_X)(X_j - \mu_X) \rangle_{i,j \in \delta}}{\sigma_X^2} \tag{16}$$

where $\mu_X$ and $\sigma_X^2$ are the global mean and variance, and angled brackets denote the mean across all pairs of cells separated by distance $\delta$. We efficiently implement this procedure by fitting an exponential decay function to the down-sampled moving average of $\psi(\delta)$ as a function of increasing separation distance.
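A simplified version of this estimate is sketched below. Rather than a down-sampled moving average, the sketch averages ψ within fixed-width distance bins and fits the decay constant by linear regression on ln ψ. Both simplifications, along with the function names, bin width, and the floor applied before taking logarithms, are assumptions made for illustration. The test field is drawn from a Gaussian process with an exponential kernel of length scale 4.

```python
import numpy as np

def spatial_correlation(xy, X, bin_width=1.0, max_distance=15.0):
    """Mean normalized covariance between cell pairs, binned by separation distance."""
    dev = X - X.mean()
    i, j = np.triu_indices(len(X), k=1)            # each unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    prod = dev[i] * dev[j] / X.var()
    keep = dist < max_distance
    bins = (dist[keep] // bin_width).astype(int)
    n_bins = int(np.ceil(max_distance / bin_width))
    total = np.bincount(bins, weights=prod[keep], minlength=n_bins)
    count = np.bincount(bins, minlength=n_bins)
    centers = (np.arange(n_bins) + 0.5) * bin_width
    ok = count > 0
    return centers[ok], total[ok] / count[ok]

def decay_constant(delta, psi, floor=0.2):
    """Fit psi ~ exp(-delta / tau) by linear regression on ln(psi); return tau."""
    ok = psi > floor                               # ignore noisy, near-zero values
    slope = np.polyfit(delta[ok], np.log(psi[ok]), 1)[0]
    return -1.0 / slope

# Synthetic expression field with a known correlation length (hypothetical data)
rng = np.random.default_rng(0)
xy = rng.uniform(0, 60, size=(900, 2))
D = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
L = np.linalg.cholesky(np.exp(-D / 4.0) + 1e-8 * np.eye(len(xy)))
X = L @ rng.standard_normal(len(xy))

delta, psi = spatial_correlation(xy, X)
radius = decay_constant(delta, psi)                # estimate of the sampling radius
```

Because the estimate comes from a single realization, it should be expected to scatter around the true length scale rather than reproduce it exactly.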

Following the introduction of spatial context, the mixture model becomes:

$$p(X, Y) = \sum_{k=1}^{K} \lambda_k \mathcal{N}(X, Y \mid \theta_k) \tag{17}$$

where $\theta_k = (\mu_k, \sigma_k^2)$ contains the mean and variance of each component given by vectors of length two. This formulation constrains each component’s covariance matrix to be diagonal. The posterior is now:

$$p(k \mid X_i, Y_i) = \frac{p(X_i, Y_i \mid k)\, \lambda_k}{p(X_i, Y_i)} \tag{18}$$

We can recover the univariate model by marginalizing the posterior over all values of $Y$:

$$p(k \mid X_i) = \sum_{j} p(k \mid X_i, Y_j) \tag{19}$$

When neglecting spatial context, we use this expression to classify each sample by applying the mapping $f$ to the value of $k$ that maximizes $p(k \mid X_i)$:

$$f\left(\operatorname*{argmax}_k \, p(k \mid X_i)\right) \tag{20}$$

In all other cases, we deploy a graph-based approach to refine the estimate of $p(k \mid X_i, Y_i)$. This first entails constructing an undirected graph connecting adjacent cells within each image. We obtain the graph’s edges through Delaunay triangulation of the measured cell positions, then exclude distant neighbors by thresholding the edge lengths. Each edge is assigned a weight $w_{ij}$ reflecting the similarity of clonal marker expression between adjacent cells $i$ and $j$:

$$w_{ij} = \exp\left(-\frac{E_{ij}}{\langle E \rangle}\right) \tag{21}$$
$$E_{ij} = |X_i - X_j| \tag{22}$$

where $E_{ij}$ is the absolute log fold-change in measured expression level and angled brackets denote the mean across all edges. We chose an exponential formulation because it yields an approximately uniform distribution of edge weights. We then detect communities within the graph using the Infomap algorithm [49]. The algorithm provides a hierarchical partitioning of nodes into non-overlapping clusters. We aggregate all clusters below a critical level that is again chosen by estimating the spatial correlation decay constant. We then enumerate $p(k \mid X_i, Y_i^c)$ where $Y_i^c$ is the spatial context obtained by averaging expression levels among all neighbors in the same community as cell $i$.
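The edge filtering and weighting steps (Eqs 21 and 22) can be sketched as below. The triangulation itself is omitted and an edge list is assumed to be given. The function names, the length threshold, and the four-cell example are hypothetical.

```python
import numpy as np

def filter_edges(xy, edges, max_length):
    """Discard edges that connect cells farther apart than max_length."""
    edges = np.asarray(edges)
    lengths = np.linalg.norm(xy[edges[:, 0]] - xy[edges[:, 1]], axis=1)
    return edges[lengths <= max_length]

def edge_weights(X, edges):
    """w_ij = exp(-E_ij / <E>), where E_ij = |X_i - X_j| on the log scale."""
    E = np.abs(X[edges[:, 0]] - X[edges[:, 1]])
    return np.exp(-E / E.mean())

# Four cells in a row; the long edge (0, 3) is removed before weighting
xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
X = np.log([1.0, 1.0, 2.0, 2.0])                  # log-transformed expression
edges = filter_edges(xy, [(0, 1), (1, 2), (2, 3), (0, 3)], max_length=1.5)
w = edge_weights(X, edges)
```

Edges between cells with identical expression receive a weight of one, and weights decay toward zero as the log fold-change grows. If the values of E were exactly exponentially distributed, this transformation would yield exactly uniform weights, which is consistent with the rationale given above.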

We further incorporate spatial context by allowing the posterior probabilities $p(k \mid X_i, Y_i^c)$ to diffuse among adjacent cells. We define the modified posterior probability $\hat{p}(k \mid X_i, Y_i^c)$ through a recursive relation analogous to the Katz centrality [50], initialized by $p(k \mid X_i, Y_i^c)$:

$$\hat{p}(k \mid X_i, Y_i^c) = \alpha \sum_{j} w_{ij}\, \hat{p}(k \mid X_j, Y_j^c) + \beta \tag{23}$$
$$\beta = (1 - \alpha)\, p(k \mid X_i, Y_i^c) \tag{24}$$

where $\alpha$ is the attenuation factor and $w_{ij}$ are the edge weights. Expressed in matrix form, the solution for $\hat{p}(k \mid X, Y^c)$ is given by:

$$\hat{p}(k \mid X, Y^c) = (I - \alpha W)^{-1} (1 - \alpha)\, p(k \mid X, Y^c) \tag{25}$$

where $I$ denotes the identity matrix and $W$ is the matrix of edge weights $w_{ij}$. We then assign a label to each measurement $X_i$ by applying $f$ to the value of $k$ that maximizes $\hat{p}(k \mid X_i, Y_i^c)$:

$$f\left(\operatorname*{argmax}_k \, \hat{p}(k \mid X_i, Y_i^c)\right) \tag{26}$$

Finally, we assess the total posterior probability of each assigned label, $\hat{P}(m_i)$:

$$\hat{P}(m_i) = \sum_{\{k \mid f(k) = m_i\}} \hat{p}(k \mid X_i, Y_i^c) \tag{27}$$

This measure reflects the overall confidence that $m_i$ is the appropriate label. Labels whose confidence falls below 80% are replaced by their counterparts estimated using the marginal classifier. This substitution helps preserve classification accuracy in situations where spatial context is not informative, and is particularly useful when the annotated clones are relatively small.
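The diffusion and label assignment steps can be sketched in a few lines of NumPy. Two details below are assumptions that the text does not state: the diffused posteriors are renormalized per cell so that the confidence can be compared against the 80% threshold, and the attenuation factor is required to be smaller than the inverse spectral radius of W so that the underlying series converges. The function names and the five-cell example are hypothetical.

```python
import numpy as np

def diffuse_posteriors(p, W, alpha):
    """Solve p_hat = alpha * W @ p_hat + (1 - alpha) * p for p_hat (Eq 25)."""
    radius = np.max(np.abs(np.linalg.eigvalsh(W)))   # W is symmetric (undirected graph)
    if alpha * radius >= 1:
        raise ValueError("alpha must be smaller than 1 / spectral radius of W")
    return np.linalg.solve(np.eye(len(W)) - alpha * W, (1 - alpha) * p)

def annotate(p_joint, p_marginal, W, f, alpha=0.5, threshold=0.8):
    """Assign labels from diffused posteriors, with a marginal fallback."""
    p_hat = diffuse_posteriors(p_joint, W, alpha)
    p_hat /= p_hat.sum(axis=1, keepdims=True)        # assumption: renormalize per cell
    labels = f[np.argmax(p_hat, axis=1)]             # Eq 26
    same_label = f[None, :] == labels[:, None]       # components with f(k) = m_i
    confidence = (p_hat * same_label).sum(axis=1)    # Eq 27
    fallback = f[np.argmax(p_marginal, axis=1)]      # Eq 20
    return np.where(confidence < threshold, fallback, labels), confidence

# Five cells in a chain with unit edge weights; K = 4 components mapped to 3 labels
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
f = np.array([0, 1, 1, 2])

# All cells favor label 1 except the middle cell, which weakly favors label 2
p_joint = np.array([[0.01, 0.49, 0.49, 0.01]] * 5)
p_joint[2] = [0.02, 0.24, 0.24, 0.50]

# The same posteriors stand in for the marginal classifier in this toy example
labels, confidence = annotate(p_joint, p_joint, W, f)
```

In this example the middle cell would be labeled 2 on its own, but its neighbors pull the diffused posterior toward label 1 with a confidence just above the threshold, so the spatial label is retained.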

Statistical comparison of fluorescence levels

To mitigate edge effects, cells residing on the periphery of each clone were excluded from all comparisons (S2E Fig). Border cells were identified by using a Delaunay triangulation to find all cells connected to a neighbor within a different clone. Our framework includes a simple graphical user interface that permits manual curation of which regions of the image field are included in subsequent analyses. We used this tool to limit our analysis to the region of elevated GFP expression near the morphogenetic furrow (S2F Fig). Comparisons were further restricted to cells undergoing similar stages of development (S2G Fig). These restrictions served to buffer against differences in developmental context and ensured that all compared cells were of similar developmental age. The remaining fluorescence measurements were then aggregated across all eye discs and compared between pairs of clones by two-sided Mann-Whitney U test.
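The border exclusion step can be illustrated with a short sketch. It assumes that an edge list is already available from the triangulation and uses the assigned genotype labels as a stand-in for clone identity. Both the helper name `border_cells` and the example graph are hypothetical.

```python
def border_cells(edges, labels):
    """Return indices of cells connected to a neighbor with a different label."""
    border = set()
    for i, j in edges:
        if labels[i] != labels[j]:
            border.update((i, j))
    return border

# Five cells in a chain: a clone boundary lies between cells 2 and 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = [0, 0, 0, 2, 2]

excluded = border_cells(edges, labels)
interior = [i for i in range(len(labels)) if i not in excluded]
```

Only the cells in `interior` would then be passed on to the statistical comparison.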

Simulated cell growth and recombination

We simulated the two-dimensional growth of a cell culture seeded with a single cell. Growth proceeds through sequential division of cells (S6A Fig). Cell division is stochastic, so not all cells divide at each time step. Instead, each cell divides with a probability controlled by a global growth rate parameter.

Cells in this culture carry a gene encoding a clonal marker (S6B Fig). During growth, the gene is subject to mitotic recombination (S6C Fig). Each time a cell divides, its genes are duplicated and equally partitioned between the two daughter cells. However, in some instances a heterozygous parent may instead partition its two duplicate genes unequally, with one daughter receiving both and the other receiving none. These mitotic recombination events occur stochastically with a frequency defined by a global recombination rate parameter.

After each round of cell division, all cells are repositioned in order to preserve approximately uniform spatial density (S6C Fig). Repositioning is achieved by equilibrating a network of springs connecting each cell with its neighbors. This undirected network is constructed through Delaunay triangulation of all cells’ spatial positions. Edges on the periphery of the culture are systematically excluded by establishing a maximum polar angle between neighbors. This filtration removes spurious edges between distant pairs of cells. Edges connecting pairs of cells with the same clonal marker dosage are assigned a 10% higher spring constant than edges that connect dissimilar cells. This modest bias ensures that cells tend to remain proximal to their clonal lineages. Cell positions are then updated using a force-directed graph drawing algorithm [51]. Alternating cell division and repositioning steps are then repeated until a predefined population size is reached.

The timing and duration of recombination events affect the number and size of the resultant clones. In real experiments, recombination events are restricted to a particular stage of the developmental program through localized exogenous expression of the recombination machinery. We incorporated this feature into our cell growth simulations via two adjustable parameters. The first determines the minimum population size at which recombination may begin, while the second determines the number of generations over which recombination may continue to occur. These two parameters provide a means to tune the average number and size of clonal subpopulations in the synthetic data (S6D Fig). Early recombination events generally yield larger clones, while shorter recombination periods limit the extent of clone formation (S6E Fig).
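The division and recombination rules can be sketched without the spatial component. The sketch below tracks only gene dosages and omits cell positions and the spring-based repositioning. It also assumes that one generation corresponds to a doubling of the population, so that recombination is permitted while the population lies between `start_size` and `start_size * 2**duration`. That interpretation, the function name, and the default parameter values are assumptions made for illustration.

```python
import random

def simulate(division_rate=0.2, recombination_rate=0.2, start_size=8,
             duration=4, final_size=2048, seed=0):
    """Return clonal marker dosages (0, 1, or 2) for a culture grown from one cell."""
    rng = random.Random(seed)
    cells = [1]                                   # single heterozygous seed cell
    stop_size = start_size * 2 ** duration        # assumption: generation = doubling
    while len(cells) < final_size:
        active = start_size <= len(cells) < stop_size
        next_cells = []
        for dosage in cells:
            if rng.random() >= division_rate:
                next_cells.append(dosage)         # no division in this iteration
            elif active and dosage == 1 and rng.random() < recombination_rate:
                next_cells.extend((0, 2))         # unequal partitioning of duplicates
            else:
                next_cells.extend((dosage, dosage))
        cells = next_cells
    return cells

dosages = simulate()
control = simulate(recombination_rate=0.0)
```

Only heterozygous parents can recombine, so lineages with zero or two copies retain their dosage for the remainder of the simulation. Raising `start_size` delays recombination and would be expected to produce smaller clones.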

Generation of synthetic microscopy data

Each simulation yields a list of spatial coordinates and gene dosages for each nucleus (S6B Fig). Synthetic measurements for each nucleus were generated by randomly sampling fluorescence levels $\{x_1, x_2, \ldots, x_N\}$ from a lognormal distribution conditioned upon the corresponding gene dosage (S7A–S7C Fig):

$$\ln x \sim \mathcal{N}_n(\theta_n) \tag{28}$$

where the subscript $n$ denotes the gene copy number and $\theta_n = (\mu_n, \sigma_\alpha^2)$ are the mean and variance of the corresponding distribution. We define $\mu_n$ such that the mean fluorescence level doubles for each additional copy of the gene:

$$\mu_n = \ln(2^{n-1}) \tag{29}$$

We refer to $\sigma_\alpha$ as the fluorescence ambiguity because it modulates the similarity of fluorescence levels across gene dosages. Increasing $\sigma_\alpha$ increases the overlap among $\mathcal{N}_0$, $\mathcal{N}_1$, and $\mathcal{N}_2$ (S7D and S7E Fig), and consequently increases the difficulty of the annotation task (S7F Fig).
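The sampling scheme in Eqs 28 and 29 amounts to a few lines of NumPy. The function name and the example dosages below are hypothetical.

```python
import numpy as np

def sample_fluorescence(dosages, ambiguity=0.25, seed=0):
    """Sample lognormal fluorescence levels conditioned on gene dosage."""
    rng = np.random.default_rng(seed)
    dosages = np.asarray(dosages)
    mu = np.log(2.0 ** (dosages - 1))             # Eq 29
    return np.exp(rng.normal(mu, ambiguity))      # ln x ~ N(mu_n, ambiguity^2)

# 20,000 nuclei for each gene dosage (hypothetical sample sizes)
dosages = np.repeat([0, 1, 2], 20000)
x = sample_fluorescence(dosages, ambiguity=0.25)
medians = [np.median(x[dosages == n]) for n in (0, 1, 2)]
```

The median level of each group approaches 0.5, 1, and 2, respectively. Because all three distributions share the same log-scale variance, their means differ by the same factor of two.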

Synthetic benchmarking of annotation performance

We generated a large synthetic dataset spanning a broad range of sixteen different clone sizes and fluorescence ambiguities (S6D and S7F Figs, only half are shown). We performed 50 replicate simulations for each condition. All simulations were terminated when the total population exceeded 2048 cells. We assigned each cell a 20% probability of division upon each iteration, and each cell division event was accompanied by a 20% chance of mitotic recombination. Parent cells containing zero or two copies of the recombined genes were ineligible for recombination, effectively sealing the genetic fates of their respective lineages.

To annotate each set of measurements, the mixture model given by Eq 17 was independently trained and applied to each replicate. Training a single model on all replicates yields modestly stronger performance on average, but also yields more variable results across the parameter space because all labels are dependent upon the outcome of a single expectation maximization routine.

Data and software availability

We have distributed the automated mosaic analysis framework as an open-source Python package available at https://sebastianbernasek.github.io/flyqma. The associated code repository contains resources designed to help users analyze their own microscope images. These include code documentation, a guide to getting started with Fly-QMA, and an interactive tutorial that uses example data to demonstrate the core features of the software. We also intend to incorporate Fly-QMA into future versions of FlyEye Silhouette, our open-source desktop application for quantitative analysis of the larval eye. The code used to generate synthetic microscopy data is also freely available at https://github.com/sebastianbernasek/growth. All segmented and annotated eye discs are accessible via our data repository (https://doi.org/10.21985/N2F207).

Supporting information

S1 Fig. Example clones in the larval fly eye.

(A) Genetic schema for a bleedthrough control experiment. Red and green ovals represent genes encoding a RFP-tagged clonal marker and a GFP-tagged control reporter, respectively. Black lines depict a genomic locus. Recombination does not affect gene dosage of the control reporter, so GFP variation across clones is attributed to fluorescence bleedthrough. (B) Confocal image of an eye imaginal disc. Red, green, and blue reflect clonal marker, control reporter, and nuclear stain fluorescence, respectively. (C) Segmentation of the DAPI nuclear stain. White lines show individual segments.

(TIF)

S2 Fig. Using background pixels to characterize bleedthrough contributions in the foreground.

(A) Extraction of background pixels (striped region). Foreground includes the merged RFP and GFP images, surrounded by a white line. White arrow marks the morphogenetic furrow (MF). (B) Background pixel values are resampled such that RFP intensities are uniformly distributed. (C) A generalized linear model characterizes the contribution of RFP bleedthrough to GFP fluorescence. Boxes reflect windowed distributions of resampled background pixel intensities. Red line shows the model fit. (D) Measured GFP levels before bleedthrough correction. Markers represent individual nuclei. Red line shows the inferred contributions of RFP fluorescence bleedthrough. Dashed portion is extrapolated. (E-G) Data curation prior to statistical comparison of GFP levels. (E) Cells on the periphery of each clone are excluded. (F) The selection is limited to the region of elevated GFP expression near the MF. (G) It is further limited to cells of the same developmental age, defined by their relative positions along the x-axis.

(TIF)

S3 Fig. Training a clone annotation model.

(A) One or more images are segmented, yielding a set of fluorescence measurements X. These are used to sample the spatial context Y of the neighborhood surrounding each cell. Both sets of values are used to train a mixture model. Subsequent panels demonstrate these procedures using the example shown in S3 Fig C. (B) Expression levels are jointly distributed with the local average among neighboring cells. Center panel shows the joint distribution. Top and right bar plots show marginal distributions. (C) Mixture model identifies seven distinct components ki. Center panel shows position and spread of each component. Top and right panels show marginal components scaled by their respective weights. Red shading denotes the label mi assigned to each component. The model predicts the posterior probabilities that a given sample (X, Y) belongs to each component. (D) Neighborhood size is estimated by computing the decay constant of the spatial correlation function, ψ(δ). Black line shows the moving average of ψ(δ), red line shows an exponential fit. Inset shows the resultant sampling region. (E) The optimal number of mixture components is determined by minimizing BIC score. (F) Mixture components are labeled by k-means clustering their mean values. Markers reflect the component means, colors denote the assigned label.

(TIF)

S4 Fig. Label assignment using a trained clone annotation model.

(A) Measurements are used to sample spatial contexts before the trained model is applied (blue and green path). In parallel, measurements are labeled using a marginal projection of the trained model (magenta path). The labels are then merged (red path). (B-D) Spatial context sampling. (B) Weighted undirected graph connecting adjacent cells. Line width reflects expression similarity between neighbors. (C) Community resolution is defined by aggregating clusters that fall below a hierarchical cut level δ. Panels show increasing levels of aggregation. Colors denote distinct communities. (D) Cut level is chosen by finding the maximum level (red dot) that remains lower than the decay constant of the spatial correlation function, ψ(δ) (black line). Panel E depicts aggregation below the third level for ease of visualization. (E) Application of the mixture model. (I) The graph contains distinct communities of locally similar expression. (II) Mean expression level within each community serves as the local average for each cell. (III) Mixture model estimates the probability that each cell belongs to each of its components. Bar plots within each cell illustrate the cumulative probability of each label. (IV) Posterior probabilities are diffused across the graph. (V) Each cell is assigned the most probable label. (F,G) Application of a marginal mixture model. (F) Marginal mixture components, shaded by their mapped labels. Dashed line is the overall marginal density. (G) Marginal classifier labels cells strictly on the basis of their individual fluorescence level. Red shading denotes the most probable label for each level. (H) Annotated measurements. Red shading denotes the assigned label. Labels with low confidence $\hat{P}(m_i) < 0.8$ are replaced by their marginal counterparts.

(TIF)

S5 Fig. Comparison of automated annotation with manually assigned labels.

(A) Distribution of labels among each possible value. (B) Visual comparison of the sole instance in which automated and manual annotation differ. Image shows clonal marker fluorescence, colors denote the assigned label.

(TIF)

S6 Fig. Simulated growth of a synthetic cell culture.

(A) Partial simulation time course. Each marker depicts a cell. Greyscale intensity reflects clonal marker gene dosage. Simulation time reflects the approximate number of cell divisions since the initial seed. (B) Simulations yield gene dosages and spatial coordinates for each cell. (C) Single iteration of an example simulation. Circles represent individual cells, red shading denotes clonal marker dosage. Cycles of cell division, recombination, and repositioning are repeated until the simulation reaches a specified end time (t > 11 in panel A). (D) Cultures simulated with varying recombination start times. All cultures were subject to four generations of recombination (δt = 4). Recombination start time increases from left to right. Later recombination events generally yield smaller clones. (E) Mean clone size (cells per clone) as a function of the recombination start time. Colors denote recombination period duration. Error bars reflect standard error of the mean across 50 replicates. Clone size generally decreases as recombination is limited to later times.

(TIF)

S7 Fig. Tunable generation of synthetic microscopy data.

(A) Fluorescence levels are sampled from lognormal distributions conditioned upon gene dosage. (B) Synthetic data include a measured fluorescence level for each reporter in each cell. Text color reflects the generative distribution in A. (C) Synthetic image of clonal marker fluorescence when σα = 0.25. Each nucleus is shaded in accordance with its sampled fluorescence intensity. (D-F) Left to right, increasing the fluorescence ambiguity parameter broadens the overlap in fluorescence levels across gene dosages. (D) Distributions used to generate clonal marker fluorescence levels. Red shading denotes gene dosage. (E) Evenly weighted sum of the generative distributions. (F) Example images of clonal marker fluorescence.

(TIF)

S8 Fig. Fraction of nuclei correctly labeled during synthetic benchmarking.

Each pixel reflects the average across 50 replicates. Clone size reflects the mean number of cells per clone. Performance improves with increasing clone size and worsens with increasing fluorescence ambiguity.

(TIF)

S9 Fig. Spatial context is most informative for large clones with ambiguous fluorescence.

(A) MAE of labels assigned using a marginal classifier that neglects spatial context. Performance worsens with increasing fluorescence ambiguity but does not depend upon clone size. (B) Annotation performance relative to the marginal classifier. Color scale reflects the log2 fold-change in MAE when spatial context is neglected. Blue indicates that spatial context improves performance.

(TIF)

Data Availability

The data underlying the results presented in the study are available in a public data repository hosted by Northwestern University. DOI: https://doi.org/10.21985/N2F207.

Funding Statement

SMB and LANA were supported by the John and Leslie McQuown Gift. RWC was supported by NIH R35GM118144 (https://www.nih.gov). LANA, NB, and RWC were supported by NSF 1764421 (https://www.nsf.gov). LANA, NB, and RWC were supported by Simons Foundation 597491 (https://www.simonsfoundation.org). NP was supported by the HHMI Hanna H. Gray Fellowship (https://www.hhmi.org/programs/hanna-h-gray-fellows-program). In all cases, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Oates AC, Gorfinkiel N, González-Gaitán M, Heisenberg CP. Quantitative approaches in developmental biology; 2009. Available from: http://www.nature.com/articles/nrg2548.
  • 2. Muzzey D, van Oudenaarden A. Quantitative time-lapse fluorescence microscopy in single cells. Annual Review of Cell and Developmental Biology. 2009;25(1):301–327. doi:10.1146/annurev.cellbio.042308.113408
  • 3. Stelzer EHK. Light-sheet fluorescence microscopy for quantitative biology. Nature Methods. 2014;12(1):23–26. doi:10.1038/nmeth.3219
  • 4. Truong TV, Supatto W. Toward high-content/high-throughput imaging and analysis of embryonic morphogenesis. Genesis. 2011;49(7):555–569. doi:10.1002/dvg.20760
  • 5. Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R, et al. Critical assessment of automated flow cytometry data analysis techniques. Nature Methods. 2013;10(3):228–238. doi:10.1038/nmeth.2365
  • 6. Chen X, Hasan M, Libri V, Urrutia A, Beitz B, Rouilly V, et al. Automated flow cytometric analysis across large numbers of samples and cell types. Clinical Immunology. 2015;157(2):249–260. doi:10.1016/j.clim.2014.12.009
  • 7. Pyne S, Maier LM, Lin TI, Wang K, Rossin E, Hu X, et al. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences. 2009;106(21):8519–8524. doi:10.1073/pnas.0903028106
  • 8. Bernstein BE, Brown M, Johnson DS, Liu XS, Nussbaum C, Myers RM, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology. 2008;9(9):R137. doi:10.1186/gb-2008-9-9-r137
  • 9. Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biology. 2007;8(2). doi:10.1186/gb-2007-8-2-r19
  • 10. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–9. doi:10.1038/nmeth.1923
  • 11. Trapnell C, Pachter L, Salzberg SL. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–1111. doi:10.1093/bioinformatics/btp120
  • 12. Costes SV, Daelemans D, Cho EH, Dobbin Z, Pavlakis G, Lockett S. Automatic and quantitative measurement of protein-protein colocalization in live cells. Biophysical Journal. 2004;86(6):3993–4003. doi:10.1529/biophysj.103.038422
  • 13. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols. 2015;10(6):845–858. doi:10.1038/nprot.2015.053
  • 14. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, et al. CellProfiler: Image analysis software for identifying and quantifying cell phenotypes. Genome Biology. 2006;7(10):R100. doi:10.1186/gb-2006-7-10-r100
  • 15. Paintdakhi A, Parry B, Campos M, Irnov I, Elf J, Surovtsev I, et al. Oufti: An integrated software package for high-accuracy, high-throughput quantitative microscopy analysis. Molecular Microbiology. 2016;99(4):767–777. doi:10.1111/mmi.13264
  • 16. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: An open-source platform for biological-image analysis; 2012. Available from: http://www.nature.com/articles/nmeth.2019.
  • 17. Sommer C, Straehle C, Kothe U, Hamprecht FA. Ilastik: Interactive learning and segmentation toolkit. In: Proceedings—IEEE International Symposium on Biomedical Imaging. 2011. p. 230–233. Available from: http://ieeexplore.ieee.org/document/5872394/.
  • 18. Jug F, Pietzsch T, Preibisch S, Tomancak P. Bioimage informatics in the context of Drosophila research. Methods. 2014;68(1):60–73. doi:10.1016/j.ymeth.2014.04.004
  • 19. Sbalzarini IF. Seeing is believing: Quantifying is convincing: Computational image analysis in biology. Advances in Anatomy, Embryology, and Cell Biology. 2016;219:1–39. doi:10.1007/978-3-319-28549-8_1
  • 20. Schindelin J, Rueden CT, Hiner MC, Eliceiri KW. The ImageJ ecosystem: An open platform for biomedical image analysis; 2015. Available from: http://doi.wiley.com/10.1002/mrd.22489.
  • 21. Simpson IT, Price DJ. Pax6; a pleiotropic player in development; 2002. Available from: http://doi.wiley.com/10.1002/bies.10174.
  • 22. Parody TR, Muskavitch MAT. The pleiotropic function of Delta during postembryonic development of Drosophila melanogaster. Genetics. 1993;135(2):527–539.
  • 23. Shilo BZ, Raz E. Developmental control by the Drosophila EGF receptor homolog DER; 1991. Available from: https://www.sciencedirect.com/science/article/pii/016895259190261N.
  • 24. Xu T, Rubin GM. Analysis of genetic mosaics in developing and adult Drosophila tissues. Development. 1993;117(4):1223–37.
  • 25. Xu T, Rubin GM. The effort to make mosaic analysis a household tool. Development. 2012;139(24):4501–4503. doi:10.1242/dev.085183
  • 26. Newsome TP, Asling B, Dickson BJ. Analysis of Drosophila photoreceptor axon guidance in eye-specific mosaics. Development. 2000;127(4):851–60.
  • 27. Theodosiou NA, Xu T. Use of FLP/FRT system to study Drosophila development. Methods. 1998;14(4):355–365. doi:10.1006/meth.1998.0591
  • 28. Struhl G, Basler K. Organizing activity of wingless protein in Drosophila. Cell. 1993;. doi:10.1016/0092-8674(93)90072-X
  • 29. Halfar K, Rommel C, Stocker H, Hafen E. Ras controls growth, survival and differentiation in the Drosophila eye by different thresholds of MAP kinase activity. Development. 2001;128(9):1687–96.
  • 30. Tomlinson A, Struhl G. Delta/Notch and Boss/Sevenless signals act combinatorially to specify the Drosophila R7 photoreceptor. Molecular Cell. 2001;7(3):487–95. doi:10.1016/s1097-2765(01)00196-4
  • 31. Yang L, Baker NE. Role of the EGFR/Ras/Raf pathway in specification of photoreceptor cells in the Drosophila retina. Development. 2001;128(7):1183–91.
  • 32. Huang J, Wu S, Barrera J, Matthews K, Pan D. The Hippo signaling pathway coordinately regulates cell proliferation and apoptosis by inactivating Yorkie, the Drosophila homolog of YAP. Cell. 2005;122(3):421–434. 10.1016/j.cell.2005.06.007 [DOI] [PubMed] [Google Scholar]
  • 33. Thompson BJ, Cohen SM. The Hippo pathway regulates the bantam microRNA to control cell proliferation and apoptosis in Drosophila. Cell. 2006;126(4):767–774. 10.1016/j.cell.2006.07.013 [DOI] [PubMed] [Google Scholar]
  • 34. Atkins M. Drosophila genetics: The power of genetic mosaic approaches In: Methods Mol. Biol. vol. 1893 Humana Press, New York, NY; 2019. p. 27–42. Available from: http://link.springer.com/10.1007/978-1-4939-8910-2_2. [DOI] [PubMed] [Google Scholar]
  • 35. Enomoto M, Siow C, Igaki T. Drosophila as a cancer model In: Advances in Experimental Medicine and Biology. vol. 1076 Springer, Singapore; 2018. p. 173–194. Available from: http://link.springer.com/10.1007/978-981-13-0529-0_10. [DOI] [PubMed] [Google Scholar]
  • 36. Germani F, Bergantinos C, Johnston LA. Mosaic analysis in Drosophila. Genetics. 2018;208(2):473–490. 10.1534/genetics.117.300256 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Dai W, Peterson A, Kenney T, Burrous H, Montell DJ. Quantitative microscopy of the Drosophila ovary shows multiple niche signals specify progenitor cell fate. Nature Communications. 2017;8(1):1244 10.1038/s41467-017-01322-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Bernasek SM, Lachance JFB, Peláez N, Bakker R, Navarro HT, Amaral LAN, et al. Ratio-based sensing of two transcription factors regulates the transit to differentiation. bioRxiv. 2018; p. 430744.
  • 39. Ghiglione C, Jouandin P, Cérézo D, Noselli S. The Drosophila insulin pathway controls Profilin expression and dynamic actin-rich protrusions during collective cell migration. Development. 2018;145(14):dev161117 10.1242/dev.161117 [DOI] [PubMed] [Google Scholar]
  • 40. Li K, Baker NE. Regulation of the Drosophila ID protein Extra macrochaetae by proneural dimerization partners. Elife. 2018;7 10.7554/eLife.33967 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Bacia K, Petrášek Z, Schwille P. Correcting for spectral cross-talk in dual-color fluorescence cross-correlation spectroscopy. ChemPhysChem. 2012;13(5):1221–1231. 10.1002/cphc.201100801 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Elangovan M, Wallrabe H, Chen Y, Day RN, Barroso M, Periasamy A. Characterization of one- and two-photon excitation fluorescence resonance energy transfer microscopy. Methods. 2003;29(1):58–73. 10.1016/s1046-2023(02)00283-9 [DOI] [PubMed] [Google Scholar]
  • 43. Mort RL. Quantitative analysis of patch patterns in mosaic tissues with ClonalTools software. Journal of Anatomy. 2009;215(6):698–704. 10.1111/j.1469-7580.2009.01150.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Helmuth JA, Paul G, Sbalzarini IF. Beyond co-localization: Inferring spatial interactions between sub-cellular structures from microscopy images. BMC Bioinformatics. 2010;11(1):372 10.1186/1471-2105-11-372 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Shivanandan A, Radenovic A, Sbalzarini IF. MosaicIA: An ImageJ/Fiji plugin for spatial pattern and interaction analysis. BMC Bioinformatics. 2013;14(1):349 10.1186/1471-2105-14-349 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Furusawa C, Suzuki T, Kashiwagi A, Yomo T, Kaneko K. Ubiquity of log-normal distributions in intra-cellular reaction dynamics. Biophysics. 2005;1:25–31. 10.2142/biophysics.1.25 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Beal J. Biochemical complexity drives log-normal variation in genetic expression. Engineering Biology. 2017;1(1):55–60. 10.1049/enb.2017.0004 [DOI] [Google Scholar]
  • 48. Teicher H. Identifiability of finite mixtures. The Annals of Mathematical Statistics. 1963;34(4):1265–1269. 10.1214/aoms/1177703862 [DOI] [Google Scholar]
  • 49. Rosvall M, Axelsson D, Bergstrom CT. The map equation. European Physical Journal. 2009;. [Google Scholar]
  • 50. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;. 10.1007/BF02289026 [DOI] [Google Scholar]
  • 51. Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Information Processing Letters. 1989;31(1):7–15. 10.1016/0020-0190(89)90102-6 [DOI] [Google Scholar]
  • 52. van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;. 10.7717/peerj.453 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1979;.
  • 54. Bugarski M, Mansouri M, Niemann A, Rizk A, Berger P, Ziegler U, et al. Segmentation and quantification of subcellular structures in fluorescence microscopy images using Squassh. Nature Protocols. 2014;9(3):586–596. 10.1038/nprot.2014.037 [DOI] [PubMed] [Google Scholar]
  • 55. Peláez N, Gavalda-Miralles A, Wang B, Navarro HT, Gudjonson H, Rebay I, et al. Dynamics and heterogeneity of a fate determinant during transition towards cell differentiation. Elife. 2015;4 10.7554/eLife.08924 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Zinchuk V, Zinchuk O, Okada T. Quantitative colocalization analysis of multicolor confocal immunofluorescence microscopy images: Pushing pixels to explore biological phenomena. Acta Histochemica et Cytochemica. 2007;40(4):101–111. 10.1267/ahc.07002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Arsenovic PT, Mayer CR, Conway DE. SensorFRET: A standardless approach to measuring pixel-based spectral bleed-through and FRET efficiency using spectral imaging. Scientific Reports. 2017;7(1). 10.1038/s41598-017-15411-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Kim D, Curthoys NM, Parent MT, Hess ST. Bleed-through correction for rendering and correlation analysis in multi-colour localization microscopy. Journal of Optics. 2013;15(9). 10.1088/2040-8978/15/9/094011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. McMullen PD, Morimoto RI, Amaral LAN. Physically grounded approach for estimating gene expression from microarray data. Proceedings of the National Academy of Sciences. 2010;107(31):13690–13695. 10.1073/pnas.1000938107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Gaudette L, Japkowicz N. Evaluation methods for ordinal classification In: Lecture Notes in Computer Science. vol. 5549 LNAI. Springer, Berlin, Heidelberg; 2009. p. 207–210. Available from: http://link.springer.com/10.1007/978-3-642-01818-3_25. [Google Scholar]
  • 61. Nguyen TM, Wu QMJ. Gaussian mixture-model-based spatial neighborhood relationships for pixel labeling problems. IEEE Transactions on Systems, Man, and Cybernetics. 2012;42(1):193–202. 10.1109/TSMCB.2011.2161284 [DOI] [PubMed] [Google Scholar]
  • 62. Gambis A, Dourlen P, Steller H, Mollereau B. Two-color in vivo imaging of photoreceptor apoptosis and development in Drosophila. Developmental Biology. 2011;351(1):128–134. 10.1016/j.ydbio.2010.12.040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Dourlen P, Levet C, Mejat A, Gambis A, Mollereau B. The Tomato/GFP-FLP/FRT method for live imaging of mosaic adult Drosophila photoreceptor cells. Journal of Visualized Experiments. 2013;79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Fisher YE, Yang HH, Isaacman-Beck J, Xie M, Gohl DM, Clandinin TR. FlpStop, a tool for conditional gene control in Drosophila. Elife. 2017;6 10.7554/eLife.22279 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Wu JS, Luo L. A protocol for mosaic analysis with a repressible cell marker (MARCM) in Drosophila. Nature Protocols. 2007;1(6):2583–2589. 10.1038/nprot.2006.320 [DOI] [PubMed] [Google Scholar]
  • 66. Zhou Q, Neal SJ, Pignoni F. Mutant analysis by rescue gene excision: New tools for mosaic studies in Drosophila. Genesis. 2016;54(11):589–592. 10.1002/dvg.22984 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Heffern E, Perrimon N, Hohl AM, del Valle Rodriguez A, Bakal C, Bonvin M, et al. The twin spot generator for differential Drosophila lineage analysis. Nat Methods. 2009;6(8):600–602. 10.1038/nmeth.1349 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Yu HH, Kao CF, He Y, Ding P, Kao JC, Lee T. A complete developmental sequence of a Drosophila neuronal lineage as revealed by twin-spot MARCM. PLoS Biology. 2010;8(8):39–40. 10.1371/journal.pbio.1000461 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Denes AS, Caussinus E, Affolter M, Kanca O, Percival-Smith A. Raeppli: a whole-tissue labeling tool for live imaging of Drosophila development. Development. 2013;141(2):472–480. 10.1242/dev.102913 [DOI] [PubMed] [Google Scholar]
  • 70. Hadjieconomou D, Rotkopf S, Alexandre C, Bell DM, Dickson BJ, Salecker I. Flybow: Genetic multicolor cell labeling for neural circuit analysis in Drosophila melanogaster. Nature Methods. 2011;8(3):260–266. 10.1038/nmeth.1567 [DOI] [PubMed] [Google Scholar]
  • 71. Hampel S, Chung P, McKellar CE, Hall D, Looger LL, Simpson JH. Drosophila Brainbow: a recombinase-based fluorescence labeling technique to subdivide neural expression patterns. Nature Methods. 2011;8(3):253–259. 10.1038/nmeth.1566 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Neufeld TP, De La Cruz AFA, Johnston LA, Edgar BA. Coordination of growth and cell division in the Drosophila wing. Cell. 1998;93(7):1183–1193. 10.1016/s0092-8674(00)81462-2 [DOI] [PubMed] [Google Scholar]
  • 73. Tworoger M, Larkin MK, Bryant Z, Ruohola-Baker H. Mosaic analysis in the Drosophila ovary reveals a common Hedgehog- inducible precursor stage for stalk and polar cells. Genetics. 1999;. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Collins RT, Linker C, Lewis J. MAZe: A tool for mosaic analysis of gene function in zebrafish. Nature Methods. 2010;7(3):219–223. 10.1038/nmeth.1423 [DOI] [PubMed] [Google Scholar]
  • 75. Muñoz-Jiménez C, Ayuso C, Dobrzynska A, Torres-Mendéz A, Ruiz PdlC, Askjaer P. An efficient FLP-based toolkit for spatiotemporal control of gene expression in Caenorhabditis elegans. Genetics. 2017;206(4):1763–1778. 10.1534/genetics.117.201012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Wang W, Warren M, Bradley A. Induced mitotic recombination of p53 in vivo. Proceedings of the National Academy of Sciences. 2007;104(11):4501–4505. 10.1073/pnas.0607953104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Meijering E. Cell segmentation: 50 years down the road. IEEE Signal Processing Magazine. 2012;29(5):140–145. 10.1109/MSP.2012.2204190 [DOI] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007406.r001

Decision Letter 0

Pedro Mendes, Douglas A Lauffenburger

13 Dec 2019

Dear Dr Bernasek,

Thank you very much for submitting your manuscript, 'Fly-QMA: Automated analysis of mosaic imaginal discs in Drosophila', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org.

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Douglas Lauffenburger

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have developed “Fly-QMA”, an unsupervised annotation computational algorithm for automating clonal analysis on confocal microscopy images of Drosophila imaginal discs. This could be a very useful program for many researchers in Drosophila, and apparently also could be used to analyze clones in other models as well. The authors tested real and synthetic data to validate the efficacy of the annotation algorithm across different conditions. Although I lack expertise in the mathematics of the work, my take on it is that it could be a useful tool.

Strengths of the work include:

• Uses nuclear fluorescence quantification in segmentation

• Automated bleed-through correction

• Modular arrangement allows flexibility, e.g., different tissues or contexts

• Unsupervised annotation of clones, using clonal marker expression info and spatial information

• Should work well if markers are clear and different (e.g., membrane versus nuclear?)

• Best performance with larger clones (as is true in any method of analyzing clones).

• The algorithm is available on GitHub, with a plan to incorporate core aspects into FlyEye Silhouette (their open source platform).

There are a few weaknesses, as well, although relatively minor:

• Operating this algorithm seems complicated for someone who is not savvy with computer programming; even installing it requires Python 3.6+. Even the tutorial seems complicated. Although some will be able to use it as is, it would be even more useful if the algorithm could be incorporated into a user-friendly and widely used program like Fiji/ImageJ.

• Requires use of a 3D confocal stack, which is fine for some analyses, but can it also be used for single images from an epifluorescence scope (e.g., +GFP vs -GFP)?

• Figure S8, “Fraction of nuclei correctly labeled during synthetic benchmarking” appears to be missing from the manuscript (the image labelled S8 appears to be S9, according to the legend information).

Reviewer #2: Bernasek et al. present a new method for automatically quantifying fluorescence reporter expression of cells in Drosophila imaginal discs using confocal microscopy. The proposed pipeline starts with the automatic segmentation of the image data based on well-known methods like CLAHE for local contrast enhancement, Otsu’s method for image binarization and a combination of Euclidean distance maps and a seeded watershed to identify distinct connected components for each of the cell nuclei. Moreover, a bleedthrough correction is performed that tries to eradicate the crosstalk impact of other fluorescent reporters or due to autofluorescence. Cells are then classified into three distinct expression levels, including a neighborhood analysis to correct for outliers. The analysis software and the simulated benchmark are released as open source, which definitely will increase the potential usefulness to the community. Moreover, the software contains an extensive documentation, getting started sections and tutorials that should make it straightforward to use the tool (given that at least basic knowledge of Python usage is available). The paper is very well written and I only have a few comments.

Comments:

- The author summary partly contains exact replications of the abstract. This should be avoided and at least a slight rephrasing should be performed.

- The notation with <>-brackets for the average across all pixels seems unusual to me. Maybe rather use a horizontal bar on top of the variable letter?

- To better understand that manual assessment is infeasible if the data set sizes grow, it would be good to have a comparison of the “processing times” of a human expert and the software.

- If I understood it correctly, the simulated benchmark does not involve varying contrast in the different image regions. An easy addition for increased realism would be to add global illumination artifacts to the simulated images in order to also assess the validity of the CLAHE approach and to see if it works as expected for the simulated scenario.

- The reference list mixes sentence case and title case for the titles and journal/conference names. While this is a minor issue, I would recommend using one or the other rather than mixing them.

- Page 10, Fig. 2C should probably be Fig. 3C instead?

Reviewer #3: I will comment on the biology and image analysis. I am not qualified to review the mathematical equations critically and therefore hope that these will be covered by another reviewer.

The authors describe Fly-QMA, a computational tool for performing quantitative clonal analysis on mosaically labelled Drosophila imaginal discs. The authors test their software using both real and synthetic data and demonstrate a robust performance. These tools are important and will be increasingly in demand and so I commend any effort to move the field forward. I have a number of questions that I don’t feel come out in the main text that could therefore be clarified.

1). Whilst bleed through correction might be a useful feature it would be pertinent to eliminate bleed through at the image acquisition stage by careful choice of excitation and emission filters or tuning of detectors. It is not possible to judge from the data presented what the cause of the bleed through is because details of the image acquisition are not included in the methods. The authors should add this detail. I would expect to see it here in this case even though it is previously published.

2). Historically the main bottleneck in automated clonal analyses (especially of data from fluorescent reporters) is in segmentation. This is often best achieved by trial and error because of the large variability between datasets and in their acquisition parameters. The authors are wise to point out that the pipeline can accommodate images segmented beforehand but give no guidance on what the requirements of their software might be. Do they require the segmentation masks or regions of interest? Where is the standardised file structure described; is it on GitHub? Please clarify.

3). None of the microscopy images in any of the figures include scale bars and it is not clear what objective was used for image capture so it is hard to tell what a comparable system for image acquisition would be. Please revise all the figures to include scale bars.

4). The analysis appears to be limited to nuclear signals as this segmentation occurs on nuclei. This is a limitation as often the analysis is designed to identify contiguous clones for which the extent of the cytoplasm may be important. This should be discussed.

5). I don’t see any attempt to derive the size (spatially or by number of cells) of the individual clones, or to correct for the probability of multiple clones being adjacent to one another.

Size is a key parameter and this is a common problem in many clonal analyses. Does the variation in intensities between clones with the same copy number mean that the algorithm can discriminate between adjacent clones? This seems unlikely. Therefore can the clone sizes be corrected for the probability of being multiple clones?

Or, with the spatial information available, can large polyclonal units be identified in the annotation?

6) The authors don’t do the excellent resources provided on GitHub justice; please make it clear that there are tutorials and test data available there.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Figure S8 appears to be missing

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007406.r003

Decision Letter 1

Pedro Mendes, Douglas A Lauffenburger

27 Jan 2020

Dear Dr Bernasek,

We are pleased to inform you that your manuscript 'Fly-QMA: Automated analysis of mosaic imaginal discs in Drosophila' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch within two working days with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Douglas Lauffenburger

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #3: The authors have satisfactorily addressed the comments and concerns that I raised.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007406.r004

Acceptance letter

Pedro Mendes, Douglas A Lauffenburger

25 Feb 2020

PCOMPBIOL-D-19-01579R1

Fly-QMA: Automated analysis of mosaic imaginal discs in Drosophila

Dear Dr Bernasek,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data


    Supplementary Materials

    S1 Fig. Example clones in the larval fly eye.

    (A) Genetic schema for a bleedthrough control experiment. Red and green ovals represent genes encoding a RFP-tagged clonal marker and a GFP-tagged control reporter, respectively. Black lines depict a genomic locus. Recombination does not affect gene dosage of the control reporter, so GFP variation across clones is attributed to fluorescence bleedthrough. (B) Confocal image of an eye imaginal disc. Red, green, and blue reflect clonal marker, control reporter, and nuclear stain fluorescence, respectively. (C) Segmentation of the DAPI nuclear stain. White lines show individual segments.

    (TIF)
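The nuclear segmentation shown in panel C follows the sequence summarized in the review history above: local contrast enhancement (CLAHE), Otsu binarization, a Euclidean distance map, and a seeded watershed. The sketch below strings these steps together with scikit-image and SciPy. It is a minimal illustration under stated assumptions, not the Fly-QMA implementation: the function name `segment_nuclei`, the `min_distance` default, and the synthetic two-disk image are all hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import exposure, feature, filters, measure, segmentation


def segment_nuclei(image, min_distance=5):
    """Label nuclei in a 2-D nuclear-stain image scaled to [0, 1]."""
    # Local contrast enhancement (CLAHE).
    enhanced = exposure.equalize_adapthist(image)
    # Global binarization with Otsu's threshold.
    mask = enhanced > filters.threshold_otsu(enhanced)
    # Euclidean distance of each foreground pixel to the background.
    distance = ndi.distance_transform_edt(mask)
    # Seeds: local maxima of the distance map within each connected component.
    components = measure.label(mask)
    peaks = feature.peak_local_max(
        distance, min_distance=min_distance, labels=components, exclude_border=False
    )
    markers = np.zeros(image.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Seeded watershed on the inverted distance map, restricted to foreground.
    return segmentation.watershed(-distance, markers, mask=mask)


# Hypothetical test image: two bright disks on a dark background.
yy, xx = np.mgrid[:64, :64]
left = (yy - 20) ** 2 + (xx - 16) ** 2 < 64
right = (yy - 20) ** 2 + (xx - 48) ** 2 < 64
demo = (left | right).astype(float)
labels = segment_nuclei(demo)
print("segments found:", labels.max())
```

Real confocal layers are noisier than this toy image, so `min_distance` and any smoothing applied before thresholding would likely need tuning per dataset.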

    S2 Fig. Using background pixels to characterize bleedthrough contributions in the foreground.

    (A) Extraction of background pixels (striped region). Foreground includes the merged RFP and GFP images, surrounded by a white line. White arrow marks the morphogenetic furrow (MF). (B) Background pixel values are resampled such that RFP intensities are uniformly distributed. (C) A generalized linear model characterizes the contribution of RFP bleedthrough to GFP fluorescence. Boxes reflect windowed distributions of resampled background pixel intensities. Red line shows the model fit. (D) Measured GFP levels before bleedthrough correction. Markers represent individual nuclei. Red line shows the inferred contributions of RFP fluorescence bleedthrough. Dashed portion is extrapolated. (E-G) Data curation prior to statistical comparison of GFP levels. (E) Cells on the periphery of each clone are excluded. (F) The selection is limited to the region of elevated GFP expression near the MF. (G) It is further limited to cells of the same developmental age, defined by their relative positions along the x-axis.

    (TIF)
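Panels B through D can be read as a three-step recipe: resample background pixels so that RFP intensities are uniformly represented, fit a model of GFP as a function of RFP, and subtract the inferred contribution from each nucleus. The sketch below illustrates that recipe with NumPy. It makes several assumptions that are not taken from the source: ordinary least squares stands in for the generalized linear model, the full fitted line (slope and intercept) is subtracted, and the function names, bin counts, and synthetic data are hypothetical.

```python
import numpy as np


def resample_uniform(rfp, gfp, n_bins=20, per_bin=200, seed=0):
    """Resample pixel pairs so that RFP intensities are roughly uniform."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(rfp.min(), rfp.max(), n_bins + 1)
    # Bin index of each pixel; the maximum value is folded into the last bin.
    bins = np.clip(np.digitize(rfp, edges) - 1, 0, n_bins - 1)
    picks = [
        rng.choice(np.flatnonzero(bins == b), size=per_bin, replace=True)
        for b in range(n_bins)
        if np.any(bins == b)  # skip empty intensity bins
    ]
    index = np.concatenate(picks)
    return rfp[index], gfp[index]


def fit_bleedthrough(rfp_background, gfp_background):
    """Fit GFP = slope * RFP + intercept on resampled background pixels."""
    rfp_u, gfp_u = resample_uniform(rfp_background, gfp_background)
    slope, intercept = np.polyfit(rfp_u, gfp_u, deg=1)
    return slope, intercept


def correct_gfp(gfp_nuclei, rfp_nuclei, slope, intercept):
    """Subtract the inferred bleedthrough contribution from each nucleus."""
    return gfp_nuclei - (slope * rfp_nuclei + intercept)


# Synthetic background: skewed RFP values, GFP arising purely from bleedthrough.
rng = np.random.default_rng(1)
rfp_bg = rng.exponential(0.2, size=5000)
gfp_bg = 0.05 + 0.2 * rfp_bg
slope, intercept = fit_bleedthrough(rfp_bg, gfp_bg)

# Synthetic nuclei: a true GFP signal of 0.5 plus the same bleedthrough.
rfp_nuclei = np.array([0.1, 0.5, 1.0])
gfp_nuclei = 0.5 + 0.05 + 0.2 * rfp_nuclei
corrected = correct_gfp(gfp_nuclei, rfp_nuclei, slope, intercept)
```

Resampling matters because background pixels are typically dominated by low RFP values; without it, the fit would be weighted toward the dim end of the range and the extrapolated portion of the line would be less reliable.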

    S3 Fig. Training a clone annotation model.

    (A) One or more images are segmented, yielding a set of fluorescence measurements X. These are used to sample the spatial context Y of the neighborhood surrounding each cell. Both sets of values are used to train a mixture model. Subsequent panels demonstrate these procedures using the example shown in S3 Fig C. (B) Expression levels are jointly distributed with the local average among neighboring cells. Center panel shows the joint distribution. Top and right bar plots show marginal distributions. (C) Mixture model identifies seven distinct components k_i. Center panel shows position and spread of each component. Top and right panels show marginal components scaled by their respective weights. Red shading denotes the label m_i assigned to each component. The model predicts the posterior probabilities that a given sample (X, Y) belongs to each component. (D) Neighborhood size is estimated by computing the decay constant of the spatial correlation function, ψ(δ). Black line shows the moving average of ψ(δ), red line shows an exponential fit. Inset shows the resultant sampling region. (E) The optimal number of mixture components is determined by minimizing the Bayesian information criterion (BIC) score. (F) Mixture components are labeled by k-means clustering their mean values. Markers reflect the component means, colors denote the assigned label.

    (TIF)
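Two of the training steps in S3 Fig lend themselves to a short sketch: estimating the decay constant of ψ(δ) from an exponential fit (panel D) and labeling mixture components by k-means clustering of their means (panel F). The code below is a rough illustration under stated assumptions, not the package's own routine. It assumes ψ(δ) ≈ exp(-δ/λ), fits λ by a log-linear regression, clusters one-dimensional component means, and defaults to three labels; the function names and defaults are hypothetical.

```python
import numpy as np

def decay_constant(distances, correlations):
    """Estimate lambda in psi(delta) ~ exp(-delta / lambda).

    Assumption: a straight-line fit to log(psi) replaces a nonlinear fit,
    so only strictly positive correlation values are used.
    """
    d = np.asarray(distances, dtype=float)
    psi = np.asarray(correlations, dtype=float)
    keep = psi > 0
    slope, _ = np.polyfit(d[keep], np.log(psi[keep]), deg=1)
    return -1.0 / slope

def label_components(means, n_labels=3, n_iter=50):
    """Assign a label to each mixture component by 1D k-means on its mean.

    Assumption: means are scalar; centers start at evenly spaced quantiles
    so that label indices increase with expression level.
    """
    means = np.asarray(means, dtype=float)
    centers = np.quantile(means, np.linspace(0, 1, n_labels))
    for _ in range(n_iter):
        # Assign each component to its nearest center.
        labels = np.argmin(np.abs(means[:, None] - centers[None, :]), axis=1)
        # Move each center to the average of its assigned components.
        for j in range(n_labels):
            if np.any(labels == j):
                centers[j] = means[labels == j].mean()
    return labels
```

The estimated decay constant can then set the radius of the neighborhood used to sample spatial context, as shown in the inset of panel D.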

    S4 Fig. Label assignment using a trained clone annotation model.

    (A) Measurements are used to sample spatial contexts before the trained model is applied (blue and green path). In parallel, measurements are labeled using a marginal projection of the trained model (magenta path). The labels are then merged (red path). (B-D) Spatial context sampling. (B) Weighted undirected graph connecting adjacent cells. Line width reflects expression similarity between neighbors. (C) Community resolution is defined by aggregating clusters that fall below a hierarchical cut level δ. Panels show increasing levels of aggregation. Colors denote distinct communities. (D) Cut level is chosen by finding the maximum level (red dot) that remains lower than the decay constant of the spatial correlation function, ψ(δ) (black line). Panel E depicts aggregation below the third level for ease of visualization. (E) Application of the mixture model. (I) The graph contains distinct communities of locally similar expression. (II) Mean expression level within each community serves as the local average for each cell. (III) Mixture model estimates the probability that each cell belongs to each of its components. Bar plots within each cell illustrate the cumulative probability of each label. (IV) Posterior probabilities are diffused across the graph. (V) Each cell is assigned the most probable label. (F,G) Application of a marginal mixture model. (F) Marginal mixture components, shaded by their mapped labels. Dashed line is the overall marginal density. (G) Marginal classifier labels cells strictly on the basis of their individual fluorescence level. Red shading denotes the most probable label for each level. (H) Annotated measurements. Red shading denotes the assigned label. Labels with low confidence P^(mi)<0.8 are replaced by their marginal counterparts.

    (TIF)
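The final merging step in S4 Fig (panels E and H) can be illustrated with a small sketch. It assumes the posterior over mixture components is already available as an array, sums component probabilities into per-label probabilities, and falls back to the marginal label wherever the top label's probability is below 0.8. The graph diffusion step (E, IV) is omitted, and the function names and array layout are hypothetical.

```python
import numpy as np

def label_posterior(component_posterior, component_labels, n_labels):
    """Sum component probabilities into cumulative per-label probabilities.

    component_posterior: (n_cells, n_components) array of probabilities.
    component_labels: label index assigned to each component.
    """
    component_posterior = np.asarray(component_posterior, dtype=float)
    out = np.zeros((component_posterior.shape[0], n_labels))
    for k, label in enumerate(component_labels):
        out[:, label] += component_posterior[:, k]
    return out

def merge_labels(posterior, marginal_labels, threshold=0.8):
    """Keep the most probable label unless its probability is below threshold.

    Low-confidence cells take the label from the marginal classifier instead.
    """
    posterior = np.asarray(posterior, dtype=float)
    labels = posterior.argmax(axis=1)
    confidence = posterior.max(axis=1)
    return np.where(confidence < threshold, np.asarray(marginal_labels), labels)
```

For instance, a cell with label probabilities (0.6, 0.4) falls below the threshold and takes its marginal label, whereas a cell with (0.9, 0.1) keeps label 0.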

    S5 Fig. Comparison of automated annotation with manually assigned labels.

    (A) Distribution of labels among each possible value. (B) Visual comparison of the sole instance in which automated and manual annotation differ. Image shows clonal marker fluorescence, colors denote the assigned label.

    (TIF)

    S6 Fig. Simulated growth of a synthetic cell culture.

    (A) Partial simulation time course. Each marker depicts a cell. Greyscale intensity reflects clonal marker gene dosage. Simulation time reflects the approximate number of cell divisions since the initial seed. (B) Simulations yield gene dosages and spatial coordinates for each cell. (C) Single iteration of an example simulation. Circles represent individual cells, red shading denotes clonal marker dosage. Cycles of cell division, recombination, and repositioning are repeated until the simulation reaches a specified end time (t > 11 in panel A). (D) Cultures simulated with varying recombination start times. All cultures were subject to four generations of recombination (δt = 4). Recombination start time increases from left to right. Later recombination events generally yield smaller clones. (E) Mean clone size (cells per clone) as a function of the recombination start time. Colors denote recombination period duration. Error bars reflect standard error of the mean across 50 replicates. Clone size generally decreases as recombination is limited to later times.

    (TIF)
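The interplay between recombination timing and gene dosage described in S6 Fig can be sketched with a heavily simplified, non-spatial toy model. It is not the simulation code from the growth repository: it ignores cell positions and repositioning, assumes every cell divides once per generation, and assumes a heterozygous cell recombines with a fixed probability to yield one daughter with zero copies and one with two. The function name and the `rate` parameter are hypothetical.

```python
import numpy as np

def simulate_dosages(n_generations, start, duration, rate=0.2, seed=0):
    """Track clonal marker dosage through synchronous cell divisions.

    Assumption: recombination is only possible in heterozygous cells
    (dosage 1) during generations start <= g < start + duration.
    """
    rng = np.random.default_rng(seed)
    dosages = np.array([1])  # single heterozygous seed cell
    for generation in range(n_generations):
        daughters_a = dosages.copy()
        daughters_b = dosages.copy()
        if start <= generation < start + duration:
            # Choose which heterozygous cells recombine in this division.
            recombine = (dosages == 1) & (rng.random(dosages.size) < rate)
            daughters_a[recombine] = 0
            daughters_b[recombine] = 2
        dosages = np.concatenate([daughters_a, daughters_b])
    return dosages
```

Because each division preserves the summed dosage of the two daughters relative to the parent, the mean dosage across the culture stays at one copy per cell. Recombination events in later generations leave fewer subsequent divisions, which is consistent with the smaller clones noted in panels D and E.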

    S7 Fig. Tunable generation of synthetic microscopy data.

    (A) Fluorescence levels are sampled from lognormal distributions conditioned upon gene dosage. (B) Synthetic data include a measured fluorescence level for each reporter in each cell. Text color reflects the generative distribution in A. (C) Synthetic image of clonal marker fluorescence when σα = 0.25. Each nucleus is shaded in accordance with its sampled fluorescence intensity. (D-F) Left to right, increasing the fluorescence ambiguity parameter broadens the overlap in fluorescence levels across gene dosages. (D) Distributions used to generate clonal marker fluorescence levels. Red shading denotes gene dosage. (E) Evenly weighted sum of the generative distributions. (F) Example images of clonal marker fluorescence.

    (TIF)
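The sampling scheme in S7 Fig, in which fluorescence is drawn from a lognormal distribution conditioned on gene dosage, can be sketched as below. The parametrization is an assumption made for illustration: the log-scale mean is taken to increase by log(2) per gene copy, and a single `sigma` plays the role of the fluorescence ambiguity parameter. The function name is hypothetical.

```python
import numpy as np

def sample_fluorescence(dosages, sigma=0.25, seed=0):
    """Draw one lognormal fluorescence level per cell, given its dosage.

    Assumption: the mean of log-fluorescence is dosage * log(2); larger
    sigma broadens the overlap between dosages.
    """
    rng = np.random.default_rng(seed)
    dosages = np.asarray(dosages, dtype=float)
    log_means = dosages * np.log(2.0)
    return rng.lognormal(mean=log_means, sigma=sigma)
```

Increasing `sigma` in this sketch widens each generative distribution, mirroring the left-to-right progression in panels D-F.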

    S8 Fig. Fraction of nuclei correctly labeled during synthetic benchmarking.

    Each pixel reflects the average across 50 replicates. Clone size reflects the mean number of cells per clone. Performance improves with increasing clone size and worsens with increasing fluorescence ambiguity.

    (TIF)

    S9 Fig. Spatial context is most informative for large clones with ambiguous fluorescence.

    (A) MAE of labels assigned using a marginal classifier that neglects spatial context. Performance worsens with increasing fluorescence ambiguity but does not depend upon clone size. (B) Annotation performance relative to the marginal classifier. Color scale reflects the log2 fold-change in MAE when spatial context is neglected. Blue indicates that spatial context improves performance.

    (TIF)
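The comparison in S9 Fig reduces to two quantities, sketched here for clarity: the mean absolute error (MAE) of a set of labels, and the log2 fold-change in MAE when spatial context is neglected. The sign convention (marginal MAE divided by spatial MAE, so positive values mean spatial context helps) is inferred from the caption and should be read as an assumption; the function names are hypothetical.

```python
import numpy as np

def mean_absolute_error(true_labels, predicted_labels):
    """Average absolute difference between true and predicted labels."""
    true_labels = np.asarray(true_labels, dtype=float)
    predicted_labels = np.asarray(predicted_labels, dtype=float)
    return np.mean(np.abs(true_labels - predicted_labels))

def log2_fold_change(mae_marginal, mae_spatial):
    """log2 ratio of MAE without spatial context to MAE with it.

    Positive values indicate that spatial context improves performance.
    Undefined when mae_spatial is zero; callers should handle that case.
    """
    return np.log2(mae_marginal / mae_spatial)
```

As a worked example, a marginal MAE of 0.5 against a spatial MAE of 0.25 gives a fold-change of 1, meaning that neglecting spatial context doubles the error.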

    Data Availability Statement

    The data underlying the results presented in the study are available in a public data repository hosted by Northwestern University. DOI: https://doi.org/10.21985/N2F207.

    We have distributed the automated mosaic analysis framework as an open-source Python package available at https://sebastianbernasek.github.io/flyqma. The associated code repository contains resources designed to help users analyze their own microscope images. These include code documentation, a guide to getting started with Fly-QMA, and an interactive tutorial that uses example data to demonstrate the core features of the software. We also intend to incorporate Fly-QMA into future versions of FlyEye Silhouette, our open-source desktop application for quantitative analysis of the larval eye. The code used to generate synthetic microscopy data is also freely available at https://github.com/sebastianbernasek/growth. All segmented and annotated eye discs are accessible via our data repository (https://doi.org/10.21985/N2F207).


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
