Closed-form density-based framework for automatic detection of cellular morphology changes

Tarn Duong; Bruno Goud; Kristine Schauer

doi:10.1073/pnas.1117796109

Proc Natl Acad Sci U S A. 2012 May 14;109(22):8382-8387.


PMCID: PMC3365204 PMID: 22586080

Abstract

A primary method for studying cellular function is to examine cell morphology after a given manipulation. Fluorescent markers attached to proteins/intracellular structures of interest in conjunction with 3D fluorescent microscopy are frequently exploited for functional analysis. Despite the central role of morphology comparisons in cell biological approaches, few statistical tools are available that allow biological scientists without a high level of statistical training to quantify the similarity or difference of fluorescent images containing multifactorial information. We transform intracellular structures into kernels and develop a multivariate two-sample test that is nonparametric and asymptotically normal to directly and quantitatively compare cellular morphologies. The asymptotic normality bypasses the computationally intensive calculations used by the usual resampling techniques to compute the P-value. Because all parameters required for the statistical test are estimated directly from the data, it does not require any subjective decisions. Thus, we provide a black-box method for unbiased, automated comparison of cell morphology. We validate the performance of our test statistic for finite synthetic samples and experimental data. Employing our test for the comparison of the morphology of intracellular multivesicular bodies, we detect changes in their distribution after disruption of the cellular microtubule cytoskeleton with high statistical significance in fixed samples and live cell analysis. These results demonstrate that density-based comparison of multivariate image information is a powerful tool for automated detection of cell morphology changes. Moreover, the underlying mathematics of our test statistic is a general technique, which can be applied in situations where two data samples are compared.

Keywords: hypothesis test, integrated density functional, optimal bandwidth selection, quantitative cell comparison

Fluorescent markers attached to proteins of interest, in conjunction with modern fluorescent microscopy technologies, are a useful proxy for studying subcellular compartments and their behavior after a given manipulation. Treatment with chemical compounds or specific gene silencing by RNA interference is commonly used at scales ranging from individual experiments to high-throughput studies. Visual inspection by expert biologists has long been performed, from early studies by microscopists such as Ramón y Cajal to contemporary large-scale, high-throughput screens (1–4). Although human observation may be very accurate, it has three major drawbacks: (i) it lacks quantitative measures, (ii) it may be biased, and (iii) it is time consuming.

The structural features of cells and the topological relationships between the numerous intracellular compartments give rise to multivariate data whose unbiased, automatic comparison is a major challenge. Importantly, alterations in cellular morphology also occur in many diseases, including cancer, requiring quantitative tools for their detection. Although the majority of functional cell biology is based on image comparison, few tools are available that allow an unbiased, automatic comparison of the multivariate data encoded in fluorescent images. The cytometric tools developed so far are based on the extraction of a variety of numerical features from images in combination with classification strategies (5–9). Features represent any measured property derived from the image, such as the total, mean, or standard deviation of fluorescence intensity, texture, and Zernike shape descriptors (5, 10). Although feature-based approaches have proved very powerful in detecting morphological changes (11–14), they may lack biologically meaningful, human-interpretable measurements because they rely on abstract numerical features and high-dimensional feature vector analysis. Furthermore, they require careful choice and calibration of features for each comparison (15). Current approaches also suffer from a loss of information, as multidimensional information is transformed into one-dimensional metrics such as distances (16). Few statistical approaches directly assess intracellular organization, which makes automated image analysis of intracellular topology challenging. Spatial comparisons could therefore complement feature-based techniques for analyzing alterations in cell morphology.

Recently, we showed that the global spatial organization of defined subcellular structures (e.g., organelles, membrane domains) can be quantified by probabilistic density maps (17). We grew cells on adhesive micropatterns that force cells to adopt a defined shape, mimicking the tissue microenvironment (18). Image stacks of fluorescently marked proteins from several tens of cells were transformed into a cloud of coordinate points by segmentation analysis and were aligned using characteristic landmarks of the micropatterns. To rigorously measure the topology of the fluorescently labeled subcellular structures, we centered a Gaussian function (kernel) with an optimized variance at each data point and summed these kernels, revealing the underlying density throughout the cell. This analysis demonstrated that density estimation is a reliable statistical technique for analyzing the morphology of subcellular structures whose point coordinates can be resolved, providing the basis for a comprehensive framework for statistical analysis. By transforming intracellular structures into three-dimensional kernels, alterations in global cellular organization, and thus in cell morphology, can be translated into differences in density maps that are tractable by mathematical tools.

The problem of comparing two data samples has attracted much research into its theoretical and practical aspects. Historically, the first methods involved small computational burdens. The well-known t-test, developed at the Guinness brewery, fits normal distributions with different means but equal variances to each data sample, thus reducing the original problem to a simpler comparison of means. However, this test is limited by the prespecification of the parametric form. Among the most widely known nonparametric tests for one-dimensional continuous data are the Mann-Whitney, Kolmogorov-Smirnov, and Wald-Wolfowitz tests (19). The need for analogous tests for multivariate data has been addressed (20–22). However, these multivariate approaches have not met with the same wide acceptance as their univariate ancestors, because they have not consistently yielded intuitive inferences when applied to experimental data. Given that the t-test is a density-based comparison, replacing parametric density estimates with their nonparametric counterparts should lead to a more flexible testing procedure. Kernel smoothing is a widely used computational technique for density estimation because of its intuitive construction and interpretation (23), which makes it an ideal basis for nonparametric density-based testing. Kernel-based tests have been developed with other discrepancy measures (24–27), but all rely on computationally intensive resampling methods to compute the critical quantiles of the null distribution. Although resampling methods provide a general framework for consistent tests, besides their computational cost they have a second major trade-off: they require sufficient familiarity, because resampling must be calibrated for each data analysis situation at hand. These constraints prevent the wide adoption of bootstrap kernel density-based testing outside the computational statistics community. In particular, these tests are not easily available to biologists.

Here, we develop a test statistic that is asymptotically normal under the null hypothesis, allowing density-based, “black-box” comparisons of multivariate data. We use simulated and experimental data analysis to verify its performance for finite samples. Given that 3D organizations of cells can be expressed by probabilistic density maps, this test allows us to assess the statistical significance of the similarity or difference between two cellular topologies. Analyzing the data from fluorescent images of intracellular organelles, this test allows us to compare cellular morphology under different conditions in an automated and unbiased manner.

Results

Construction of the Test Statistic.

We have used the usual squared discrepancy measure to construct a nonparametric, multivariate test statistic $\hat T$ that is asymptotically normal under the null hypothesis (algorithmic details are deferred to the Methods section).

Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be d-variate random samples from their respective common densities $f_1$ and $f_2$. Concretely, $X_1, \ldots, X_{n_1}$ are the spatial coordinates of subcellular structures extracted from a first group of images, and likewise $Y_1, \ldots, Y_{n_2}$ from a second group of images. So $f_1$ represents the steady-state spatial probability density function of the subcellular structures in the first group of images, and likewise $f_2$ for the second group. This is the same statistical framework used in Schauer et al. (17) to construct density maps from a single set of images. The kernel density estimates of $f_1$ and $f_2$ are

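$$\hat f_1(x) = n_1^{-1} \sum_{i=1}^{n_1} K_{H_1}(x - X_i), \qquad \hat f_2(x) = n_2^{-1} \sum_{j=1}^{n_2} K_{H_2}(x - Y_j), \tag{1}$$

where K is the kernel function with $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$, and $H_l$ is a bandwidth matrix, for l = 1, 2.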
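As an illustration of Eq. 1, the following Python sketch evaluates a kernel density estimate for bivariate (d = 2) coordinates with a full 2 × 2 bandwidth matrix. It assumes a Gaussian kernel K (the kernel used for the density maps of ref. 17) and bandwidths supplied by the caller; the function names are hypothetical and are not part of the ks package, which holds the authors' implementation.

```python
import math

def k_h(x, H):
    """Scaled Gaussian kernel K_H(x) = |H|^{-1/2} K(H^{-1/2} x) for d = 2.

    With K the standard normal density, K_H is the N(0, H) density.
    H is a symmetric positive definite matrix ((a, b), (b, c)).
    """
    (a, b), (_, c) = H
    det = a * c - b * b
    # Quadratic form x^T H^{-1} x using the closed-form 2 x 2 inverse
    q = (c * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def kde(x, data, H):
    """Kernel density estimate f_hat(x) = n^{-1} sum_i K_H(x - X_i), as in Eq. 1."""
    return sum(k_h((x[0] - p[0], x[1] - p[1]), H) for p in data) / len(data)

# Usage: density of two hypothetical structure coordinates, near and far from the data
H = ((0.5, 0.1), (0.1, 0.5))
points = [(0.0, 0.0), (1.0, 0.5)]
print(kde((0.0, 0.0), points, H), kde((3.0, 3.0), points, H))
```

A full (non-diagonal) H lets each kernel be oriented along correlated directions in the data, which is why the formulation above uses bandwidth matrices rather than scalar bandwidths.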

To test the null hypothesis $H_0: f_1 = f_2$, we follow Anderson et al. (28), who proposed the discrepancy measure $T = \int [f_1(x) - f_2(x)]^2 \, dx$. Here, and in the rest of this manuscript, whenever the limits of integration are omitted, integration is taken over the appropriate Euclidean space. We use the squared error measure because it has the most extensive body of work on automatic optimal selection of the smoothing parameters, compared with other discrepancy measures such as the absolute error, the Kullback-Leibler divergence, and the Jensen-Shannon divergence. We rewrite the discrepancy as $T = \psi_1 + \psi_2 - (\psi_{1,2} + \psi_{2,1})$, where $\psi_l = \int f_l(x)^2 \, dx$ and $\psi_{l_1,l_2} = \int f_{l_1}(x) f_{l_2}(x) \, dx$. The test statistic is $\hat T = \hat\psi_1 + \hat\psi_2 - (\hat\psi_{1,2} + \hat\psi_{2,1})$, where

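$$\hat\psi_1 = n_1^{-2} \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_1} K_{H_1}(X_{i_1} - X_{i_2}), \qquad \hat\psi_2 = n_2^{-2} \sum_{j_1=1}^{n_2} \sum_{j_2=1}^{n_2} K_{H_2}(Y_{j_1} - Y_{j_2}),$$
$$\hat\psi_{1,2} = (n_1 n_2)^{-1} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} K_{H_1}(X_i - Y_j), \qquad \hat\psi_{2,1} = (n_1 n_2)^{-1} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} K_{H_2}(X_i - Y_j).$$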

We can interpret this test statistic as comparing the intrasample pairwise differences $X_{i_1} - X_{i_2}$ and $Y_{j_1} - Y_{j_2}$ with the intersample pairwise differences $X_i - Y_j$: if the latter are larger than the former, this indicates that the samples are different. The following theorem, our main result, establishes the asymptotic normality of the test statistic $\hat T$ under the null hypothesis.
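The pairwise-difference reading of $\hat T$ translates directly into code. The sketch below computes $\hat T$ for bivariate samples with a direct double loop, which costs on the order of $n_1^2 + n_2^2 + n_1 n_2$ kernel evaluations. It assumes a Gaussian kernel, caller-supplied bandwidth matrices, and, as we read the estimator definitions, that $\hat\psi_{1,2}$ and $\hat\psi_{2,1}$ use $H_1$ and $H_2$ respectively; all names are hypothetical.

```python
import math

def k_h(x, H):
    # Gaussian K_H(x), i.e. the N(0, H) density for d = 2; H = ((a, b), (b, c))
    (a, b), (_, c) = H
    det = a * c - b * b
    q = (c * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def pair_sum(P, Q, H):
    # Sum of K_H(p - q) over all pairs, including p == q terms when P is Q
    return sum(k_h((p[0] - q[0], p[1] - q[1]), H) for p in P for q in Q)

def t_hat(X, Y, H1, H2):
    n1, n2 = len(X), len(Y)
    psi1 = pair_sum(X, X, H1) / n1 ** 2      # intrasample differences X_i1 - X_i2
    psi2 = pair_sum(Y, Y, H2) / n2 ** 2      # intrasample differences Y_j1 - Y_j2
    psi12 = pair_sum(X, Y, H1) / (n1 * n2)   # intersample differences, bandwidth H1
    psi21 = pair_sum(X, Y, H2) / (n1 * n2)   # intersample differences, bandwidth H2
    return psi1 + psi2 - (psi12 + psi21)

I2 = ((1.0, 0.0), (0.0, 1.0))
X = [(0.0, 0.0), (0.1, 0.0)]
Y = [(5.0, 0.0), (5.1, 0.0)]
print(t_hat(X, X, I2, I2))   # identical samples: prints 0.0
print(t_hat(X, Y, I2, I2))   # well-separated samples: clearly positive
```

Identical samples give exactly zero because every intrasample term is matched by an intersample term; separated samples give a positive value because the intersample kernel sums vanish.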

Theorem 1.

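Under the conditions in the Methods section, assume that the null hypothesis $H_0: f_1 = f_2 = f$ holds. Then, as $n_1, n_2 \to \infty$, $\zeta_T^{-1}(\hat T - \mu_T) \to N(0, 1)$ in distribution, where $\mu_T = (n_1^{-1}|H_1|^{-1/2} + n_2^{-1}|H_2|^{-1/2}) K(0)$ and $\zeta_T^2 = 3(n_1^{-1} + n_2^{-1}) \operatorname{Var} f(X)$.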

Null Distribution Parameter Estimation.

To use the asymptotic null distribution, we need to estimate the mean parameter $\mu_T$ and the variance parameter $\zeta_T^2$. For $\mu_T$, Chacón and Duong (29) developed an algorithm to obtain consistent estimators of the bandwidth matrices $H_1$ and $H_2$ as minimizers of the asymptotic mean squared error of $\hat\psi_1$ and $\hat\psi_2$, respectively. For $\zeta_T^2$, it is straightforward to show that an estimator is $\hat\zeta_T^2 = 3 \, (n_1\hat\zeta_1 + n_2\hat\zeta_2)/(n_1 + n_2) \, (n_1^{-1} + n_2^{-1})$, where $\hat\zeta_1$ is an estimator of the variance of $f_1(X)$ and $\hat\zeta_2$ an estimator of the variance of $f_2(Y)$. Previous research has indicated that asymptotic normal approximations of a null distribution tend to reject the null hypothesis more often than is indicated by the nominal level of significance (25). One of the primary causes is the overestimation of the variance. In the context of kernel estimators, this usually arises from using a bandwidth that is optimal for density estimation but leads to an inflated variance estimate. Our proposed solution is to estimate the variance more directly using a larger bandwidth, since larger bandwidths reduce the variance by mitigating the effect that individual data points have on the value of the kernel estimator. The first-order Taylor series expansion about the expected value, $f(X) \approx f(\mathbb{E}X) + (X - \mathbb{E}X)^T Df(\mathbb{E}X)$, where $Df = (\partial f/\partial x_1, \ldots, \partial f/\partial x_d)^T$ is the column vector of first-order partial derivatives, gives $\operatorname{Var} f(X) \approx [Df(\mathbb{E}X)]^T (\operatorname{Var} X) [Df(\mathbb{E}X)]$. So plug-in estimators of $\zeta_1$ and $\zeta_2$ are $\hat\zeta_1 = [D\hat f_1(\bar X; G_1)]^T S_1 [D\hat f_1(\bar X; G_1)]$ and $\hat\zeta_2 = [D\hat f_2(\bar Y; G_2)]^T S_2 [D\hat f_2(\bar Y; G_2)]$, where $\bar X$ and $\bar Y$ are the sample means, $S_l$ are the sample variance matrices, and $G_l$ are the normal scale selectors for a kernel estimator of the first density derivative (30).
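The delta-method variance estimator can be sketched for d = 2 as follows: estimate the density gradient at the sample mean with a Gaussian kernel, then sandwich the sample variance matrix between two copies of it. The normal scale selector for the first density derivative is written here as $G = (4/(d+4))^{2/(d+6)} n^{-2/(d+6)} S$; this closed form is our assumption about ref. 30, not a quotation of it. The Gaussian kernel and all function names are likewise assumptions.

```python
import math

def mean_and_cov(data):
    """Sample mean vector and sample variance matrix of 2D points."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def normal_scale_deriv1(S, n, d=2):
    # ASSUMED normal scale selector for first-derivative estimation:
    # G = (4 / (d + 4))^(2 / (d + 6)) * n^(-2 / (d + 6)) * S
    c = (4 / (d + 4)) ** (2 / (d + 6)) * n ** (-2 / (d + 6))
    return tuple(tuple(c * s for s in row) for row in S)

def grad_kde(x, data, G):
    """Gaussian kernel gradient estimate: n^{-1} sum_i K_G(x - X_i) * (-G^{-1}(x - X_i))."""
    (a, b), (_, c) = G
    det = a * c - b * b
    gx = gy = 0.0
    for p in data:
        u, v = x[0] - p[0], x[1] - p[1]
        w0, w1 = (c * u - b * v) / det, (a * v - b * u) / det   # G^{-1}(x - X_i)
        k = math.exp(-0.5 * (u * w0 + v * w1)) / (2 * math.pi * math.sqrt(det))
        gx -= k * w0
        gy -= k * w1
    return gx / len(data), gy / len(data)

def var_f_hat(data):
    """Plug-in estimate of Var f(X): [Df_hat(mean; G)]^T S [Df_hat(mean; G)]."""
    m, S = mean_and_cov(data)
    g = grad_kde(m, data, normal_scale_deriv1(S, len(data)))
    return g[0] * (S[0][0] * g[0] + S[0][1] * g[1]) + g[1] * (S[1][0] * g[0] + S[1][1] * g[1])

# Usage with hypothetical skewed coordinates
skewed = [(0.0, 0.0), (0.2, 0.1), (0.5, 0.3), (3.0, 2.0), (0.1, 0.4)]
print(var_f_hat(skewed))
```

For point-symmetric data the estimated gradient at the mean is zero and so is the estimate, which illustrates that this first-order approximation captures only the slope of the density at the center of the data.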

Given these parameter estimates, the standard equation to obtain a z-score from $\hat T$ is $z = (\hat T - \hat\mu_T)/\hat\zeta_T$, where $\hat\mu_T$ and $\hat\zeta_T^2$ are the estimated null mean and variance. The P-value is then computed from this z-score using standard software or tables. The completely automatic testing procedure (including the parameter estimation and the computation of the test statistic and its P-value) is programmed in the ks library (31) in the open-source R programming language.
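The normalization step can be sketched as below. The null mean and variance follow the expressions written in the code comments, which reflect our reading of Theorem 1 and of the variance estimator; we additionally assume a Gaussian kernel, so that $K(0) = (2\pi)^{-d/2}$, and a one-sided, upper-tail P-value, since large values of $\hat T$ indicate a difference between the densities. Function names are hypothetical and do not describe the ks API.

```python
import math

def null_mean(n1, n2, det_H1, det_H2, d=2):
    # mu_T = (n1^{-1}|H1|^{-1/2} + n2^{-1}|H2|^{-1/2}) K(0), K the standard normal density
    K0 = (2 * math.pi) ** (-d / 2)
    return (det_H1 ** -0.5 / n1 + det_H2 ** -0.5 / n2) * K0

def null_var(n1, n2, zeta1, zeta2):
    # zeta_T^2 = 3 (n1 zeta1 + n2 zeta2) / (n1 + n2) * (1/n1 + 1/n2)
    return 3 * (n1 * zeta1 + n2 * zeta2) / (n1 + n2) * (1 / n1 + 1 / n2)

def z_and_p(t_hat, n1, n2, det_H1, det_H2, zeta1, zeta2, d=2):
    """Normalize T_hat and return (z, upper-tail P-value)."""
    mu = null_mean(n1, n2, det_H1, det_H2, d)
    z = (t_hat - mu) / math.sqrt(null_var(n1, n2, zeta1, zeta2))
    # 1 - Phi(z) computed via erfc, which keeps precision for very small P-values
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Usage: a statistic equal to its null mean gives z = 0 and P = 0.5
mu = null_mean(1000, 1000, 0.01, 0.01)
print(z_and_p(mu, 1000, 1000, 0.01, 0.01, 1e-3, 1e-3))
```

Using `math.erfc` rather than `1 - cdf` matters here because P-values as small as those reported in this study (for example, of order 10^-29) would otherwise round to zero in double precision.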

Simulated Data Analysis.

To verify the performance of our kernel density-based test for finite samples, we performed simulation studies using pairs of normal mixture densities, mostly taken from Chacón (32). Contour plots of these test densities, as well as representative scatter plots for the two sample sizes considered (n = 100 and n = 1000), are displayed in Fig. 1. The first pair, $N((-1/2, 0), I_2)$ and $N((1/2, 0), I_2)$, consists of two single normal densities with identity variance whose means are separated by a distance of 1; this example was treated as the base case. In the second pair, both densities are bimodal: $\tfrac{1}{2}N((1,-1), \Sigma) + \tfrac{1}{2}N((-1,1), \Sigma)$ and $\tfrac{1}{2}N((1,-1), \Sigma) + \tfrac{1}{2}N((-1,1), I_2)$, where $\Sigma = \begin{pmatrix} 4/9 & 4/15 \\ 4/15 & 4/9 \end{pmatrix}$. The lower right components of the two densities are exactly the same, but their upper left components differ, which potentially makes this a challenging case to distinguish from two finite samples. As a third example, we chose pair 3, $N((0,0), I_2)$ and $\tfrac{1}{2}N((0,0), I_2) + \tfrac{1}{10}N((0,0), \tfrac{1}{16}I_2) + \tfrac{1}{10}N((-1,-1), \tfrac{1}{16}I_2) + \tfrac{1}{10}N((-1,1), \tfrac{1}{16}I_2) + \tfrac{1}{10}N((1,-1), \tfrac{1}{16}I_2) + \tfrac{1}{10}N((1,1), \tfrac{1}{16}I_2)$, which both have zero mean and approximately identity variance. Because the two densities differ mainly in their internal structure, this pair would most likely benefit from a density-based, rather than a moment-based, test.
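The moment claim for pair 3 can be checked with the standard mixture identities $\mathbb{E}X = \sum_k w_k m_k$ and $\operatorname{Var}X = \sum_k w_k(\Sigma_k + m_k m_k^T) - (\mathbb{E}X)(\mathbb{E}X)^T$. The short Python sketch below (function name hypothetical) applies them to the second density of pair 3 and gives mean (0, 0) and covariance $0.93125\, I_2$, that is, exactly zero mean and a variance close to, but not equal to, the identity.

```python
def mixture_moments(components):
    """Mean vector and covariance matrix of a bivariate normal mixture.

    components: list of (weight, mean, cov), mean a 2-tuple, cov a 2 x 2 nested list.
    """
    mean = [sum(w * m[k] for w, m, _ in components) for k in range(2)]
    cov = [[sum(w * (S[i][j] + m[i] * m[j]) for w, m, S in components) - mean[i] * mean[j]
            for j in range(2)] for i in range(2)]
    return mean, cov

I2 = [[1.0, 0.0], [0.0, 1.0]]
small = [[1 / 16, 0.0], [0.0, 1 / 16]]          # (1/16) I2
# Second density of pair 3: 1/2 N(0, I2) + 1/10 N(0, I2/16) + four corner components
pair3_second = [(0.5, (0, 0), I2), (0.1, (0, 0), small)] + [
    (0.1, (sx, sy), small) for sx in (-1, 1) for sy in (-1, 1)]

mean, cov = mixture_moments(pair3_second)
print(mean)  # zero mean
print(cov)   # diagonal entries 0.93125 (up to rounding), off-diagonal entries 0
```

The diagonal value follows from 1/2 + 1/160 + 4 × (1/10)(1/16 + 1) = 0.93125: the corner components contribute spread through their offset means, while the narrow components contribute little variance of their own.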

Fig. 1. — Contour plots of simulated data. Contour plots for three pairs of simulated normal mixture densities (*Left*). Representative scatter plots for n = 100 (*Middle*). Representative scatter plots for n = 1000 (*Right*).

First, we verified the asymptotic normality of $\hat T$ by comparing density estimates of the z-scores with the standard normal density (Fig. S1). The larger sample gave better estimates of the zero mean. Performance in variance estimation, on the other hand, was more uneven: the n = 1000 samples did not lead to better variance estimates for pair 2. Related results have been observed previously (25), indicating that variance estimation is the most difficult part of calibrating an asymptotically normal null distribution.

We performed simulations of the test statistic for two common nominal levels of significance, α = 0.05 and α = 0.01 (Table 1), where α is the rate of rejecting the null hypothesis H₀ when it is true (false positive). To estimate how closely our test achieves this error rate, we computed the proportion $\hat\alpha$ of experiments in which two samples simulated from the same distribution led to rejection of H₀. For a given level of significance, the other possible error is to accept H₀ when it is false (false negative). We estimated its rate $\hat\beta$ as the proportion of experiments in which two samples simulated from different distributions led to acceptance of H₀; the empirical power is then $1 - \hat\beta$. For the smaller sample size, the empirical significance levels were close to or below the nominal values, but the power was low for pairs 2 and 3, indicating that n = 100 data points were not sufficient to distinguish reliably between these more difficult comparisons. For the larger sample size, the lack of power was resolved for all three pairs, and the empirical levels of significance were less conservative, exceeding the nominal level only for pair 3. This simulation evidence demonstrates that, apart from pair 3 at n = 1000, our proposed test does not identify more false positives than expected from the nominal level of significance, and that for n = 1000 it detects almost all true differences.
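The two error rates reported in Table 1 are simple proportions over simulated experiments. The sketch below (hypothetical function names, made-up P-values) computes them together with a binomial Monte Carlo standard error, which is one way to judge whether an empirical level such as 0.074 differs from the nominal 0.05; the number of replications behind Table 1 is not stated in this section, so `n_reps` is left to the caller.

```python
import math

def rejection_rate(p_values, alpha):
    """Proportion of simulated experiments with P < alpha.

    With both samples drawn from the same density this estimates the empirical
    level alpha_hat; with different densities it estimates the power 1 - beta_hat.
    """
    return sum(p < alpha for p in p_values) / len(p_values)

def mc_standard_error(rate, n_reps):
    # Binomial Monte Carlo standard error of a rejection rate estimated from n_reps runs
    return math.sqrt(rate * (1 - rate) / n_reps)

# Usage with made-up P-values (illustration only)
p_null = [0.62, 0.03, 0.41, 0.88, 0.17]
print(rejection_rate(p_null, 0.05))        # 1 of 5 experiments rejects: 0.2
print(mc_standard_error(0.05, 500))
```

An empirical level within about two standard errors of α is consistent with the nominal level; a larger deviation suggests miscalibration of the null distribution rather than Monte Carlo noise.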

Table 1.

Simulation results

Sample size   Pair     Nominal α = 0.05     Nominal α = 0.01     Normalized T̂ (Z)
                       α̂        1 − β̂      α̂        1 − β̂      Mean       SD
n = 100       Pair 1   0.010    0.914      0.002    0.830      -0.226     0.781
n = 100       Pair 2   0.018    0.052      0.004    0.026      -0.285     0.919
n = 100       Pair 3   0.038    0.446      0.010    0.264      -0.160     1.019
n = 1000      Pair 1   0.018    1.000      0.006    1.000      -0.1379    0.760
n = 1000      Pair 2   0.020    0.946      0.006    0.810      -0.0625    0.758
n = 1000      Pair 3   0.074    1.000      0.026    1.000      -0.0001    1.191


Comparison of the empirical level of significance ($\hat\alpha$) with the nominal level (α), empirical power ($1 - \hat\beta$), and central moments (mean, SD) of the null distribution of the normalized $\hat T$ (Z), for n = 100 and n = 1000.

To evaluate the performance of our test, we compared it with a parametric alternative. The t-test is a well-known hypothesis test for univariate data, which was generalized to multivariate data by Nel and Van der Merwe (33); below, MNV denotes the modified Nel and Van der Merwe test. The average P-values from 100 simulations of sample size 1000 were computed for each of the three pairs of target densities (Table S1). As expected, the MNV test was more sensitive for the first pair, which differed only in mean (average P-value = 0). However, the kernel test was still highly significant (average P-value = 1.142 × 10^-29). For the next two pairs, with similar mean values but clear differences in internal organization between the two densities, the MNV test gave nonsignificant average P-values of 0.5195 and 0.2158, whereas the kernel-based test gave highly significant average P-values of 1.353 × 10^-8 and 3.386 × 10^-23. This demonstrates that our density-based test outperforms the parametric MNV test in detecting differences in internal organization.

Detection of Morphological Changes in Micropatterned Cells after Drug Treatment.

To evaluate how our test performs on experimental data, we compared the morphologies of intracellular structures under different experimental conditions (Fig. 2). As indicated in the flowchart (Fig. 2A), we analyzed the morphology changes of multivesicular bodies (MVB) induced by treatment with the drug nocodazole (NZ), which depolymerizes microtubules, a major component of the cellular cytoskeleton. MVB are endosomes that are transported along microtubules (35) and are involved in several important cellular functions, including processing of nutrients, ligands, and receptors during endocytosis, exosome secretion, and autophagy (34). Intracellular MVB were visualized by indirect immunofluorescence against CD63, a transmembrane protein enriched in MVB (Fig. 2B). Cells were cultured on micropatterns of extracellular matrix proteins that standardize cell shape and allow alignment of CD63-marked structures. By combining the signals of CD63-marked components from several tens of cells, we previously showed that the 3D organization of MVB is reproducible under these normalized conditions (17). Disruption of microtubules with NZ disconnects MVB from microtubules, leading to subtle changes in cell morphology (Fig. 2C). We transformed the fluorescent signal of normalized cells into coordinates by segmentation analysis as previously reported (17). All detected signals from a control group of cells and from a group of cells exposed to the treatment were pooled into the samples from f₁ and f₂, respectively. In previous analysis, we estimated that pooling signals from about 20–30 normalized cells (containing several hundred structures each) was required to produce reliable density maps (17), so we took these cell numbers as a starting point for our comparisons. Representative 2D and 3D scatter plots of 40 cells are shown in Fig. 2D–G; the 2D scatter plots of individual cells are shown in Fig. S2. The coordinates from 40 cells per condition were compared using $\hat T$.

Fig. 2. — Changes of CD63 morphology upon disruption of the microtubule cytoskeleton. (A) Flowchart of the morphology comparison between two populations of cells seeded on micropatterned substrates. (B and C) Maximum intensity projections of the deconvolved fluorescence of CD63-marked multivesicular bodies (MVB) without treatment (Ctrl) (B) and upon nocodazole (NZ) treatment (C). Scale bars, 10 μm. (D and E) Representative 2D scatter plots for 40 cells with n = 11786 structures for Ctrl (D) and for 40 cells with n = 13615 structures for NZ (E). (F and G) Representative 3D scatter plots for Ctrl (F) and NZ treatment (G); the z-axis is zoomed 500%. (H) Average P-values over 100 comparisons from the kernel test (solid lines) and the permutation analysis (dashed lines) as a function of the number of cells analyzed, for Ctrl versus Ctrl comparisons (black) and Ctrl versus NZ comparisons (gray).

First, we compared a nontreated control group 1 of 40 cells with 11786 detected structures with a second control group 2 of 40 cells containing 12585 structures. The two control samples gave slightly different estimates of the CD63 steady-state distribution (Fig. S3). The normalized 2D and 3D test statistics corresponded to P-values of $P_{2D}$ = 0.2581 and $P_{3D}$ = 0.1138, so the minor differences between the two control samples are not statistically significant.

We then compared control group 1 with the NZ-treated group of 40 cells with 13615 detected structures. For this control versus treatment comparison, the normalized 2D and 3D test statistics gave $P_{2D}$ = 1.589 × 10^-5 and $P_{3D}$ = 1.280 × 10^-11, providing strong evidence that the drug treatment significantly affects the distribution of MVB. These results agree with previous studies demonstrating microtubule-dependent movement of MVB (35). To further evaluate our approach, we performed additional analyses of diverse subcellular structures (SI Text, Fig. S4, and Table S2).

Second, we compared the performance of our test with that of a resampling strategy previously established for the comparison of fluorescent images (17). We calculated average P-values from either our test statistic or the permutation analysis as a function of the number of cells analyzed, taking 100 random samples of 1, 2, 10, 20, and 40 cells (Fig. 2H and Table S3). First, we randomly picked two subsamples from the control condition (Ctrl) to estimate the false positive rate of our test. Then we compared one control subsample with one subsample taken from the treated condition (NZ). Dashed lines represent the permutation test; solid lines represent the kernel density test (Fig. 2H). When the null hypothesis holds, as expected for Ctrl versus Ctrl, P-values follow a uniform distribution on [0, 1] and thus have mean 0.5. This is true for the permutation test, since it mimics the sampling distribution of the test statistic. The kernel density test gives more small P-values (false positives) than predicted, owing to the asymptotic approximation (see also Fig. S5). However, the average P-value was > 0.05 in each case, so the null hypothesis is not rejected at the usual significance levels. For the comparison between Ctrl and NZ treatment, the kernel density test gives lower P-values (more true positives) than the permutation test for fewer than 10 cells. For more than 10 cells (as typically analyzed), the two tests gave the same conclusions, demonstrating that in this case the normal approximation of the sampling distribution was as accurate as resampling. Thus, our test statistic performs comparably to resampling-based testing.
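The resampling alternative can be sketched as a generic permutation test that uses $\hat T$ itself as the statistic. The permutation procedure of ref. 17 is not detailed here, so this is an illustration under assumptions: a single isotropic bandwidth matrix $h I_2$ for both samples (so that $\hat\psi_{1,2} = \hat\psi_{2,1}$), a Gaussian kernel, and an add-one P-value; names are hypothetical. Each permutation recomputes all pairwise kernel sums over the pooled points, which is the computational burden that the asymptotic normal approximation avoids.

```python
import math
import random

def t_hat_iso(X, Y, h):
    """T_hat with a common isotropic bandwidth matrix h * I2 (simplifying assumption)."""
    def k(p, q):
        r2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return math.exp(-0.5 * r2 / h) / (2 * math.pi * h)   # N(0, h I2) density
    n1, n2 = len(X), len(Y)
    psi1 = sum(k(p, q) for p in X for q in X) / n1 ** 2
    psi2 = sum(k(p, q) for p in Y for q in Y) / n2 ** 2
    psi12 = sum(k(p, q) for p in X for q in Y) / (n1 * n2)
    return psi1 + psi2 - 2 * psi12   # psi_{1,2} = psi_{2,1} when H1 = H2

def permutation_pvalue(X, Y, h, n_perm=199, seed=0):
    """Relabel the pooled points at random and count how often the permuted
    T_hat is at least as large as the observed one (add-one correction)."""
    rng = random.Random(seed)
    observed = t_hat_iso(X, Y, h)
    pooled = list(X) + list(Y)
    n1 = len(X)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if t_hat_iso(pooled[:n1], pooled[n1:], h) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Usage with two hypothetical, clearly separated point clouds
X = [(0.1 * i, 0.05 * i) for i in range(10)]
Y = [(5 + 0.1 * i, 0.05 * i) for i in range(10)]
print(permutation_pvalue(X, Y, h=0.25, n_perm=99))
```

The smallest attainable P-value is 1/(n_perm + 1), so resolving very small P-values by permutation requires many more permutations, whereas the normal approximation yields them directly.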

Next, we evaluated how sensitive our method is to errors in cell alignment on patterned substrates. We systematically estimated how strongly our test statistic degrades as a function of rotational and translational misalignment (see SI Text and Table S4). Overall, as expected, the P-values decrease uniformly as the magnitude of misalignment increases. Consistent with the simulated data analysis, our test was highly sensitive when the entire cell sample to be compared was misaligned: rotations of as little as 10°, as well as translations of 20–30 pixels (for a pattern size of 550 pixels), were sufficient to give significant P-values (P < 0.05). Our test was, however, less sensitive to random misalignment of individual cells within one sample; significant P-values appeared for rotations of 30–40° and for translations of 30–50 pixels. This analysis highlights the importance of cell alignment for reducing false positive results. Together, our analysis of CD63 demonstrated that density-based comparison is well suited to detect changes in the steady-state morphology of cells cultured under controlled conditions of adhesion.
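To make the two misalignment scenarios concrete, the sketch below perturbs pixel coordinates either for the whole pooled sample (one rotation and translation) or independently for each cell before pooling. The exact perturbation protocol is described in the SI Text and is not reproduced here; rotation about the pattern center (the alignment landmark named in Methods), uniform random draws, the example center, and all function names are assumptions for illustration.

```python
import math
import random

def misalign(points, angle_deg=0.0, shift=(0.0, 0.0), center=(0.0, 0.0)):
    """Rotate 2D coordinates by angle_deg about center, then translate by shift."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = []
    for x, y in points:
        u, v = x - center[0], y - center[1]
        out.append((center[0] + c * u - s * v + shift[0],
                    center[1] + s * u + c * v + shift[1]))
    return out

def misalign_cells(cells, max_angle, max_shift, center, seed=0):
    """Apply an independent random rotation and translation to each cell, then pool.
    Uniform draws in [-max, max] are an assumption of this sketch."""
    rng = random.Random(seed)
    pooled = []
    for cell in cells:
        angle = rng.uniform(-max_angle, max_angle)
        shift = (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift))
        pooled.extend(misalign(cell, angle, shift, center))
    return pooled

# Whole-sample misalignment: rotate every pooled coordinate by 10 degrees
center = (275.0, 275.0)   # hypothetical center of a 550-pixel pattern
sample = [(300.0, 275.0), (275.0, 400.0)]
print(misalign(sample, angle_deg=10.0, center=center))
```

Under this setup a whole-sample rotation shifts the entire density map coherently, whereas independent per-cell perturbations partly average out when cells are pooled, which is consistent with the lower sensitivity reported for per-cell misalignment.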

Detection of Morphological Changes in Live-Cell Assays in Unconstrained Cells.

To demonstrate that our density-based framework is also valid for the detection of morphological changes in unconstrained cells, as classically studied, we applied our test to live cell analysis. Because cells maintained a consistent orientation during the imaging period, fast changes in intracellular organization, such as those observed after drug treatment, could be analyzed by our density-based method. We benchmarked the statistical method on the dynamics of MVB in unconstrained cells before and after treatment with NZ (Fig. 3). We acquired 3D stacks every 60 s over 24 min, extracted 3D positional information of labeled compartments by segmentation, and compared morphology changes.

We split the images of Movie S1 into four groups (1–4) of six images each (Fig. 3A). Groups 1 and 2 were the nontreated control groups, with 1080 and 1002 detected CD63-positive structures, acquired before addition of the drug. Groups 3 and 4 were the treatment test groups, containing 1019 and 801 structures, recorded after addition of the drug. The corresponding 2D and 3D scatter plots of the four groups are shown in Fig. 3 B and C. We applied our test statistic to each of the possible pairwise combinations; the corresponding P-values for the 2D and 3D comparisons are listed in Table S5. The results indicated that, whereas no significant changes in CD63 morphology were detected before drug treatment ($P_{2D}$ = 0.4136; $P_{3D}$ = 0.3565), treatment with NZ significantly affected CD63 morphology ($P_{2D}$ = 3.998 × 10^-6; $P_{3D}$ = 4.844 × 10^-6). The effect of the drug was more significant at later time points, in agreement with visual inspection of the images, demonstrating that statistical significance can quantify the influence of a compound. Thus, our approach allowed unbiased, automated detection of morphological changes in live-cell assays of unconstrained cells.

Discussion

We have developed a test statistic which inherits the advantages of kernel density estimates to facilitate generally applicable two-sample comparisons of multivariate data. By drawing on recent distributional results for kernel estimators, we were able to express its null distribution in a closed asymptotic form, thus circumventing the requirement for resampling to determine the critical quantiles of the null distribution.

This test allowed us to compare complex data from fluorescent microscopy without reducing the information to simple summary statistics, enabling quantitative comparison of cellular morphology by directly measuring the three-dimensional organization of intracellular structures visualized by fluorescent microscopy. Note that our proposed image comparison focuses on the spatial localization of structures whose point coordinates can be resolved. It is also applicable to the comparison of continuous structures, such as microtubules, but its performance is not optimal: because our test statistic requires independent data points (like most kernel-based estimators), representing continuous structures by several connected coordinates leads to smaller P-values. This is a disadvantage compared with commonly used feature-based techniques, which collate any type of measured quantity from the microscopy images and can therefore be applied even to diffuse fluorescent patterns. However, feature-based comparisons require the critical stage of feature selection to be calibrated carefully for each comparison (15), especially in order to compute P-values, so automating optimal feature selection into a black-box method remains an open problem. Our simpler, more direct approach of comparing spatial distributions does not face equivalent problems, allowing full automation. As image acquisition capacity grows rapidly, there is a rising need for image analysis tools that estimate the required test parameters directly from the data and do not require computationally intensive analytical techniques. Because feature-based and density-based approaches use different numerical information, they are complementary, and combining them should improve image analysis.

Such computational imaging methods are indispensable for the high-content, high-throughput image acquisition capabilities of advanced microscopes, which acquire thousands of high-resolution images daily in time-lapse experiments. We have shown that our density-based mathematical framework is powerful for phenotype profiling and can be readily adapted to high-throughput analysis. Work is underway to incorporate this computational image comparison into a high-throughput workflow to screen for cell morphological changes due to treatment with chemical compounds and siRNA-based gene silencing.

A second disadvantage of our approach is that it requires cells to have a constant shape in order to construct spatial density maps and the test statistic. Fortunately, the micropatterning technique allows us to grow cells reproducibly into standardized shapes in culture. One important advantage of growing cells under controlled conditions of adhesion is that cells are much closer to their physiological state in tissues, where cells are constrained, than in classical (unconstrained) culture conditions on Petri dishes. Another advantage is that standardizing cells by micropatterning represents an important step toward quantitative approaches in cell biology. We have also shown that our testing procedure can be applied to live cell comparisons: by following an unconstrained cell that kept a consistent orientation through time, we showed that unconstrained cells are in principle analyzable. However, alignment of unconstrained cells with the help of computational approaches (36, 37) will be a prerequisite for applying density-based comparison as a general approach to unconstrained cells. An important future application is to compare the spatial morphology of cells in tissues, in which many cell types show reproducible shapes and inherent polarization. In particular, this application would be important for detecting alterations in cellular architecture during pathological processes such as cancer. Because imaging and alignment of tissue cells are more challenging, our testing procedure is not yet suitable for this application.

A promising extension of our test is to identify the regions of the sample space that contribute most to the overall statistical difference. There have been some attempts to tackle this problem (38) using a heuristic density-difference approach based on data mining, which, however, cannot make rigorous statistical inferences. Developing a rigorous analogue would be an important advance for comparisons of multivariate samples. Another future challenge is analyzing structures with diffuse fluorescence. One option is to treat diffuse fluorescence patterns as functional data, i.e., not as finite-dimensional vectors (as in multivariate analysis) but as infinite-dimensional functions. The state of the art in formal hypothesis testing for functional data analysis is less advanced than in multivariate data analysis, so a testing procedure for functional data analogous to our proposed statistic remains an open problem.

Methods

Proof of Theorem 1:

To establish the asymptotic sampling distribution of the test statistic, we follow the approach of Chacón and Duong (29). Suppose that the following conditions hold, for l = 1, 2:

(F) The target densities f_l have two derivatives, which are bounded, continuous, and square integrable.

(H) The bandwidths H_l = H_l(n_l) are a sequence of symmetric, positive definite matrices such that all elements of H_l → 0 and as n_l → ∞.

(K) The kernel K is a symmetric probability density function such that m₀(K²) = ∫K(x)²dx is finite and ∫xx^TK(x)dx = m₂(K)I_d for some real number m₂(K), where I_d is the d × d identity matrix.

(N) The sample sizes n₁, n₂ are such that n₁/n₂ and n₂/n₁ are bounded away from zero and infinity as n₁, n₂ → ∞.

The proof is deferred to the SI Text.
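As a concrete instance of condition (K), the standard d-variate normal kernel K(x) = (2π)^(−d/2) exp(−xᵀx/2) satisfies it with m₀(K²) = (4π)^(−d/2) and m₂(K) = 1. The Python sketch below checks both constants numerically in d = 2 with a midpoint rule. It is only an illustration; the helper names, the integration window, and the grid size are our assumptions, and the choice of a normal kernel here is not a claim about the exact kernel used in the authors' implementation.

```python
import math


def gaussian_kernel(x):
    """Standard d-variate normal density K(x) = (2*pi)^(-d/2) * exp(-|x|^2 / 2)."""
    d = len(x)
    sq = sum(v * v for v in x)
    return (2 * math.pi) ** (-d / 2) * math.exp(-0.5 * sq)


def kernel_constants_2d(kernel, half_width=8.0, steps=200):
    """Midpoint-rule approximations, in d = 2, of the two quantities in condition (K):
    m0(K^2) = int K(x)^2 dx and the 2 x 2 matrix int x x^T K(x) dx,
    both integrated over the square [-half_width, half_width]^2."""
    h = 2 * half_width / steps
    cell = h * h
    nodes = [-half_width + (i + 0.5) * h for i in range(steps)]
    m0 = 0.0
    m2 = [[0.0, 0.0], [0.0, 0.0]]
    for x1 in nodes:
        for x2 in nodes:
            k = kernel((x1, x2))
            m0 += k * k * cell
            pt = (x1, x2)
            for a in range(2):
                for b in range(2):
                    m2[a][b] += pt[a] * pt[b] * k * cell
    return m0, m2


m0, m2 = kernel_constants_2d(gaussian_kernel)
print(m0, 1 / (4 * math.pi))  # numerical m0(K^2) next to the closed form (4*pi)^(-1)
print(m2)                    # close to [[1, 0], [0, 1]], i.e. m2(K) = 1 times I_2
```

Because the normal density decays so quickly, a fairly coarse grid already reproduces the closed forms to many digits; the same check can be applied to any candidate kernel passed in as `kernel`.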

Cells and Sample Preparation.

Cell culture and sample preparation for fixed cells were as described in ref. 17. Primary antibodies were α-CD63 (Invitrogen), α-Sec13 (17), and α-tubulin (BD Biosciences); secondary antibodies were coupled to Alexa Fluor 488, Cy3, or Cy5. EGFP-CD63-expressing stable cells (generated by transfecting RPE-1 cells with the plasmid pEGFP-CD63 (34)) were seeded on Iwaki glass-base dishes (Asahi Glass) for live cell observation. To depolymerize microtubules, NZ was added to a final concentration of 20 μM. Cells were imaged before and after the addition of NZ.

Immunofluorescence Image Acquisition and Processing.

Image acquisition of fixed cells was as described in ref. 17. Live cell imaging was performed with a Yokogawa spinning disk mounted on an Eclipse TE2000 inverted microscope, using a 60× Plan Apo VC 1.4 NA oil objective, a 491-nm laser, and a CCD camera (Roper CoolSnap HQ2). Z-series were acquired with a 0.2-μm step every 60 s.

Images were segmented with MetaMorph (Universal Imaging Corporation) as described in ref. 17. Briefly, the centroids of fluorescent objects were detected as intensity fluctuations 15-fold larger than the noise. The watershed function was routinely applied to separate individual structures in dense regions. The coordinates of the segmented structures were aligned using the center of the micropatterns as in ref. 17 and then used in a fully automatic testing procedure implemented in the ks library (31) for the open-source R programming language.
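The Python sketch below illustrates, under stated assumptions, two computational steps named above: translating segmented coordinates so that the micropattern center becomes the origin, and computing in closed form the integrated squared difference ∫(f̂₁ − f̂₂)² between two Gaussian kernel density estimates, the kind of discrepancy used by kernel-based two-sample tests (28). The closed form rests on the fact that the convolution of two normal densities is normal with summed covariance, so ∫f̂_X f̂_Y is the average of the N(0, H_X + H_Y) density over all pairwise differences. This is not the ks implementation: the helper names are hypothetical, the data are 2D rather than 3D, the bandwidth matrices are supplied by hand instead of being estimated from the data, and the centering and scaling that turn this quantity into an asymptotically normal test statistic are omitted.

```python
import math


def align_to_pattern_center(points, center):
    """Hypothetical helper: express (x, y) coordinates relative to the micropattern center."""
    cx, cy = center
    return [(x - cx, y - cy) for x, y in points]


def normal_density_2d(dx, dy, S):
    """Bivariate normal density N(0, S) at (dx, dy); S = [[a, b], [b, c]] is symmetric positive definite."""
    a, b = S[0]
    c = S[1][1]
    det = a * c - b * b
    # Quadratic form (dx, dy) S^{-1} (dx, dy)^T via the explicit inverse of a 2 x 2 matrix.
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def kde_inner_product(X, Y, HX, HY):
    """Closed form of int fhat_X(x) fhat_Y(x) dx for Gaussian KDEs with bandwidth matrices HX, HY."""
    S = [[HX[0][0] + HY[0][0], HX[0][1] + HY[0][1]],
         [HX[1][0] + HY[1][0], HX[1][1] + HY[1][1]]]
    total = sum(normal_density_2d(x1 - x2, y1 - y2, S)
                for x1, y1 in X for x2, y2 in Y)
    return total / (len(X) * len(Y))


def integrated_squared_difference(X1, X2, H1, H2):
    """Plug-in estimate of int (f1 - f2)^2 from two bivariate Gaussian KDEs."""
    return (kde_inner_product(X1, X1, H1, H1)
            + kde_inner_product(X2, X2, H2, H2)
            - 2 * kde_inner_product(X1, X2, H1, H2))


# Usage with illustrative values (not data from the study).
H = [[0.25, 0.0], [0.0, 0.25]]  # hand-picked bandwidth, not a data-driven choice
cells_a = align_to_pattern_center([(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)], (10.0, 10.0))
cells_b = [(x + 0.5, y) for x, y in cells_a]  # the same pattern shifted by 0.5 units
print(integrated_squared_difference(cells_a, cells_a, H, H))  # 0.0 for identical samples
print(integrated_squared_difference(cells_a, cells_b, H, H))  # positive for shifted samples
```

Because the Gaussian case has this closed form, no numerical integration over a grid is needed; the cost grows with the product of the sample sizes. In R, the ks package (31) provides functions for kernel-based two-sample comparisons (e.g., kde.test); its reference manual documents the exact statistic and bandwidth selectors used there.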

Acknowledgments.

We thank Christophe Zimmer and Jost Enninga for critical reading of the manuscript, and Sara Chambard for participating in the analysis for the parametric test. We acknowledge the Nikon Imaging Centre and Imaging Facility at Institut Curie (IC)—Centre National de la Recherche Scientifique (CNRS) for support with microscopes and deconvolution service. K.S. received funding from the Fondation pour la Recherche Médicale en France and Association pour la Recherche sur le Cancer. T.D. was supported by Mayent-Rothschild. This project was further supported by grants from Agence Nationale de la Recherche, the CNRS, and IC.

Footnotes

Conflict of interest statement: A patent has been filed on the reported approach.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1117796109/-/DCSupplemental.

References

1. Farhan H, et al. MAPK signaling to the early secretory pathway revealed by kinase/phosphatase functional screening. J Cell Biol. 2010;189:997–1011. doi: 10.1083/jcb.200912082.
2. Kiger AA, et al. A functional genomic analysis of cell morphology using RNA interference. J Biol. 2003;2:27. doi: 10.1186/1475-4924-2-27.
3. Kim JK, et al. Functional genomic analysis of RNA interference in C. elegans. Science. 2005;308:1164–1167. doi: 10.1126/science.1109267.
4. Ramon y Cajal S. Studies on Vertebrate Neurogenesis. Springfield, IL: CC Thomas; 1960.
5. Boland MV, Murphy RF. A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells. Bioinformatics. 2001;17:1213–1223. doi: 10.1093/bioinformatics/17.12.1213.
6. Carpenter AE, et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 2006;7:R100. doi: 10.1186/gb-2006-7-10-r100.
7. Jones TR, et al. Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning. Proc Natl Acad Sci USA. 2009;106:1826–1831. doi: 10.1073/pnas.0808843106.
8. Lamprecht MR, Sabatini DM, Carpenter AE. CellProfiler: free, versatile software for automated biological image analysis. Biotechniques. 2007;42:71–75. doi: 10.2144/000112257.
9. Loo LH, et al. An approach for extensibly profiling the molecular states of cellular subpopulations. Nat Methods. 2009;6:759–765. doi: 10.1038/nmeth.1375.
10. Ljosa V, Carpenter AE. Introduction to the quantitative analysis of two-dimensional fluorescence microscopy images for cell-based screening. PLoS Comput Biol. 2009;5:e1000603. doi: 10.1371/journal.pcbi.1000603.
11. Chen SC, Zhao T, Gordon GJ, Murphy RF. Automated image analysis of protein localization in budding yeast. Bioinformatics. 2007;23:i66–i71. doi: 10.1093/bioinformatics/btm206.
12. Moffat J, et al. A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell. 2006;124:1283–1298. doi: 10.1016/j.cell.2006.01.040.
13. Perlman ZE, et al. Multidimensional drug profiling by automated microscopy. Science. 2004;306:1194–1198. doi: 10.1126/science.1100709.
14. Singh DK, et al. Patterns of basal signaling heterogeneity can distinguish cellular populations with different drug sensitivities. Mol Syst Biol. 2010;6:369–379. doi: 10.1038/msb.2010.22.
15. Logan DJ, Carpenter AE. Screening cellular feature measurements for image-based assay development. J Biomol Screen. 2010;15:840–846. doi: 10.1177/1087057110370895.
16. Andrey P, et al. Statistical analysis of 3D images detects regular spatial distributions of centromeres and chromocenters in animal and plant nuclei. PLoS Comput Biol. 2010;6:e1000853. doi: 10.1371/journal.pcbi.1000853.
17. Schauer K, et al. Probabilistic density maps to study global endomembrane organization. Nat Methods. 2010;7:560–566. doi: 10.1038/nmeth.1462.
18. Thery M, et al. Anisotropy of cell adhesive microenvironment governs cell internal organization and orientation of polarity. Proc Natl Acad Sci USA. 2006;103:19771–19776. doi: 10.1073/pnas.0609267103.
19. Gibbons JD, Chakraborti S. Nonparametric Statistical Inference. New York: Marcel Dekker; 2003.
20. Bickel PJ. A distribution free version of the Smirnov two-sample test in the p-variate case. Ann Math Stat. 1969;40:1–23.
21. Friedman JH, Rafsky LC. Multivariate generalizations of the Wald-Wolfowitz and Smirnov 2-sample tests. Ann Stat. 1979;7:697–717.
22. Liu RY, Singh K. A quality index based on data depth and multivariate rank-tests. J Am Stat Assoc. 1993;88:252–260.
23. Simonoff JS. Smoothing Methods in Statistics. New York: Springer-Verlag; 1996.
24. Alba Fernandez V, Jimenez Gamero MD, Munoz Garcia J. A test for the two-sample problem based on empirical characteristic functions. Comput Stat Data Anal. 2008;52:3730–3748.
25. Cao R, Van Keilegom I. Empirical likelihood ratio tests for two-sample problems via non-parametric density estimation. Can J Stat. 2006;34:61–77.
26. Martínez-Camblor P, De Una-Alvarez J, Corral N. k-sample test based on the common area of kernel density estimators. J Stat Plan Inference. 2008;138:4006–4020.
27. Molanes-Lopez EM, Cao R. Plug-in bandwidth selector for the kernel relative density estimator. Ann Inst Stat Math. 2008;60:273–300.
28. Anderson NH, Hall P, Titterington DM. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. J Multivar Anal. 1994;50:41–54.
29. Chacon JE, Duong T. Multivariate plug-in bandwidth selection with unconstrained bandwidth matrices. Test. 2010;19:375–398.
30. Chacon JE, Duong T, Wand MP. Asymptotics for general multivariate kernel density derivative estimators. Stat Sinica. 2011;21:807–840.
31. Duong T. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J Stat Softw. 2007;21:1–16.
32. Chacon JE. Data-driven choice of the smoothing parametrization for kernel density estimators. Can J Stat. 2009;37:249–255.
33. Nel DG, Van der Merwe CA. A solution to the multivariate Behrens-Fisher problem. Commun Stat Theory Methods. 1986;15:3719–3735.
34. Ostrowski M, et al. Rab27a and Rab27b control different steps of the exosome secretion pathway. Nat Cell Biol. 2010;12:19–30. doi: 10.1038/ncb2000.
35. Driskell OJ, Mironov A, Allan VJ, Woodman PG. Dynein is required for receptor sorting and the morphogenesis of early endosomes. Nat Cell Biol. 2007;9:113–120. doi: 10.1038/ncb1525.
36. Pincus Z, Theriot JA. Comparison of quantitative methods for cell-shape analysis. J Microsc. 2007;227:140–156. doi: 10.1111/j.1365-2818.2007.01799.x.
37. Zhao T, Murphy RF. Automated learning of generative models for subcellular location: building blocks for systems biology. Cytometry A. 2007;71:978–990. doi: 10.1002/cyto.a.20487.
38. Duong T, Koch I, Wand MP. Highest density difference region estimation with application to flow cytometric data. Biom J. 2009;51:504–521. doi: 10.1002/bimj.200800201.


Abstract