Abstract
Cell morphology and morphodynamics are widely appreciated as important biomarkers of cellular state in both fundamental research and clinical applications. Quantifying cell morphology often requires a large number of geometric measures that form a high-dimensional feature vector. This mathematical representation creates barriers to communicating, interpreting, and visualizing data. Here, we develop a deep learning-based algorithm to project 13-dimensional (13D) morphological feature vectors into a 2-dimensional (2D) morphological latent space. We show that the projection has less than 5% information loss and separates the different migration phenotypes of metastatic breast cancer cells. Using the projection, we demonstrate the phenotype-dependent motility of breast cancer cells in the 3D extracellular matrix and the continuous change of cell state upon drug treatment. We also find that dynamics in the 2D morphological latent space quantitatively agree with the morphodynamics of cells in the 13D feature space, preserving the diffusive power and the Lyapunov exponent of cell shape fluctuations even though the dimensional reduction is highly nonlinear. Our results suggest that the morphological latent space is a powerful tool to represent and understand cell morphology and morphodynamics.
Keywords: phenotype transition, machine learning, cell migration, cancer metastasis
Introduction
Cell morphology as a biomarker of cellular state dates back to the 17th century and remains widely used in both fundamental research and clinical tests [1, 2, 3]. With the advance of high-content imaging techniques, cell morphology has often been redefined to include a plethora of geometric measures characterizing the shape, texture, and internal organization of cells' organelles from microscopy images [4, 5, 6]. As a result, cell morphology is reported as a high-dimensional feature vector, whose dimensionality can range from tens [7, 8] to hundreds [6, 9, 4, 10]. Extending morphology to large feature vectors, combined with modern data science tools, has significantly enhanced the sensitivity and robustness of morphology as a reporter of cellular state [11, 12]. However, high-dimensional, high-content representations also introduce challenges in interpreting and visualizing morphological data.
To address this challenge, we note that the components of a morphological feature vector may be internally correlated, and this redundancy leaves room for reducing the dimensionality. A similar problem arises in bioinformatics, where the gene expression of a cell is described by a high-dimensional vector [13, 14]. To aid visualization and intuitive understanding of the cellular state, various dimensional reduction strategies have been developed [15, 16, 17, 18, 19, 20]. These range from linear algorithms such as principal component analysis [21, 22] to non-linear embeddings such as t-SNE [23, 24, 25]. In recent years, advances in deep neural networks have led to another category of approaches, in which an internal layer of a network, known as the latent space, is trained to compress high-dimensional data [26, 27, 28]. Latent space variables have shown exceptional performance in lowering the dimensionality of high-content data for a wide range of applications, including studies of animal behavior [29], fertility screening [30, 31], cancer research [32], and other morphological analyses [33, 34, 35, 36, 37, 38].
The morphology and morphodynamics of breast cancer cells encode crucial information about their phenotypic states. For instance, MDA-MB-231 cells, a triple-negative metastatic breast cancer cell line, exhibit four main morphological phenotypes corresponding to their migration programs [39]. These include Filopodia (FP, [40, 41]) and Lamellipodia (LP, [42, 43]), which exhibit strong cell-matrix adhesions and elongated geometry, as well as Actin-enriched leading edge (AE, [44]) and Blebbing (BB, [45, 46, 47]), which exhibit rounded cell shapes and weak cell-matrix adhesions. The morphodynamics of a breast cancer cell are associated with its migration mode transitions, which facilitate the invasion of the cell in heterogeneous tissue environments [48]. We have previously shown that high-content morphological data, consisting of 13-dimensional (13-D) morphological feature vectors (see Supplementary S1 for more information on each geometric feature), are sufficient to classify cell migration modes [48].
In this study, we develop a latent space representation of the morphology of MDA-MB-231 breast cancer cells in 3-D collagen matrices. We show that such latent variables can be obtained by combining an encoder-decoder architecture with an adversary network, known as the adversarial autoencoder (AAE). With minimal loss of information, our approach projects 13-D morphological feature vectors into a 2-D latent space that is tessellated by adjacent regions corresponding to cell migration phenotypes. The result is a low-dimensional representation of cell morphology and morphodynamics that facilitates visualizing and interpreting cellular states. In the latent space, we analyze phenotype-dependent cell motility and drug responses. We also examine the continuity and conformality of the mapping from feature vectors to latent representations. Finally, we show that the characteristics of cell morphodynamics are quantitatively preserved after dimensional reduction to the latent space. Our results pave the way to effectively represent single-cell morphological data from high-content imaging in medical diagnostics, screening, and drug development [49, 50, 51].
Results
Image processing pipeline
In order to construct an appropriate latent space representation of cancer cell morphology, we have developed an image processing pipeline that maps confocal image stacks to morphological latent axes (MLA). We previously reported an algorithm to segment cancer cells from confocal stacks with low axial resolution. This algorithm, the Projection Enhancement Network (PEN), allows extraction of cells in 3D samples with minimal axial sectioning and produces 2D cell masks [52] (Fig. 1A). Once a cell mask is obtained, we calculate a 13-dimensional morphological feature vector (MFV) that can be used to classify the migration mode of the cancer cell (Fig. 1B). The classification is based on the distinct morphologies associated with 3D cancer cell migration phenotypes and is implemented with a fully trained Support Vector Machine [48]. As we have shown previously, this strategy achieves more than 90% accuracy in classifying breast cancer cells into one of four migration phenotypes: Filopodia (FP), Lamellipodia (LP), Actin-enriched leading edge (AE), and Blebbing (BB). The classification is consistent with the actin cytoskeleton organization of cells and is robust to the 3D orientation of cells in the sample [48].
Figure 1.

Image processing pipeline to map raw cell images to their morphological latent space representation. (A) The previously reported Projection Enhancement Network (PEN) algorithm [52] segments confocal z-stacks of 3D cultured cells to obtain single-cell masks. (B) Each mask generates a 13-dimensional morphological feature vector (MFV) that quantifies the shape of a cell. (C) The MFVs are used to train a standard encoder-decoder architecture, fitted with a Gaussian adversary network, to compress the 13-dimensional MFV onto a 2D latent space with minimal loss of cell shape information. (D) The latent space, termed the morphological latent space (MLS), is divided into regions corresponding to distinct cell migration phenotypes (black curves) and bounded by non-physical cell shapes (red curve). Sample morphologies are provided for each phenotype. BB: blebbing, AE: actin-enriched pseudopodial, LA: lamellipodial, FP: filopodial.
Because the morphological feature vector (MFV) comprehensively characterizes the shape and phenotype of a breast cancer cell, we seek to compress high-dimensional MFVs to lower dimensions for effective visualization and analysis. To this end, we employ an adversarial autoencoder network, described in Supplementary Material S1. Briefly, this network uses a series of densely connected layers with leaky ReLU activations to first down-sample and then up-sample the input, so that redundancy can be learned and eliminated layer by layer in the encoder while the decoder ensures reconstruction of the input. The network was trained on cell morphologies from a spheroid invasion assay in 1.5 mg/mL collagen; details of the training and validation data are given in Supplementary Material S2.1. Because the encoder produces a highly nonlinear mapping of the input, the well-defined phenotype boundaries in the morphological feature space may be lost during encoding. To control the large-scale structure of the latent representation, we include an adversary network with a Gaussian prior.
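To make the architecture described above concrete, the following is a minimal NumPy sketch of an adversarial autoencoder's forward pass and loss terms. It is not the authors' implementation: the layer widths (13 → 64 → 32 → 2 and back), the leaky ReLU slope of 0.2, the sigmoid output layer, and the MAE reconstruction term are all assumptions for illustration, and the gradient updates that training requires (normally handled by an autodiff framework) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def leaky_relu(x, slope=0.2):
    # Identity for positive inputs, small linear slope for negative ones
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_dense(sizes):
    # One (weights, bias) pair per densely connected layer, He-style scaling
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(layers, x, output_activation=None):
    # Leaky ReLU on hidden layers; optional activation on the output layer
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = leaky_relu(x)
    return output_activation(x) if output_activation else x


# Hypothetical layer widths: the encoder down-samples 13 -> 2, the decoder mirrors it
encoder = init_dense([13, 64, 32, 2])
decoder = init_dense([2, 32, 64, 13])
discriminator = init_dense([2, 32, 1])


def aae_losses(x, eps=1e-8):
    z_fake = forward(encoder, x)                    # latent codes of real cells
    x_rec = forward(decoder, z_fake, sigmoid)       # reconstructed MFVs in [0, 1]
    z_prior = rng.standard_normal(z_fake.shape)     # samples from the Gaussian prior
    d_prior = forward(discriminator, z_prior, sigmoid)
    d_fake = forward(discriminator, z_fake, sigmoid)
    recon = np.mean(np.abs(x - x_rec))              # reconstruction (MAE) term
    # Discriminator: label prior samples 1 and encoded samples 0
    disc = -np.mean(np.log(d_prior + eps) + np.log(1.0 - d_fake + eps))
    # Encoder acting as generator: make encoded samples look like prior samples
    gen = -np.mean(np.log(d_fake + eps))
    return z_fake, x_rec, recon, disc, gen


x = rng.uniform(0.0, 1.0, size=(8, 13))   # 8 synthetic normalized MFVs
z, x_rec, recon, disc, gen = aae_losses(x)
```

In training, the three losses would be minimized in alternation: the reconstruction term updates the encoder and decoder, the discriminator term updates the discriminator, and the generator term updates the encoder, which is what pulls the latent codes toward the Gaussian prior.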
As shown in Fig. 1C, the adversarial encoder-decoder projects the 13-dimensional morphological feature space (MFS) into a lower-dimensional morphological latent space (MLS). Since the feature components are normalized between 0 and 1, we use the mean absolute error (MAE) to approximate the average error in the reconstructed features. The mapping yields an MAE of 0.024 and maintains the boundary topology separating cell migration phenotypes (Fig. 1D; see Supplementary Material S2.4 for the error analysis). We find that a two-dimensional latent space is optimal: a one-dimensional latent space cannot sufficiently encode the morphological features, yielding an MAE of 0.1, whereas higher-dimensional latent spaces yield MAEs indistinguishable from that of the two-dimensional model (see Supplementary Material S2.2 for further analysis of the latent dimension).
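Because each feature is scaled to [0, 1], the MAE reads directly as an average per-feature error. The sketch below assumes min-max scaling with bounds taken from the training set (the text only states that components are normalized between 0 and 1); `minmax_normalize` and `mean_absolute_error` are illustrative helpers, not functions from the authors' code.

```python
import numpy as np


def minmax_normalize(features, fmin=None, fmax=None):
    """Scale each feature column to [0, 1].

    Pass the training-set fmin/fmax when normalizing new data so that all
    datasets share one scale. Constant columns (fmax == fmin) must be
    handled separately to avoid division by zero.
    """
    fmin = features.min(axis=0) if fmin is None else fmin
    fmax = features.max(axis=0) if fmax is None else fmax
    return (features - fmin) / (fmax - fmin), fmin, fmax


def mean_absolute_error(x, x_rec):
    # Average |error| over all cells and all features
    return float(np.mean(np.abs(x - x_rec)))


# Two toy features (e.g. an area in px^2 and a ratio) for three cells
raw = np.array([[100.0, 0.9], [300.0, 0.5], [200.0, 0.7]])
x, fmin, fmax = minmax_normalize(raw)
x_rec = x + np.array([[0.02, -0.03], [-0.01, 0.04], [0.03, -0.02]])
print(mean_absolute_error(x, x_rec))   # about 0.025
```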
Due to the nonlinear nature of the MFS→MLS mapping, the 2D latent space may include non-biological, or non-physical, regions: regions that do not encode any valid cell morphology. To determine the boundary of the region of interest, we impose two simple criteria: upper and lower bounds on the components of MFVs. The lower bound is set to zero, since all 13 geometric measures are non-negative. The upper bound is estimated from a dataset of 32,000 cell morphologies. After imposing these bounds on the decoded vectors of a grid of latent vectors centered at the origin, we identify in the latent space a completely enclosed region (red solid curve in Fig. 1D) that represents valid cell morphologies (see Supplementary Material S3 for details). With the help of the organization imposed by the adversary network, the four migration phenotypes occupy adjacent, non-overlapping regions in the latent space. Characteristic cell images of each migration phenotype (blebbing (BB), actin-enriched leading edge (AE), lamellipodial (LA), and filopodial (FP)) are shown as insets in Fig. 1; see Supplementary Material S4 for additional details on the four migration phenotypes.
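The boundary search described above amounts to decoding a grid of latent points and checking every decoded feature against the bounds. Below is a minimal sketch; `toy_decode` is a hypothetical stand-in for the trained decoder, and the grid extent, resolution, and unit upper bounds are assumptions.

```python
import numpy as np


def valid_latent_mask(decode, upper, extent=4.0, n=201):
    """Flag latent grid points whose decoded features lie in [0, upper].

    decode: callable mapping (N, 2) latent points to (N, 13) features.
    upper:  (13,) per-feature upper bounds, e.g. estimated from data.
    """
    axis = np.linspace(-extent, extent, n)      # grid centered at the origin
    g1, g2 = np.meshgrid(axis, axis)
    z = np.column_stack([g1.ravel(), g2.ravel()])
    feats = decode(z)
    ok = np.all((feats >= 0.0) & (feats <= upper), axis=1)
    return g1, g2, ok.reshape(g1.shape)


def toy_decode(z):
    # Hypothetical decoder: every feature grows with distance from the origin
    r2 = np.sum(z**2, axis=1, keepdims=True)
    return np.repeat(r2 / 4.0, 13, axis=1)


g1, g2, ok = valid_latent_mask(toy_decode, upper=np.ones(13))
print(ok[100, 100], ok[0, 0])   # True False: origin valid, corner invalid
```

The enclosing boundary can then be drawn as the 0.5-level contour of the mask, for example with `matplotlib.pyplot.contour(g1, g2, ok.astype(float), levels=[0.5])`.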
Disseminating cancer cells from tumor spheroids
We first apply the image processing pipeline to visualize the phenotype distribution of breast cancer cells disseminating from tumor spheroids into the 3D extracellular matrix (ECM). The tumor invasion assay and confocal imaging have been described previously [52]. Briefly, GFP-labeled MDA-MB-231 tumor spheroids are embedded in an ECM consisting of 1.5 mg/ml type I collagen. After seeding the spheroids and allowing cells to invade the surrounding ECM, we continuously image the sample for 24 hours using a laser scanning confocal microscope (Leica SPE) and a 10x objective lens (NA = 0.4) to obtain 3D image stacks at 15-minute intervals.
Fig. 2A shows snapshots of a sample at 12 hours and 24 hours after the onset of invasion. The segmented and classified cells are rendered in pseudocolors: green (BB), yellow (AE), blue (LA), and magenta (FP). At 12 hours, we detect 195 cells separated from the tumor spheroid. At 24 hours, the number of disseminating cells has increased to 386, and the leading cells have invaded 335 μm beyond the spheroid boundary. Note that to avoid segmentation errors we exclude the region within the spheroid boundary and focus only on the disseminating cells.
Figure 2.

Representing invading breast cancer cells in the morphological latent space. Here we study MDA-MB-231 tumor spheroids in a 3D extracellular matrix (ECM) consisting of type I collagen. (A) A tumor spheroid invades a collagen ECM with a concentration of 1.5 mg/ml. Top: a snapshot of the spheroid after 12 hours of invasion into the surrounding ECM. Middle: a snapshot of the spheroid after 24 hours of invasion. Here disseminating cells are segmented and classified using the image processing pipeline described in Fig. 1. Cells are colored by their migration phenotype. Bottom: the latent space representation of disseminating cells after 24 hours of invasion. Data points are colored by cell migration phenotype. Outset: the distribution of cells along each morphological latent axis (MLA1 and MLA2). (B) A tumor spheroid invades a collagen ECM with a concentration of 3.0 mg/ml. As in (A), the top and middle panels show processed snapshots at 12 hours and 24 hours of invasion. The bottom panel shows the latent space representation of disseminating cells after 24 hours of invasion. Colors of cells and data points: green (BB), yellow (AE), blue (LA), and magenta (FP). Scale bars: 200 μm.
The latent space projection of disseminating cells is shown in the bottom panel of Fig. 2A. Here each data point represents one cell's morphology after 24 hours of invasion. We also apply a kernel density estimator to compute the distribution of migration phenotypes along the MLAs; the distributions are displayed as outsets of the bottom panel. Cells display highly heterogeneous phenotypes, populating most physical regions of the morphological latent space. The fractions of the cell types are: filopodia (FP, 24.4%), actin-enriched leading edge (AE, 17.7%), lamellipodia (LP, 28.9%), and blebbing (BB, 29.0%).
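The phenotype fractions and the marginal distributions along each latent axis can be computed as below. This sketch uses a hand-written Gaussian KDE with the normal-reference rule-of-thumb bandwidth; the kernel and bandwidth actually used are not stated, so both are assumptions, and the latent coordinates here are synthetic. In practice the KDE would be evaluated separately on the cells of each phenotype.

```python
import numpy as np


def phenotype_fractions(labels, classes=("FP", "AE", "LP", "BB")):
    """Fraction of cells assigned to each migration phenotype."""
    labels = np.asarray(labels)
    return {c: float(np.mean(labels == c)) for c in classes}


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of one latent coordinate on a grid."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb for a roughly Gaussian sample
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


labels = ["FP", "LP", "BB", "BB", "AE", "LP", "FP", "BB"]
print(phenotype_fractions(labels))
# {'FP': 0.25, 'AE': 0.125, 'LP': 0.25, 'BB': 0.375}

rng = np.random.default_rng(1)
mla1 = rng.normal(0.0, 1.0, 500)        # synthetic MLA1 coordinates
grid = np.linspace(-5.0, 5.0, 1001)
density = gaussian_kde_1d(mla1, grid)   # integrates to ~1 over the grid
```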
It is well established that 3D cancer invasion critically depends on the mechanical properties of the ECM [53, 54, 55]. To investigate the effect of ECM mechanics on cancer cell migration phenotype, we also study the invasion of MDA-MB-231 tumor spheroids in 3.0 mg/ml type I collagen matrices. Increasing the collagen concentration from 1.5 mg/ml to 3.0 mg/ml leads to a 20% increase in the ECM storage modulus and a 5-fold increase in the ECM loss modulus, thereby presenting stronger physical barriers to the dissemination of breast cancer cells.
Fig. 2B shows the processed images after 12 hours and 24 hours of invasion, together with the latent space projections of MDA-MB-231 cells in 3.0 mg/ml collagen matrices. As the collagen concentration in the ECM doubles, the number of disseminating cells decreases significantly, by ≈ 40% compared with Fig. 2A. The migration phenotype distribution remains largely unchanged, suggesting that phenotype homeostasis is robust even as the ECM mechanics vary significantly. We note that the phenotype distribution shifts slightly toward blebbing cells. The fractions of the cell types become: filopodia (FP, 23.7%), actin-enriched leading edge (AE, 18.3%), lamellipodia (LP, 25.7%), and blebbing (BB, 32.3%).
Representing cells in the morphological latent space also reveals the relation between cell morphology and cell motility. To demonstrate this, we automatically track cells to obtain combined information on cell morphology and velocity. After transforming to the morphological latent space, we construct heat maps showing the velocity and persistence of cell migration in the ECM (Fig. 3). In particular, each cell is tracked over three consecutive confocal frames at times $t - \Delta t$, $t$, and $t + \Delta t$ to calculate its velocities $\mathbf{v}(t) = [\mathbf{r}(t) - \mathbf{r}(t - \Delta t)]/\Delta t$ and $\mathbf{v}(t + \Delta t) = [\mathbf{r}(t + \Delta t) - \mathbf{r}(t)]/\Delta t$, where $\mathbf{r}$ is the cell centroid, in addition to the cell geometry at time $t$ (Fig. 3A). Here the time interval $\Delta t$ is 15 minutes. From these data we construct the latent space heat map of the radial component of cell velocity shown in Fig. 3B, since we are primarily interested in migration away from the spheroid.
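The radial velocity used for Fig. 3B can be obtained by projecting each step's velocity onto the unit vector pointing away from the spheroid center. The sketch below is illustrative only: measuring the radial direction at the midpoint of each step is an assumption, and the coordinates are made up.

```python
import numpy as np


def radial_velocity(r_prev, r_next, center, dt):
    """Radial component of velocity between two frames.

    r_prev, r_next: (N, 2) or (N, 3) centroid positions of the same N cells
    in consecutive frames; center: spheroid center; dt: frame interval.
    The radial direction is taken at the midpoint of each step (an assumption).
    """
    v = (r_next - r_prev) / dt
    radial = 0.5 * (r_prev + r_next) - center
    r_hat = radial / np.linalg.norm(radial, axis=1, keepdims=True)
    return np.sum(v * r_hat, axis=1)   # positive = moving away from the spheroid


center = np.array([0.0, 0.0])
r0 = np.array([[100.0, 0.0], [0.0, 200.0]])   # positions in um
r1 = np.array([[115.0, 0.0], [5.0, 200.0]])   # positions 15 min later
vr = radial_velocity(r0, r1, center, dt=15.0)  # um/min
```

Here the first cell moves straight outward at 1 µm/min, while the second moves almost tangentially, so its radial velocity is close to zero.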
Figure 3.

Cancer cell motility during invasion of 3D collagen ECM depends on cell migration phenotype and cell morphology. (A) Trajectories of a few tracked MDA-MB-231 cells disseminating from a tumor spheroid. Cell centroids in each frame are colored by radial velocity. (B) Heat map in the morphological latent space showing the average radial velocity of cells disseminating from a tumor spheroid into 3D collagen ECM. In (A-B) the radial direction is measured from the spheroid center. (C) Trajectories of the same cells as in (A), with cell centroids colored by persistence (see main text for definition). (D) Heat map in the morphological latent space showing the average persistence $\cos\theta$ of cells disseminating from a tumor spheroid into 3D collagen ECM. Here $\theta$ is the angle between cell velocities in two consecutive time intervals. In (A-D), the data are obtained from the same experiment as in Fig. 2A. The heat maps are generated from 431 data points followed by Gaussian smoothing.
The efficiency of tumor invasion depends not only on the velocity of cells but also on their directional persistence as they explore the extracellular matrix [56]. Hence, in addition to the velocity heat map, we also compute the distribution of persistence in the latent space. Here the persistence is defined as $\cos\theta$, where $\theta$ is the angle between the velocities at times $t$ and $t + \Delta t$. When $\cos\theta$ reaches its maximum value of 1, cells maintain their direction of migration. When $\cos\theta$ approaches its minimum value of −1, cells make 180-degree turns in consecutive steps. Fig. 3C shows sample cell trajectories color-coded by the persistence at each step. These single-step data are then translated into the latent space to generate the heat map in Fig. 3D.
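A minimal sketch of the persistence measure and of a smoothed latent-space heat map follows. Only "Gaussian smoothing" is stated for the heat maps, so the bin count, grid extent, smoothing width, and the choice to smooth the binned sums and counts separately before dividing are assumptions; the latent coordinates and values are synthetic.

```python
import numpy as np


def persistence(v1, v2):
    """cos(theta) between consecutive velocities (rows of v1 and v2)."""
    dot = np.sum(v1 * v2, axis=1)
    norms = np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    return np.clip(dot / norms, -1.0, 1.0)


def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with zero padding (NumPy only).

    The kernel (length 2*ceil(3*sigma)+1) must be shorter than the grid side
    for mode="same" to preserve the array shape.
    """
    half = int(np.ceil(3 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    kernel /= kernel.sum()
    img = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")


def latent_heatmap(z, values, bins=40, extent=3.0, sigma=1.5):
    """Mean of a per-cell quantity on a latent grid, with Gaussian smoothing.

    Sums and counts are smoothed separately and then divided, so sparsely
    sampled bins borrow information from their neighbors.
    """
    edges = np.linspace(-extent, extent, bins + 1)
    total = np.histogram2d(z[:, 0], z[:, 1], bins=[edges, edges], weights=values)[0]
    count = np.histogram2d(z[:, 0], z[:, 1], bins=[edges, edges])[0]
    total, count = gaussian_blur(total, sigma), gaussian_blur(count, sigma)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 1e-3, total / count, np.nan)


v1 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
v2 = np.array([[2.0, 0.0], [-1.0, 0.0], [2.0, 0.0]])
p = persistence(v1, v2)   # 1 (same direction), -1 (reversal), 0 (right angle)

rng = np.random.default_rng(5)
z = rng.normal(size=(431, 2))            # synthetic latent coordinates
heat = latent_heatmap(z, rng.uniform(-1.0, 1.0, 431))
```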
The results in Fig. 3 show that the invasion dynamics of breast cancer cells depend on their migration phenotype, which is encoded in cell morphology and represented in the latent space. In particular, filopodial cells have the highest radial velocity, and their invasiveness is further boosted by high persistence. The two amoeboid phenotypes, namely blebbing (BB) and actin-enriched leading edge (AE) cells, are generally less effective at invasion because of their low persistence.
Dynamics of cells in the morphological latent space
Our image processing pipeline can also be applied to continuous imaging datasets, which result in a low dimensional representation of the cell morphodynamics in the morphological latent space (MLS).
Fig. 4 shows the morphodynamics of 3D cultured MDA-MB-231 cells when their mechanotransduction pathways are perturbed. We first treat the cells with the Rho kinase activator CN03 (Fig. 4A). The streamlines in Fig. 4B represent the mean velocity field of cells in the MLS, and their color represents the time elapsed since the onset of drug treatment. Upregulating Rho signaling increases cell contractility. As a result, after 6 hours of drug treatment, cells converge to the boundary of the BB, AE, and LA states, where they assume a more rounded shape.
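Streamlines such as those in Fig. 4B and 4D require a velocity field defined on a grid. One way to obtain it, assumed here since the procedure is not described, is to bin the per-step latent displacements of all tracked cells and average within each bin. The trajectories below are synthetic, and the bin count, extent, and `min_count` threshold are illustrative choices.

```python
import numpy as np


def latent_velocity_field(trajs, dt, bins=20, extent=3.0, min_count=3):
    """Mean latent-space velocity on a regular grid.

    trajs: list of (T_i, 2) arrays of latent coordinates sampled every dt.
    Returns bin centers and mean velocity components u, v (NaN where a bin
    holds fewer than min_count steps). Arrays are indexed [x_bin, y_bin].
    """
    pos = np.concatenate([0.5 * (t[1:] + t[:-1]) for t in trajs])  # step midpoints
    vel = np.concatenate([np.diff(t, axis=0) / dt for t in trajs])   # per-step velocity
    edges = np.linspace(-extent, extent, bins + 1)
    centers = 0.5 * (edges[1:] + edges[:-1])

    def binned(weights=None):
        return np.histogram2d(pos[:, 0], pos[:, 1], bins=[edges, edges],
                              weights=weights)[0]

    n = binned()
    with np.errstate(invalid="ignore", divide="ignore"):
        u = np.where(n >= min_count, binned(vel[:, 0]) / n, np.nan)
        v = np.where(n >= min_count, binned(vel[:, 1]) / n, np.nan)
    return centers, u, v


# Synthetic example: cells drifting along MLA1 at 0.1 latent units per frame
ys = np.linspace(-2.0, 2.0, 9)
trajs = [np.column_stack([np.linspace(-2.0, 2.0, 41), np.full(41, y)]) for y in ys]
centers, u, v = latent_velocity_field(trajs, dt=1.0, min_count=1)
```

For plotting, `matplotlib.pyplot.streamplot(centers, centers, u.T, v.T)` expects arrays indexed by row = y and column = x, hence the transposes; empty (NaN) bins may need to be masked or filled first.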
Figure 4.

Latent space dynamics of breast cancer cells under pharmacological perturbations. (A) Snapshots of 3D cultured MDA-MB-231 cells before (blue channel) and after (red channel) 6 hours of treatment with CN03, a Rho-kinase activator. (B) Streamlines showing the CN03-induced velocity field of cells in the morphological latent space. (C) Snapshots of 3D cultured MDA-MB-231 cells before (blue channel) and after (red channel) 24 hours of treatment with Y27632, a Rho-kinase inhibitor. (D) Streamlines showing the Y27632-induced velocity field of cells in the morphological latent space. In (A,C), scale bars are 100 μm. In (B,D), the streamlines are color-coded by the time elapsed since the beginning of drug treatment.
Conversely, treating the cells with the Rho-kinase inhibitor Y27632 (Fig. 4C) facilitates the formation of elongated actin bundles. Indeed, the velocity field shown in Fig. 4D suggests that many cells initially in the blebbing state move toward the filopodial state. The opposite flow directions in Fig. 4B and 4D are therefore consistent with the expected downstream effects of Rho signaling.
Since Rho signaling is a major pathway that mediates cancer cell mechanosensing of the ECM, the MLS may be used to infer cellular dynamics in altered microenvironments.
Dynamics of cells in MLS is consistent with cell morphodynamics in MFS
To better understand the latent dynamics exhibited by cells, we quantify the mapping from the 13-dimensional morphological feature space (MFS), where each dimension corresponds to a particular geometric measure, to the morphological latent space (MLS). We first examine the continuity and conformality of the MFS → MLS projection. When a cell makes two small consecutive steps A→B→C in the morphological feature space (Fig. 5A), the steps are mapped into the latent space as A′→B′→C′ (Fig. 5B). We compare the Euclidean distance between points A and B with that between A′ and B′. As shown in Fig. 5C, results from randomly sampled consecutive steps show that the MFS → MLS mapping is continuous: small variations in cell geometry correspond to small displacements in the latent space projection, regardless of the cell migration phenotype (colors in Fig. 5C).
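A minimal sketch of how the distance map (|A′B′| versus |AB|) might be assembled from a trajectory. The trained encoder is replaced here by a hypothetical smooth nonlinear map, `encode`, built from a random matrix and tanh; because tanh is 1-Lipschitz, this stand-in is continuous by construction, which the final bound illustrates. The feature-space trajectory is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(13, 2)) / np.sqrt(13)


def encode(x):
    # Hypothetical stand-in for the trained encoder: smooth, nonlinear 13D -> 2D
    return np.tanh(x @ W)


def step_sizes(traj):
    """Euclidean length of each consecutive step in a (T, D) trajectory."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)


# Synthetic feature-space random walk with small steps around 0.5
x_traj = 0.5 + np.cumsum(rng.normal(scale=0.01, size=(50, 13)), axis=0)
d_mfs = step_sizes(x_traj)           # |AB| in feature space
d_mls = step_sizes(encode(x_traj))   # |A'B'| in latent space

# Continuity check: latent steps are bounded by (spectral norm of W) x feature steps
lipschitz = np.linalg.norm(W, 2)
print(np.all(d_mls <= lipschitz * d_mfs + 1e-12))   # True
```

Plotting `d_mls` against `d_mfs` gives a distance map in the style of Fig. 5C; for a continuous map, points cluster near the origin whenever the feature-space steps are small.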
Figure 5.

The geometric properties of the mapping from the morphological feature space (MFS) to the morphological latent space (MLS). (A-B) Schematics showing how the distance map and angular map are computed. In the MFS, two consecutive steps (A→B→C) are mapped to A′→B′→C′ in the MLS. The distance map is obtained from the Euclidean step size $|\overrightarrow{AB}|$ and its corresponding Euclidean step size $|\overrightarrow{A'B'}|$. To obtain the angular map, we compute the angle $\theta = \arccos\left(\frac{\overrightarrow{AB}\cdot\overrightarrow{BC}}{|\overrightarrow{AB}|\,|\overrightarrow{BC}|}\right)$, where the vector inner product assumes the identity metric. Similarly, we compute the angle $\theta' = \arccos\left(\frac{\overrightarrow{A'B'}\cdot\overrightarrow{B'C'}}{|\overrightarrow{A'B'}|\,|\overrightarrow{B'C'}|}\right)$. (C-D) The distance and angular maps obtained from 320 randomly sampled cell trajectories, to which each cell migration phenotype contributes 25% of the data. The color of the plotted symbols represents the cell migration phenotype.
We also examine the mapping of angles from the MFS to the MLS. To this end, we compute the angle $\theta$ from vector $\overrightarrow{AB}$ to $\overrightarrow{BC}$, and the corresponding angle $\theta'$ from vector $\overrightarrow{A'B'}$ to $\overrightarrow{B'C'}$. Fig. 5D shows that the MFS → MLS mapping is not conformal, except when $\overrightarrow{AB}$ and $\overrightarrow{BC}$ are approximately parallel or antiparallel ($\theta \approx 0$ or $\theta \approx \pi$; see also SI section S6). Therefore, if a cell moves persistently or makes 180-degree turns in the MFS, the corresponding latent trajectory shares the same characteristics.
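The angles $\theta$ and $\theta'$ defined above can be computed as follows. As a toy illustration only, the projection here is a hypothetical linear shear rather than the trained (nonlinear) encoder; it was chosen because it visibly distorts a right angle while a 180-degree reversal remains a reversal.

```python
import numpy as np


def turning_angles(traj):
    """Angle (radians) between consecutive steps AB and BC of a (T, D) trajectory.

    Uses the ordinary dot product, i.e. the identity metric.
    """
    steps = np.diff(traj, axis=0)
    a, b = steps[:-1], steps[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards against rounding


# A -> B -> C with a right-angle turn in a 3D "feature space"
abc = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
# Hypothetical linear projection to 2D: (x, y, z) -> (x + y, y), a shear
shear = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
theta = turning_angles(abc)                # [pi/2]
theta_prime = turning_angles(abc @ shear)  # [pi/4]: the angle is not preserved

# A 180-degree reversal stays a reversal under the same map
aba = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
print(turning_angles(aba @ shear))          # [pi]
```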
After characterizing the projection of small steps from the MFS to the MLS, we ask whether dynamic features of cells are preserved. To this end, we manually track 16 MDA-MB-231 cells disseminating from a tumor spheroid and analyze their morphodynamics. Fig. 6A shows the time evolution of cell shapes as quantified by the mean square displacement in the morphological feature space, $\mathrm{MSD}_{\mathrm{MFS}}(\tau) = \langle |\mathbf{x}(t+\tau) - \mathbf{x}(t)|^2 \rangle_t$, where $\mathbf{x}$ is the 13-dimensional MFV and $\tau$ is the time lag. The $\mathrm{MSD}_{\mathrm{MFS}}$ curves are approximately power-law functions of $\tau$ (a log-log regression gives a slope of ~0.2 with p < 0.01 for both the MFS and the MLS), and the best-fit diffusive powers are shown in the inset of Fig. 6A. The cell morphodynamics in the MFS are evidently subdiffusive. When mapped to the morphological latent space, the dynamics show similar subdiffusive behavior (Fig. 6B). The fitted powers are very close: the average diffusive power is 0.20 for the morphodynamics in the MFS and 0.23 for the latent dynamics in the MLS.
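The diffusive power can be estimated as the slope of a log-log fit of the mean square displacement against time lag. The sketch below assumes a time-averaged MSD along each trajectory and an unweighted least-squares fit; the exact averaging and fitting choices are not stated. It is checked on an ordinary random walk, for which the power should be close to 1 (the cells, by contrast, give about 0.2).

```python
import numpy as np


def msd(traj, max_lag):
    """Time-averaged mean square displacement of a (T, D) trajectory."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                       for lag in lags])
    return lags, values


def diffusive_power(lags, msd_values, dt=1.0):
    """Slope alpha of a log-log fit, assuming MSD ~ tau**alpha."""
    alpha, _intercept = np.polyfit(np.log(lags * dt), np.log(msd_values), 1)
    return float(alpha)


# Sanity check on an ordinary 13D random walk, for which alpha should be close to 1
rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(size=(2000, 13)), axis=0)
lags, m = msd(walk, max_lag=100)
alpha = diffusive_power(lags, m, dt=15.0)   # dt in minutes, as in the imaging
```

Applying the same two functions to a trajectory and to its latent projection gives the pair of powers compared in Fig. 6A and 6B.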
Figure 6.

Comparing the morphodynamic properties of cells in the morphological feature space (MFS) and the morphological latent space (MLS). (A) The mean square displacement in the morphological feature space, $\mathrm{MSD}_{\mathrm{MFS}}$, as a function of time lag $\tau$ for MDA-MB-231 cells disseminating from tumor spheroids into 3D collagen matrices. The results show that $\mathrm{MSD}_{\mathrm{MFS}}$ can be approximated as a power law, $\mathrm{MSD}_{\mathrm{MFS}} \propto \tau^{\alpha}$. Inset: the fitted powers $\alpha$ of each trajectory. The average power is 0.20, indicating overall subdiffusive morphodynamics in the MFS. (B) The mean square displacement in the morphological latent space ($\mathrm{MSD}_{\mathrm{MLS}}$) as a function of time lag $\tau$, obtained by mapping the morphodynamics in (A) to the latent space. $\mathrm{MSD}_{\mathrm{MLS}}$ is approximately a power-law function of $\tau$, $\mathrm{MSD}_{\mathrm{MLS}} \propto \tau^{\alpha'}$. Inset: the fitted powers $\alpha'$. The average power is 0.23. In (A-B) the axes are in log-log scale. (C) Natural log of the relative Euclidean distance of cell pairs in the morphological feature space, $\ln[d(\tau)/d(0)]$, as a function of time lag. The results show that $\ln[d(\tau)/d(0)]$ is approximately stationary. Inset: Lyapunov exponents $\lambda$ obtained by fitting to an exponential function, $d(\tau) = d(0)e^{\lambda\tau}$. The mean value of $\lambda$ is close to zero. (D) Natural log of the relative Euclidean distance of cell pairs in the morphological latent space, $\ln[d'(\tau)/d'(0)]$, as a function of time lag. As in (C), $\ln[d'(\tau)/d'(0)]$ is approximately stationary. Inset: fitted Lyapunov exponents $\lambda'$. The mean value of $\lambda'$ is −0.01. In (C-D) three sample curves are plotted in gray, and the green shaded regions indicate the density of 90 to 100% of the curves, from darkest to lightest green, respectively. In (A-D) a total of 16 manually tracked cell trajectories from the same experiment shown in Fig. 2 are analyzed.
In addition to quantifying the shape fluctuations of single cells, we also examine how the shape deviation between cells evolves over time. For random walks such as the morphodynamics exhibited by the cells, the separation between trajectories reflects the underlying physical processes that drive the walk. The separation $d(\tau)$ between two trajectories often follows an exponential function, $\langle \ln[d(\tau)/d(0)] \rangle \approx \lambda\tau$, i.e. $d(\tau) \approx d(0)e^{\lambda\tau}$, where $\langle\cdot\rangle$ represents the ensemble average and $\lambda$ is the Lyapunov exponent. The Lyapunov exponent quantifies the tendency of trajectories to diverge, converge, or remain equidistant from each other, corresponding to $\lambda > 0$, $\lambda < 0$, and $\lambda = 0$, respectively.
When the Lyapunov exponent is zero, the dynamics are considered neutrally stable. This can result from trajectories near fixed points or limit cycles, where the dynamic flow field keeps trajectories evenly spaced. As shown in Fig. 6(C,D), the Lyapunov exponents in the MFS and in the MLS are both close to zero, indicating neither converging nor diverging trends. Therefore, cell morphodynamics show consistent neutral stability in both the morphological feature space and the latent space (see Supplementary Material S5 for projected trajectories in the MLS).
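A minimal sketch of estimating the Lyapunov exponent from pairs of trajectories. Fitting a line through the origin of $\ln[d(\tau)/d(0)]$ versus $\tau$ and then averaging over all pairs is one simple choice, assumed here since the fitting procedure is not stated; the trajectories are synthetic and must have equal length and a nonzero initial separation.

```python
from itertools import combinations

import numpy as np


def lyapunov_exponent(traj_a, traj_b, dt=1.0):
    """Fit ln[d(tau)/d(0)] = lambda * tau for one pair of (T, D) trajectories.

    Least-squares slope through the origin, since ln[d(0)/d(0)] = 0.
    """
    d = np.linalg.norm(traj_a - traj_b, axis=1)
    tau = np.arange(len(d)) * dt
    log_ratio = np.log(d / d[0])
    return float(np.sum(tau * log_ratio) / np.sum(tau**2))


def mean_lyapunov(trajs, dt=1.0):
    """Ensemble average over all pairs of equal-length trajectories."""
    return float(np.mean([lyapunov_exponent(a, b, dt) for a, b in combinations(trajs, 2)]))


# Check: a pair separating as exp(0.05 * tau) should give lambda = 0.05
rng = np.random.default_rng(4)
base = np.cumsum(rng.normal(size=(50, 13)), axis=0)
offset = np.zeros(13)
offset[0] = 1.0
t = np.arange(50)
diverging = base + np.exp(0.05 * t)[:, None] * offset
parallel = base + offset            # constant separation -> lambda = 0
```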
Applying the Model to Other Cell Lines
We have applied our approach to other cell types by evaluating the latent representations of 9 distinct cell lines. Results for four of these cell lines are shown in Figure 7, and results for additional cell types are included in SI section S9. These cell images are obtained from publicly available datasets. Cells are classified by their geometric features and projected into the morphological latent space using the same network trained on MDA-MB-231 cell images.
Figure 7.

Morphological Latent Space representation of cells from varying organisms under different physiological conditions. (A-D) correspond to A172, BV-2, SH-SY5Y, and Killifish embryo cells. Sample segmentation of the original images as well as MLS projections are shown side by side. The cell segmentation data for the A172, BV-2, and SH-SY5Y are obtained from the LIVECell dataset[61]. The Fundulus heteroclitus (Killifish) Embryo cell images are obtained from the Cell Image Library 35208 and the frames are segmented with Cellpose[60, 62].
We find that different cell populations show distinct latent space distributions. The A172 cells are derived from the brain tissue of a glioblastoma patient [57]. As shown in Fig. 7A, A172 cells are predominantly in the lamellipodial state when cultured on a flat surface. The BV-2 cells are murine microglial cells [58]. As shown in Fig. 7B, BV-2 cells are generally rounded and have morphologies characteristic of cells in the BB and AE states.
The SH-SY5Y cells are derived from the bone marrow of a neuroblastoma patient [59]. SH-SY5Y cells are highly metastatic, and they occupy a broad area spanning all four phenotypic regions of the latent space (Fig. 7C). Finally, killifish embryo cells are known to migrate through blebbing and lamellipodial activity [60]. Consistently, these cells predominantly occupy the BB and LA regions of the latent space (Fig. 7D). These results demonstrate that our model, which is trained with images of MDA-MB-231 cells, can be applied to visualize and quantify the morphological features of cells from different organisms and under various physiological conditions. To enable broader application of our algorithm, such as in biomedical imaging of brains and tissues, we have incorporated a transfer learning feature into our source code. This functionality allows users to retrain the final layers of the encoder, decoder, and discriminator subnetworks while keeping the remaining layers frozen. Detailed instructions for using the transfer learning feature are provided in our GitHub repository (see the Data Availability section).
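The repository's transfer-learning code is not reproduced here. As a hedged illustration of the underlying idea (retrain the final layer, keep earlier layers frozen), the sketch below swaps in a closed-form least-squares refit of the last linear layer of a toy two-layer decoder in place of gradient-based retraining; all weights, layer sizes, and data are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


# Hypothetical "pretrained" decoder: latent (2) -> hidden (32) -> features (13)
W1, b1 = rng.normal(size=(2, 32)), rng.normal(size=32)    # frozen layer
W2, b2 = rng.normal(size=(32, 13)), np.zeros(13)           # layer to retrain
W1_before = W1.copy()


def hidden(z):
    # Frozen part of the network: never updated below
    return leaky_relu(z @ W1 + b1)


# Synthetic data from a "new cell type": latent codes and target features
z_new = rng.normal(size=(200, 2))
x_new = rng.uniform(0.0, 1.0, size=(200, 13))

H = np.column_stack([hidden(z_new), np.ones(len(z_new))])  # bias as extra column
mae_before = np.mean(np.abs(H @ np.vstack([W2, b2]) - x_new))

# Refit only the final layer (weights + bias) by linear least squares
coef, *_ = np.linalg.lstsq(H, x_new, rcond=None)
W2, b2 = coef[:-1], coef[-1]
mae_after = np.mean(np.abs(H @ coef - x_new))
```

Freezing early layers keeps the generic shape-encoding features learned from the large MDA-MB-231 dataset, while the small retrained head adapts to a new cell type with far fewer examples.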
Conclusion and Discussion
Reducing the dimensionality of high-content biomedical imaging data is essential for creating intuitive representations and for enhancing the interpretation and communication of results [31, 25, 63, 64]. Here, we develop a deep neural network that can be trained to optimize the nonlinear projection of 13-dimensional morphological features into a 2-dimensional morphological latent space. We train our neural network with a dataset consisting of 32,000 images of breast cancer cells disseminating from tumor spheroids into 3D collagen ECM. This in vitro tumor model is widely used for cancer research and drug screening [65, 66, 67, 68].
We construct the morphological latent space by combining an encoder-decoder with an adversary network. Including the adversary network preserves the topology of the morphological feature vectors: cells that belong to the same morphological phenotype occupy a continuous patch in the latent space (Fig. 1). The adversary network effectively distills the style and content of the source data, organizing the latent space, and has been successfully deployed in various applications [28, 69, 70]. Here we find that a simple Gaussian prior in the adversary network is sufficient to achieve the desired large-scale latent space structure.
The adversarial autoencoder provides small reconstruction errors and high classification accuracy for a 2D projection of morphological data. Adversarial autoencoders have been employed in a wide variety of biomedical applications, such as screening compounds for anti-cancer properties [71], unsupervised anomaly detection in medical images [72, 73], differentiating breast cancer subtypes from biologically relevant genes [74], and more [75, 76]. We also investigate other dimensionality reduction techniques, including principal component analysis (PCA), kernel PCA, uniform manifold approximation and projection (UMAP), autoencoders, and variational autoencoders (VAEs). Apart from VAEs, these methods strictly underperform AAEs on our morphology data in reconstruction and classification tasks. The difference in performance between VAEs and AAEs on these tasks is minimal, possibly due to their similar architectures. We choose the adversarial autoencoder (AAE) because it places fewer restrictions on the prior than the VAE and is well suited to learning the underlying structure of the morphology data rather than modeling its distribution (see Supplementary Section S7 for additional discussion of alternative methods).
With only two dimensions spanning the latent space, cell morphology and morphodynamics can be clearly visualized. We demonstrate that the morphological distribution of breast cancer cells disseminating from tumor spheroids remains mostly unchanged even when the ECM mechanics change (Fig. 2). We show that cancer cell motility, including speed and persistence, is coupled with cell morphological phenotype (Fig. 3). Under drug treatment, cell responses appear as continuous flows in the latent space (Fig. 4). These examples underscore the value of our dimensional reduction algorithm in providing intuitive, dynamic visualizations of cellular activities.
Our algorithm substantially reduces the dimensionality of morphological data with minimal loss of information. We show that the phenotype classification accuracy is as high as 82% with a reconstruction error of 0.024 (see Supplementary Section S2.4 for more detail), suggesting a large degree of redundancy in the morphological feature vectors. Indeed, some measures are geometrically related: Solidity and Extent are the ratios of Area to Convex Area and of Area to the area of the bounding box, respectively. Others may follow constraints set by the biophysical properties of the cell, such as approximately volume-preserving shape fluctuations [77]. Instead of manually identifying these interdependences, we use an encoder-decoder deep neural network so that the model learns the optimal projection from feature vectors to latent variables.
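The geometric redundancy mentioned above can be made explicit: once Area, Convex Area, and the bounding-box area are known, Solidity and Extent follow directly. The sketch below computes both from a polygonal cell contour using the shoelace formula and Andrew's monotone-chain convex hull. The L-shaped contour is a toy example, not cell data, and the polygon-based definitions are an assumption; libraries such as scikit-image's `regionprops` report pixel-based `solidity` and `extent`, which will differ slightly.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a simple polygon with (N, 2) vertices in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    P = sorted(map(tuple, pts))
    if len(P) <= 2:
        return np.array(P)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in P:                                   # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(P):                         # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def solidity_and_extent(contour):
    """Solidity = Area / Convex Area; Extent = Area / bounding-box area."""
    area = polygon_area(contour)
    hull_area = polygon_area(convex_hull(contour))
    width = contour[:, 0].max() - contour[:, 0].min()
    height = contour[:, 1].max() - contour[:, 1].min()
    return area / hull_area, area / (width * height)

# Toy L-shaped contour: a 2x2 square with a 1x1 corner removed (area 3)
L_shape = np.array([[0, 0], [2, 0], [2, 1], [1, 1], [1, 2], [0, 2]], dtype=float)
solidity, extent = solidity_and_extent(L_shape)
print(solidity, extent)   # solidity = 3/3.5 ≈ 0.857, extent = 3/4 = 0.75
```

Since both ratios share Area as their numerator, a change in cell area alone shifts Solidity and Extent together, which is one reason a low-dimensional latent space can absorb much of the 13D feature vector.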
We find that the mapping from morphological features to latent variables is continuous but not angle-preserving. Surprisingly, morphodynamic characteristics, including the diffusive power and the Lyapunov exponent, are nevertheless preserved in the latent space. To better understand this property, we conducted numerical experiments comparing our approach with other dimensionality reduction methods for mapping random walks in feature space (Supplementary Material S7). We find that the adversarial autoencoder neatly encodes the morphological feature space into four distinct phenotype regions, yielding the most interpretable latent space. Further, PCA's explained variance indicates that PCA would require more dimensions than the autoencoder's two to capture the data to the same degree, which makes visualization harder. These results highlight the effectiveness of our approach in quantitatively reproducing the morphodynamic fingerprints of cells in lower dimensions.
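One way to compare morphodynamics across the 13D feature space and the 2D latent space is to apply the same estimator to both trajectories of a cell. The sketch below assumes that the diffusive power is the exponent α in MSD(τ) ∝ τ^α, fitted on log-log axes over a hypothetical lag window `max_lag=20`; the paper's exact estimator is described in the Supplementary Material. The Lyapunov exponent requires a separate estimator (for example, Rosenstein's nearest-neighbor divergence method) and is not shown.

```python
import numpy as np

def mean_squared_displacement(traj, max_lag):
    """MSD(tau) of a trajectory with shape (T, d), for tau = 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((traj[tau:] - traj[:-tau]) ** 2, axis=1))
                    for tau in lags])
    return lags, msd

def diffusive_exponent(traj, max_lag=20):
    """Fit MSD(tau) ~ tau**alpha on log-log axes and return alpha.

    alpha ~ 1: diffusive, alpha < 1: subdiffusive, alpha > 1: superdiffusive.
    """
    lags, msd = mean_squared_displacement(traj, max_lag)
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

rng = np.random.default_rng(0)
walk_13d = np.cumsum(rng.standard_normal((5000, 13)), axis=0)  # random walk in a 13D feature space
print(diffusive_exponent(walk_13d))                              # close to 1 for a random walk
```

Passing a cell's 13D feature trajectory and its encoded 2D latent trajectory to `diffusive_exponent` gives two values of α that can be compared directly; agreement between them is the kind of preservation described above.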
Latent variables can be obtained with a variety of network architectures. It will be interesting for future studies to investigate different strategies, such as image-to-image autoencoders [78, 64], segmentation networks [79], and classifiers [80, 81]. Another direction for future work is to extend the approach to multiple cell types for an improved multi-cell-type latent organization, similar to how Morpho-VAE encodes different mandibles with an additional classification input [82]. Once an effective latent representation is achieved, it will also be interesting to construct data-driven physical models that predict the phenotype evolution of cells during metastasis and therapeutic interventions.
Supplementary Material
Acknowledgments
This work is supported by National Science Foundation PHY-1844627. BS is supported by National Institute of General Medical Sciences grant R35GM138179.
Data availability
Availability of data and materials
Imaging data can be found at https://doi.org/10.6084/m9.figshare.c.7539075
Code availability
The software for cell segmentation and classification can be found at https://github.com/TerminalCursor/LatentMorphodynamics.git
Bibliography
- [1] Hooke R 1665. Micrographia (The Royal Society)
- [2] Mazzarello P 1999. Nat. cell biol E13–E15
- [3] Mayr E 1982. The Growth of Biological Thought (Belknap, Cambridge, MA)
- [4] Way GP, Kost-Alimova M, Shibue T, Harrington WF, Gill S, Piccioni F, Becker T, Shafqat-Abbasi H, Hahn WC, Carpenter AE et al. 2021. Molecular biology of the cell 32 995–1005
- [5] Cimini BA, Chandrasekaran SN, Kost-Alimova M, Miller L, Goodale A, Fritchman B, Byrne P, Garg S, Jamali N, Logan DJ et al. 2023. Nature protocols 18 1981–2013
- [6] Bray MA, Singh S, Han H, Davis CT, Borgeson B, Hartland C, Kost-Alimova M, Gustafsdottir SM, Gibson CC and Carpenter AE 2016. Nature protocols 11 1757–1774
- [7] Doughty MJ 1989. Optometry and vision science 66 626–642
- [8] Masseroli M, Bollea A and Forloni G 1993. Computer methods and programs in biomedicine 41 89–99
- [9] Bray MA, Gustafsdottir SM, Rohban MH, Singh S, Ljosa V, Sokolnicki KL, Bittker JA, Bodycombe NE, Dančík V, Hasaka TP et al. 2017. Gigascience 6 giw014
- [10] Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, Guertin DA, Chang JH, Lindquist RA, Moffat J et al. 2006. Genome biology 7 1–11
- [11] Alizadeh E, Castle J, Quirk A, Taylor CD, Xu W and Prasad A 2020. Computers in biology and medicine 126 104044
- [12] Chen S, Zhao M, Wu G, Yao C and Zhang J 2012. Computational and mathematical methods in medicine 2012 101536
- [13] George G and Raj VC 2011. arXiv preprint arXiv:1109.1062
- [14] Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA and Wang Y 2008. Nature reviews cancer 8 37–49
- [15] Schölkopf B, Smola A and Müller KR 1997. Kernel principal component analysis. International conference on artificial neural networks (Springer) pp 583–588
- [16] Van Der Maaten L, Postma E, Van den Herik J et al. 2009. J Mach Learn Res 10
- [17] Jia W, Sun M, Lian J and Hou S 2022. Complex & Intelligent Systems 8 2663–2693
- [18] Ray P, Reddy SS and Banerjee T 2021. Artificial Intelligence Review 54 3473–3515
- [19] Ayesha S, Hanif MK and Talib R 2020. Information Fusion 59 44–58
- [20] Grys BT, Lo DS, Sahin N, Kraus OZ, Morris Q, Boone C and Andrews BJ 2017. Journal of Cell Biology 216 65–71
- [21] Pearson K 1901. The London, Edinburgh, and Dublin philosophical magazine and journal of science 2 559–572
- [22] Hasan BMS and Abdulazeez AM 2021. Journal of Soft Computing and Data Mining 2 20–30
- [23] Hinton GE and Roweis S 2002. Advances in neural information processing systems 15
- [24] Van der Maaten L and Hinton G 2008. Journal of machine learning research 9
- [25] Berman GJ, Choi DM, Bialek W and Shaevitz JW 2014. Journal of The Royal Society Interface 11 20140672
- [26] Bank D, Koenigstein N and Giryes R 2023. Machine learning for data science handbook: data mining and knowledge discovery handbook 353–374
- [27] Mathieu E, Rainforth T, Siddharth N and Teh YW 2019. Disentangling disentanglement in variational autoencoders. International conference on machine learning (PMLR) pp 4402–4412
- [28] Makhzani A, Shlens J, Jaitly N, Goodfellow I and Frey B 2015. arXiv preprint arXiv:1511.05644
- [29] Thomas M, Jensen FH, Averly B, Demartsev V, Manser MB, Sainburg T, Roch MA and Strandburg-Peshkin A 2022. Journal of Animal Ecology 91 1567–1581
- [30] Erlich I, Ben-Meir A, Har-Vardi I, Grifo JA and Zaritsky A 2021. MedRXiv 2021–10
- [31] Rotem O, Schwartz T, Maor R, Tauber Y, Shapiro MT, Meseguer M, Gilboa D, Seidman DS and Zaritsky A 2024. Nature communications 15 7390
- [32] Wang Z and Wang Y 2019. BMC bioinformatics 20 1–7
- [33] Becht E, McInnes L, Healy J, Dutertre CA, Kwok IW, Ng LG, Ginhoux F and Newell EW 2019. Nature biotechnology 37 38–44
- [34] Caicedo JC, Cooper S, Heigwer F, Warchal S, Qiu P, Molnar C, Vasilevich AS, Barry JD, Bansal HS, Kraus O et al. 2017. Nature methods 14 849–863
- [35] Becher B, Schlitzer A, Chen J, Mair F, Sumatoh HR, Teng KWW, Low D, Ruedl C, Riccardi-Castagnoli P, Poidinger M et al. 2014. Nature immunology 15 1181–1189
- [36] Phillip JM, Han KS, Chen WC, Wirtz D and Wu PH 2021. Nature protocols 16 754–774
- [37] Yao K, Rochman ND and Sun SX 2019. Scientific reports 9 13467
- [38] Chen D, Sarkar S, Candia J, Florczyk SJ, Bodhak S, Driscoll MK, Simon Jr C G, Dunkers JP and Losert W 2016. Biomaterials 104 104–118
- [39] Petrie RJ and Yamada KM 2012. J. Cell Science 125 5917–5926
- [40] Nalbant P, Hodgson L, Kraynov V, Toutchkine A and Hahn KM 2004. Science 305 1615–1619
- [41] Trinkaus JP 1973. Developmental biology 30 68–103
- [42] Abercrombie M, Heaysman JE and Pegrum SM 1970. Experimental cell research 59 393–398
- [43] Petrie RJ, Gavara N, Chadwick RS and Yamada KM 2012. Journal of Cell Biology 197 439–455
- [44] Wyckoff JB, Pinner SE, Gschmeissner S, Condeelis JS and Sahai E 2006. Current Biology 16 1515–1523
- [45] Charras G and Paluch E 2008. Nature reviews Molecular cell biology 9 730–736
- [46] Lorentzen A, Bamber J, Sadok A, Elson-Schwab I and Marshall CJ 2011. Journal of cell science 124 1256–1267
- [47] Yamazaki D, Kurisu S and Takenawa T 2005. Cancer science 96 379–386
- [48] Eddy CZ, Raposo H, Manchanda A, Wong R, Li F and Sun B 2021. Scientific Reports 11
- [49] Heath JR, Ribas A and Mischel PS 2016. Nature reviews Drug discovery 15 204–216
- [50] Tang Q, Ratnayake R, Seabra G, Jiang Z, Fang R, Cui L, Ding Y, Kahveci T, Bian J, Li C et al. 2024. Briefings in Bioinformatics 25
- [51] Petersen I 2011. Deutsches Ärzteblatt International 108 525 (doi: 10.3238/arztebl.2011.0525)
- [52] Eddy CZ, Naylor A, Cunningham CT and Sun B 2023. Physical Biology 20
- [53] Kim J, Feng J, Jones CAR, Mao X, Sander LM, Levine H and Sun B 2017. Nature Communications 8 842
- [54] Wolf K, te Lindert M, Krause M, Alexander S, te Riet J, Willis AL, Hoffman RM, Figdor CG, Weiss SJ and Friedl P 2013. J. Cell Biol 201 1069–1084
- [55] Winkler J, Abisoye-Ogunniyan A, Metcalf KJ and Werb Z 2020. Nat Commun 11
- [56] Petrie RJ, Doyle AD and Yamada KM 2009. Nature reviews Molecular cell biology 10 538–549
- [57] ATCC A172 https://www.atcc.org/products/crl-1620 [Online; accessed 13-February-2025]
- [58] Cytion BV-2 https://www.cytion.com/us/BV2-Cells/305156 [Online; accessed 13-February-2025]
- [59] ATCC SH-SY5Y https://www.atcc.org/products/crl-2266 [Online; accessed 13-February-2025]
- [60] Rachel Fink PW 2011. CIL:35208, Fundulus heteroclitus, deep cell. Cell Image Library (Dataset)
- [61] Edlund C, Jackson TR, Khalid N, Bevan N, Dale T, Dengel A, Ahmed S, Trygg J and Sjögren R 2021. Nature methods 18 1038–1045
- [62] Stringer C, Wang T, Michaelos M and Pachitariu M 2021. Nature Methods 18 100–106
- [63] Rotem O and Zaritsky A 2024. Nature Methods 21 1394–1397
- [64] Zaritsky A, Jamieson AR, Welf ES, Nevarez A, Cillay J, Eskiocak U, Cantarel BL and Danuser G 2021. Cell systems 12 733–747
- [65] Gunti S, Hoke AT, Vu KP and London Jr N R 2021. Cancers 13 874
- [66] Sant S and Johnston PA 2017. Drug Discovery Today: Technologies 23 27–36
- [67] Costa EC, Moreira AF, de Melo-Diogo D, Gaspar VM, Carvalho MP and Correia IJ 2016. Biotechnology advances 34 1427–1441
- [68] Naylor A, Zheng Y, Jiao Y and Sun B 2023. Soft Matter 19 9–16
- [69] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y 2014. Advances in neural information processing systems 27
- [70] Mao J, Wang H and Spencer BF Jr 2021. Structural Health Monitoring 20 1609–1626
- [71] Kadurin A, Aliper A, Kazennov A, Mamoshina P, Vanhaelen Q, Khrabrov K and Zhavoronkov A 2016. Oncotarget 8 10883
- [72] Zhang H, Guo W, Zhang S, Lu H and Zhao X 2022. Journal of Digital Imaging 35 153–161
- [73] Esmaeili M, Toosi A, Roshanpoor A, Changizi V, Ghazisaeedi M, Rahmim A and Sabokrou M 2023. IEEE Access 11 17906–17921
- [74] Mondol RK, Truong ND, Reza M, Ippolito S, Ebrahimie E and Kavehei O 2021. IEEE/ACM Transactions on Computational Biology and Bioinformatics 19 2060–2070
- [75] Creswell A, Pouplin A and Bharath AA 2018. IET Computer Vision 12 1105–1111
- [76] Wang HY, Zhao JP, Zheng CH and Su YS 2023. Briefings in Bioinformatics 24 bbac585
- [77] Beck LE, Lee J, Coté C, Dunagin MC, Lukonin I, Salla N, Chang MK, Hughes AJ, Mornin JD, Gartner ZJ et al. 2022. Cell systems 13 547–560
- [78] Chen M, Shi X, Zhang Y, Wu D and Guizani M 2017. IEEE Transactions on Big Data 7 750–758
- [79] Kohl S, Romera-Paredes B, Meyer C, De Fauw J, Ledsam JR, Maier-Hein K, Eslami S, Jimenez Rezende D and Ronneberger O 2018. Advances in neural information processing systems 31
- [80] Wang Y and Mori G 2010. A discriminative latent model of object classes and attributes. Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part V 11 (Springer) pp 155–168
- [81] Felzenszwalb P, McAllester D and Ramanan D 2008. A discriminatively trained, multiscale, deformable part model. 2008 IEEE conference on computer vision and pattern recognition (IEEE) pp 1–8
- [82] Tsutsumi M, Saito N, Koyabu D and Furusawa C 2023. NPJ systems biology and applications 9 30