Nucleic Acids Research. 2020 Sep 1;48(17):9505–9520. doi: 10.1093/nar/gkaa725

Inference and multiscale model of epithelial-to-mesenchymal transition via single-cell transcriptomic data

Yutong Sha 1,2, Shuxiong Wang 3, Peijie Zhou 4, Qing Nie 5,6,7,
PMCID: PMC7515733  PMID: 32870263

Abstract

Rapid growth of single-cell transcriptomic data provides unprecedented opportunities for close scrutiny of dynamical cellular processes. Through investigating epithelial-to-mesenchymal transition (EMT), we develop an integrative tool that combines unsupervised learning of single-cell transcriptomic data and multiscale mathematical modeling to analyze transitions during cell fate decision. Our approach allows identification of individual cells in transition between all cell states, and inference of genes that drive transitions. Multiscale extractions of single-cell scale outputs naturally reveal intermediate cell states (ICS) and ICS-regulated transition trajectories, producing emergent population-scale models to be explored for design principles. Testing on the newly designed single-cell gene regulatory network model and applying to twelve published single-cell EMT datasets in cancer and embryogenesis, we uncover the roles of ICS in adaptation, noise attenuation, and transition efficiency in EMT, and reveal their trade-off relations. Overall, our unsupervised learning method is applicable to general single-cell transcriptomic datasets, and our integrative approach at single-cell resolution may be adopted for other cell fate transition systems beyond EMT.

INTRODUCTION

The epithelial-to-mesenchymal transition (EMT) is an important process observed in many biological systems, including embryogenesis, wound healing and malignant progression (1). Recently, several lines of in vitro and in vivo evidence, along with computational modeling, suggest that EMT is not a simple binary switch: during the transition, some cells exhibit a mixture of epithelial and mesenchymal features (1,2). Cells in such an intermediate cell state (ICS) have been implicated in stemness, collective migration, drug resistance, metastasis, and noise control (1,3,4).

Key gene regulatory elements of EMT, such as EMT-suppressing microRNAs and EMT-promoting transcription factors, have been used for modeling and experimental analysis of ICS. The existence of multiple stable states in the modeled gene regulatory networks has been used to imply the existence of ICS (5–7). A few regulators have been found to be critical in the formation of ICS, such as the transcription factor Ovol for regulating growth and Notch signaling for cell-cell communication (7–9), and a few others have been suggested to stabilize ICS (10–12).

What are the functional advantages of ICS in state transitions? Cell population modeling suggests that an increased number of ICS attenuates the fluctuations in cell numbers during transition (13), in addition to helping maintain the mean of the signal response (14). Experimental and modeling analysis shows that ICS can also improve the robustness of population dynamics (15). Signal adaptation has been found to tightly constrain gene regulation (16), yet it could be important as a ‘survival strategy’ in the growth and migration of cells (17). At the level of gene regulation, achieving robustness and signal adaptation, both of which are important to cell fate transition, often requires different and sometimes competing regulatory mechanisms (18). Comparison of ICS across different EMT systems remains a major open problem (19).

Do the cells in ICS show strong variability, or are they tightly controlled? Single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities to explore cellular heterogeneity, distinct cell states, marker genes and the accompanying functions (20–22). Expression levels of epithelial and mesenchymal markers and transcription factors of ICS have recently been analyzed in EMT at single-cell resolution (23). EMT scoring metrics have been developed by applying the best-fit model obtained from a previously developed iterative statistical procedure to quantify the EMT status of cells in different cell lines (24–26). More recently, a topographic map underlying EMT has been constructed to explore the phenotypic plasticity of ICS (27).

One major challenge is to analyze the temporal dynamics of cells in EMT from snapshot transcriptomic data. Pseudo-temporal ordering (pseudotime) of cells in scRNA-seq data provides trajectories of cells that may recapitulate transitions between cell states. However, such an approach usually depends on the embedding of cells in a low-dimensional space via dimension reduction or structured graphs (28–30). Recently, the single-cell method SOUP has allowed classification of both pure and intermediate cells by constructing a cell-cell similarity matrix and estimating a membership matrix (28). Robust tools to quantify the transition trajectories and detect driving genes in EMT are still needed.

What are the transitional properties of cells near or at ICS? Is ICS simply another stable cell state between epithelial and mesenchymal states? Can we construct and quantify the transition paths in EMT? Here, we first develop an unsupervised learning method (QuanTC) to infer and quantify transitional property of individual cells in scRNA-seq data. After validating against our EMT multiscale single-cell model, which combines several previously published gene regulatory networks, we apply QuanTC to twelve published EMT transcriptomic datasets in cancer and embryogenesis. By inspecting transition cells, ICS, and their relationship with epithelial and mesenchymal states, we construct the ICS-regulated EMT trajectories. We then compare the inferred transition trajectories, which are different between cancer and embryogenesis, with another method based on critical transition theory, and re-construct core gene regulatory circuits for the published datasets to analyze the similarity and consistency in state transition.

To further investigate the inferred trajectories shared by various EMT systems, we develop and analyze cell transition models by defining and measuring three metrics emergent from EMT cell population dynamics. Differences between inferred EMT trajectories and their integrations with scRNA-seq data are then analyzed. Our integrative approach, which fuses unsupervised learning of gene expression data at single-cell resolution along with principle-guided cell population model, provides multiscale effective connections between genes and cells in analyzing complex cell fate decision that involves ICS, multiple trajectories, and genes that mark transitions.

MATERIALS AND METHODS

Method details

Overview of QuanTC

QuanTC takes the scRNA-seq data matrix as input to construct a cell-cell similarity matrix using a consensus clustering method (Figure 1) (20). Via non-negative matrix factorization (31), a method of soft clustering, QuanTC then calculates the probabilities of a given cell belonging to the identified clusters (Figure 1C). To detect transition cells (TC), the cell-to-cluster probabilities are next used to measure the plasticity of each cell, i.e. the extent to which the cell may change its cluster identity. To better visualize cells in transition, we project cells to a low-dimensional space based on a probabilistic regularized embedding (PRE) (Figure 1C). The transition trajectories are then inferred by summing the cluster-to-cluster transition probabilities that are calculated from cell-to-cluster probabilities and TC between clusters. The clusters in the middle of the transition trajectories are denoted as ICS. The transition genes and marker genes of clusters are obtained through factorizing the gene expression matrix as product of cell-to-cluster probabilities and likelihoods of genes uniquely marking each cluster.

Figure 1.

Outline of key components of the approach in analyzing transition cells and ICS. (A) Input single-cell transcriptomic datasets to an unsupervised learning method (QuanTC) to explore the transition cells, transition genes and other transition properties. (B) Develop multiscale agent-based models of gene regulatory networks and cell-population dynamics to validate and test outputs from QuanTC. (C) Overview of QuanTC: 1) feature selection and consensus clustering, 2) calculation of cell-to-cell similarity matrix, 3) computing cell-to-cluster matrix via NMF, and 4) using probabilistic regularized embedding (PRE) for two-dimensional visualization. Each solid circle represents one cell, colored by the value of the Cell Plasticity Index (CPI) that quantifies the transition capability of each cell, and each larger circle represents the center of a stable cell subpopulation.

Feature selection and consensus matrix construction

We start by removing the low-expressed cells (expressed Inline graphic of the total number of genes), and the rare and ubiquitous genes that are either expressed in fewer than Inline graphic of cells or expressed with low variance (Inline graphic) among all cells (Figure 1C). Then we fit the expression of each gene with a Gaussian mixture model consisting of three components and use the weights and means of the model to choose the most informative (bimodally distributed) genes. We remove the rarely expressed genes, i.e. those for which the mixture components with mean Inline graphic account for more than Inline graphic of the weight. To select the bimodally distributed genes, we rank the remaining genes according to two criteria. We first sort the genes by the difference between the means of the top two components in descending order. Then we sort them by the difference between the weights of the top two components in ascending order. By aggregating the ranks of the two orders, we select the top Inline graphic informative genes for further analysis.
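The two-criterion ranking described above can be sketched as follows. This is a minimal illustration rather than the QuanTC implementation: the mixture fits are assumed to be precomputed, the function name `rank_bimodal_genes` and the gene names are hypothetical, "top two components" is assumed to mean the two components with the largest weights, and the two rank lists are assumed to be aggregated by summing rank positions.

```python
def rank_bimodal_genes(fits, n_top):
    """Rank genes by how bimodal their fitted mixture looks.

    fits: dict mapping gene name -> list of (weight, mean) pairs,
          one pair per mixture component (assumed precomputed).
    """
    mean_gap, weight_gap = {}, {}
    for gene, components in fits.items():
        # Assumption: the "top two" components are the two heaviest ones.
        top = sorted(components, key=lambda c: c[0], reverse=True)[:2]
        mean_gap[gene] = abs(top[0][1] - top[1][1])
        weight_gap[gene] = abs(top[0][0] - top[1][0])

    genes = list(fits)
    # Criterion 1: large separation between means ranks first.
    by_mean = sorted(genes, key=lambda g: mean_gap[g], reverse=True)
    # Criterion 2: balanced weights (small gap) rank first.
    by_weight = sorted(genes, key=lambda g: weight_gap[g])
    # Assumption: aggregate by summing the two rank positions.
    score = {g: by_mean.index(g) + by_weight.index(g) for g in genes}
    return sorted(genes, key=lambda g: score[g])[:n_top]


# Hypothetical fits for three genes.
fits = {
    "geneA": [(0.50, 0.2), (0.45, 6.0), (0.05, 3.0)],
    "geneB": [(0.90, 1.0), (0.08, 1.5), (0.02, 4.0)],
    "geneC": [(0.60, 0.5), (0.30, 3.0), (0.10, 1.0)],
}
selected = rank_bimodal_genes(fits, n_top=2)
```

Here `geneA` has two well-separated, almost equally weighted components and is ranked first, while `geneB` is dominated by a single component and is dropped.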

Quantifying transition cells via cell plasticity index (CPI)

QuanTC computes a cell-to-cell similarity matrix, Inline graphic, through the cluster-based similarity partitioning algorithm to estimate the similarity between cells. A binary matrix is constructed for each clustering outcome: if two cells are assigned to the same cluster, the corresponding entry of the binary matrix is one; otherwise it is zero. The cell-to-cell similarity matrix Inline graphic is calculated as the mean of the binary matrices constructed from the clusterings, leading to a symmetric non-negative matrix.
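As a small illustration of the averaging step, the sketch below builds the similarity matrix from a handful of clustering outcomes. The function name `consensus_similarity` and the toy label lists are hypothetical, and how the individual clusterings are produced is left out.

```python
def consensus_similarity(clusterings):
    """Average the binary co-membership matrices of several clusterings.

    clusterings: list of label lists, one list per clustering run,
                 each holding one cluster label per cell.
    """
    n_cells = len(clusterings[0])
    similarity = [[0.0] * n_cells for _ in range(n_cells)]
    for labels in clusterings:
        for i in range(n_cells):
            for j in range(n_cells):
                # Binary matrix entry: 1 if both cells share a cluster.
                if labels[i] == labels[j]:
                    similarity[i][j] += 1.0
    n_runs = len(clusterings)
    return [[value / n_runs for value in row] for row in similarity]


# Three hypothetical clustering outcomes for four cells.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
S = consensus_similarity(runs)
```

Cells 0 and 1 are always clustered together, so `S[0][1]` is 1; cells 1 and 2 share a cluster in one of three runs, so `S[1][2]` is 1/3.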

Through symmetric non-negative matrix factorization (31–34), the cell-cell similarity matrix Inline graphic is decomposed into a product of a non-negative low-rank matrix Inline graphic and its transpose (Inline graphic is the number of cells, Inline graphic is the number of clusters) (Figure 1C):

graphic file with name M23.gif (1)

Each column of Inline graphic represents a cluster and each row of Inline graphic corresponds to the relative weights of a cell belonging to all the clusters. In other words, Inline graphic contains the clustering information of cells: the largest element in each row showing the cluster identity of the corresponding cell and the likelihood of a cell belonging to each cluster. The number of clusters Inline graphic is estimated by analyzing the largest gap of the sorted eigenvalues of symmetric normalized graph Laplacian (Supplementary Figure S1A).

By normalizing each row of Inline graphic, we obtain a probability-like matrix Inline graphic where Inline graphic represents the probability of cell Inline graphic belonging to cluster Inline graphic. QuanTC uses an entropy approach to characterize the degree of plasticity of each cell through a Cell Plasticity Index (CPI) (for cell i) defined as (Figure 1C):

graphic file with name M33.gif (2)

A cell undergoing the transition between clusters has higher entropy in contrast to cells located in one well-defined cluster. A higher value of CPI for a cell implies the cell is more plastic, making transition between clusters.
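The entropy idea can be made concrete with a short sketch. Because Equation (2) is not reproduced in this text, the exact form below is an assumption: the Shannon entropy of a cell's row of cluster probabilities, divided by the logarithm of the number of clusters so that the index lies between 0 and 1. The function name `cell_plasticity_index` is hypothetical.

```python
import math


def cell_plasticity_index(weights):
    """Normalized entropy of one cell's cluster weights.

    weights: non-negative weights of the cell over all clusters
             (one row of the factor matrix); normalized here to sum to 1.
    """
    n_clusters = len(weights)
    total = sum(weights)
    probabilities = [w / total for w in weights]
    # Shannon entropy, with 0 * log(0) treated as 0.
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    # Assumption: divide by log(number of clusters) to scale into [0, 1].
    return entropy / math.log(n_clusters)


committed = cell_plasticity_index([1.0, 0.0, 0.0])   # cell inside one cluster
between = cell_plasticity_index([0.5, 0.5, 0.0])     # cell between two clusters
undecided = cell_plasticity_index([1.0, 1.0, 1.0])   # equal weight everywhere
```

Under this normalization a cell fully assigned to one cluster scores 0, and a cell with equal probability for every cluster scores 1.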

Visualization of transition trajectories

In order to faithfully capture both transition trajectories and discrete cell states, the cells are visualized through a probabilistic regularized embedding (PRE) approach using a probability-like matrix Inline graphic in a low-dimensional space (Figure 1C). We first calculate the cluster-cluster relationship from Inline graphic, where each row of Inline graphic denotes to what extent the corresponding cell belongs to each cluster, while each row of Inline graphic defines a distribution of weights over all cells in the cluster. The locations of cluster centers Inline graphic in the two-dimensional space are then computed via the projection of the cluster-cluster relationship (35). The projection of cells Inline graphic is achieved by aligning each cell to the cluster centers based on the probabilities while keeping cells separate from each other through the following constraint:

graphic file with name M40.gif (3)

A cluster with possible transitions to all the other clusters, which indicates high plasticity, is considered a candidate ICS. The potential transition trajectories among clusters are then inferred by selecting one of the non-ICS clusters (e.g. epithelial cells) as the initial cluster and ordering the clusters according to the transitions. Two clusters are considered neighbors if there are TC between them. By aligning cells along the potential cluster transition via the probability matrix Inline graphic, QuanTC detects the transition trajectories. A cell Inline graphic is aligned between cluster Inline graphic and Inline graphic if the two largest elements of the Inline graphicth row of the probability matrix Inline graphic are located at the Inline graphicth and Inline graphicth columns. The cells aligned from cluster Inline graphic to Inline graphic are then ordered in ascending CPI for cells with the largest element at the Inline graphicth location and in descending CPI for cells with the largest element at the Inline graphicth location. The starting cell is selected as the cell with the largest probability of belonging to the chosen initial cluster. Multiple transition trajectories might exist, and the probability of occurrence of each transition trajectory is calculated as the percentage of cells included in that trajectory relative to the entire cell population.
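The alignment and ordering rule for a single pair of neighboring clusters can be sketched as below. The function names, the toy probability rows and the CPI values are all hypothetical; the sketch covers only the ordering between two given clusters, not the inference of which clusters are neighbors.

```python
def top_two_clusters(prob_row):
    """Return the indices of the largest and second-largest probabilities."""
    order = sorted(range(len(prob_row)), key=lambda j: prob_row[j], reverse=True)
    return order[0], order[1]


def order_cells_between(probs, cpi, start, end):
    """Order the cells aligned between clusters `start` and `end`.

    probs: cell-by-cluster probability rows; cpi: one CPI value per cell.
    Cells closest to `start` come first in ascending CPI, followed by
    cells closest to `end` in descending CPI.
    """
    near_start, near_end = [], []
    for cell, row in enumerate(probs):
        first, second = top_two_clusters(row)
        if {first, second} != {start, end}:
            continue  # this cell is not aligned between the two clusters
        (near_start if first == start else near_end).append(cell)
    near_start.sort(key=lambda c: cpi[c])
    near_end.sort(key=lambda c: cpi[c], reverse=True)
    return near_start + near_end


# Five hypothetical cells over three clusters.
probs = [
    [0.45, 0.55, 0.00],
    [0.90, 0.10, 0.00],
    [0.10, 0.00, 0.90],
    [0.05, 0.95, 0.00],
    [0.60, 0.40, 0.00],
]
cpi = [0.62, 0.30, 0.30, 0.18, 0.61]
trajectory = order_cells_between(probs, cpi, start=0, end=1)
```

The resulting order runs from the most committed cell of the starting cluster, through the high-CPI cells in the middle, to the most committed cell of the ending cluster; cell 2 is skipped because its two largest probabilities involve cluster 2.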

Furthermore, QuanTC calculates its own pseudotime of cells in each transition trajectory. A cell's pseudotime value is calculated as the Euclidean distance in PRE from the starting cell. In order to make the pseudotime value comparable for cells from different trajectories, we scale the range of pseudotime values between neighboring clusters to obtain a global pseudotime value of each cell by using the minimum value along all possible transition trajectories.

Finding cluster marker genes and the transition genes that mark transition

In order to identify the marker genes of clusters, we calculate the probabilities for each gene to uniquely mark a cluster. This is achieved by minimizing the difference between the submatrix Inline graphic, containing cells from one inferred transition trajectory of the original feature-selected gene expression Inline graphic, and the submatrix Inline graphic, with such cells of the factorized matrix Inline graphic (Inline graphic is the number of genes):

graphic file with name M58.gif (4)

The optimization solution leads to a gene-cluster matrix Inline graphic to ensure that the factor matrix Inline graphic is similar to Inline graphic derived from the consensus similarity matrix. The gene-cluster matrix Inline graphic can then be used to infer transition genes and marker genes. Each column of Inline graphic, after normalization, describes the likelihoods for the corresponding gene to uniquely mark the clusters. Each row of Inline graphic describes how well the genes delineate the corresponding cluster. The marker genes of cluster Inline graphic are the genes whose largest values are located at the Inline graphicth row of the column-normalized Inline graphic. The marker genes of a specific cluster are then ordered based on their corresponding elements in row Inline graphic of the column-normalized Inline graphic. The difference between the top two elements of each gene is required to be greater than a given value (default value is 0.03) to ensure that the gene is differentially expressed in cluster Inline graphic. The default value of Inline graphic is 10, and how Inline graphic depends on the parameter is investigated, showing robustness of the method (Supplementary Figure S1B).
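Given a gene-cluster matrix, the marker selection rule can be sketched as below. The orientation (clusters as rows, genes as columns) follows the description above; the function name `marker_genes`, the toy matrix and the assignment of gene names to columns are illustrative assumptions, and the optimization that produces the matrix is not shown.

```python
def marker_genes(gene_cluster, gene_names, min_gap=0.03):
    """Assign each gene to the cluster it marks, if the assignment is clear.

    gene_cluster: list of rows (one per cluster), each row holding one
                  non-negative value per gene.
    min_gap: minimum difference between a gene's two largest normalized
             values (0.03 is the default quoted in the text).
    """
    n_clusters = len(gene_cluster)
    scored = {cluster: [] for cluster in range(n_clusters)}
    for g, name in enumerate(gene_names):
        column = [gene_cluster[c][g] for c in range(n_clusters)]
        total = sum(column)
        if total == 0:
            continue  # gene carries no weight in any cluster
        column = [value / total for value in column]  # column normalization
        ranked = sorted(range(n_clusters), key=lambda c: column[c], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if column[best] - column[runner_up] > min_gap:
            scored[best].append((name, column[best]))
    # Order the markers of each cluster by their normalized value.
    return {
        cluster: [name for name, _ in sorted(items, key=lambda t: t[1], reverse=True)]
        for cluster, items in scored.items()
    }


# Hypothetical 2-cluster by 4-gene matrix.
W = [
    [8.0, 1.0, 5.0, 3.0],
    [2.0, 9.0, 5.0, 1.0],
]
names = ["Epcam", "Vim", "Actb", "Cdh1"]
markers = marker_genes(W, names)
```

In this toy matrix the third gene has identical weight in both clusters, so it fails the gap criterion and marks neither cluster.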

In order to uncover genes that mark the transition, that is, the genes varying most during the transition (Figure 1A), we select the marker genes of the two clusters involved in the transition and calculate the Spearman's rank correlation coefficient between gene expression and the order of cells by CPI undergoing transition. Genes with an absolute value of Spearman's rank correlation coefficient above a specified threshold (default value is 0.64) are considered transition genes for the transition between the two clusters. A positive coefficient implies that the gene expression levels of the aligned cells increase during the transition, while a negative coefficient implies that they decrease.
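The correlation filter can be sketched in a few lines. Spearman's coefficient is computed here as the Pearson correlation of average ranks, which handles ties; the function names and the toy expression profiles are hypothetical, and the cells are assumed to be already listed in their aligned order along the transition.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no trend
    return cov / math.sqrt(var_x * var_y)


def transition_genes(expression, threshold=0.64):
    """Keep genes whose |rho| with the cell order exceeds the threshold.

    expression: dict gene -> expression values listed in aligned cell order.
    """
    selected = {}
    for gene, values in expression.items():
        rho = spearman(values, list(range(len(values))))
        if abs(rho) > threshold:
            selected[gene] = rho
    return selected


# Hypothetical expression profiles along six aligned cells.
profiles = {
    "up": [0.1, 0.3, 0.2, 0.8, 0.9, 1.5],
    "down": [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
    "flat": [1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
}
hits = transition_genes(profiles)
```

The sign of each retained coefficient tells whether the gene rises or falls along the transition.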

Multiscale agent-based single-cell model based on gene regulatory network

A multiscale model is constructed to track the gene expression values in each cell using an EMT regulatory circuit of genes (7) that are stochastic in time. Eighteen ordinary differential equations are used to describe the expression levels over time based on a previous study (7). With certain parameters, the circuit has four distinct stable steady states. Each cell is located at one of the four steady states or is in transition towards one of them. The transition between different steady states may be caused by external signals or induced by stochastic influences over time. In the model, we make the following assumptions:

  1. The initial population is composed of 200 cells: 50 epithelial cells (E), 50 first intermediate cells (I1), 50 second intermediate cells (I2) and 50 mesenchymal cells (M).

  2. All cells divide at a normally-distributed rate Inline graphic (Inline graphic refers to a normal distribution). The time unit in the model is hour and the parameter values of the model are chosen based on a previous study (7). Every time a cell divides, its expression levels of all the EMT factors are used as initial conditions to its daughter cells. The gene expression levels of each cell are compared to the expression levels of different stable steady states in the EMT spectrum to determine the cell's phenotype. The E state is characterized by high Ecad expression, and M state is characterized by high Vim expression. I1 and I2 states are characterized by both relatively high Ecad and Vim expression while I1 corresponds to stronger Ecad expression and I2 corresponds to stronger Vim expression among the stable steady states (Supplementary Figure S2A). The cells not at any steady states are considered as TC.

  3. Stochastic effects are integrated into our model by adding two types of noise (Supplementary Figure S2B). (a) we first perturb the expression levels of the mother cell upon its division into two daughter cells:
    graphic file with name M75.gif
    graphic file with name M76.gif
    graphic file with name M77.gif
    In this case, the noise added at the division is the expression levels of mother cell multiplied by a normally-distributed rate. The perturbed expressions serve as the initial conditions for the daughter cells. (b) The multiplicative noise is applied to the parameters in the EMT model:
    graphic file with name M78.gif

    The function Inline graphic represents the EMT regulatory circuit dynamics and Inline graphic stands for the Wiener process with Inline graphic and Inline graphic. Inline graphic represents the noise amplitude with default value 0.01. We use Euler-Maruyama scheme to numerically solve the system.

  4. The number of times a cell can divide is described by a discrete uniform distribution Inline graphic with an equal probability chosen from a natural number between 2 and 7. Once the cell cannot divide any more, the cell dies at a normally-distributed rate Inline graphic

The multiscale model is simulated over a time span of five cell division cycles.

Dynamical system modeling of transition trajectories and three dynamic quantities

To reduce the parameter complexity and increase model accountability, we simplify the model to incorporate only three dimensionless parameters Inline graphic and Inline graphic (Supplementary Figure S3). The direct transition rate (DTR) from the E to the M state is used as the base for comparison (set to one). The parameter Inline graphic represents the dimensionless cell-state transition rate from the M state directly to the E state (i.e. the reverse DTR). We assume that Inline graphic to guarantee that the E state is more stable at equilibrium when there is no EMT induced by an extrinsic signal. It also incorporates the effects of other possible M-to-E transitions (MET) that might not be revealed by the trajectories in EMT datasets. The parameter Inline graphic depicts the forward transition rate between adjacent cell states along the ICS-regulated transition path, also denoted as the indirect transition rate (IDR) of EMT. We use Inline graphic to represent the reverse cell-state transition rates along the indirect EMT routes with ICS. Based on the inferred transition paths (Results), we assume that Inline graphic and Inline graphic such that EMT is mainly carried out through the ICS-regulated trajectories, and the rate of EMT is significantly larger than that of the reverse MET along these trajectories.

Then the prescribed ordinary differential equations (ODEs) that describe the population fraction change of epithelial Inline graphic, mesenchymal Inline graphic and ICS Inline graphic can be derived.

graphic file with name M97.gif (5)
graphic file with name M98.gif (6)
graphic file with name M99.gif (7)
graphic file with name M100.gif (8)
graphic file with name M101.gif (9)

The initial conditions of the ODEs are set as Inline graphic, assuming that only E cells are present initially. To tackle the stiffness introduced by large N or Inline graphic, we use the ode15s solver in MATLAB to evolve the dynamical system.
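Since Equations (5)–(9) are not reproduced in this text, the sketch below is an assumed reading of the prose rather than the authors' system: a linear chain E, I_1, ..., I_N, M with a direct E-to-M rate of one, a reverse direct rate `k_rev`, a forward rate `lam` and a reverse rate `mu` between adjacent states along the ICS path. All names are hypothetical, and a fixed-step fourth-order Runge-Kutta scheme replaces the stiff solver, which is adequate only when `lam * dt` is small.

```python
import math


def emt_rhs(x, k_rev, lam, mu):
    """Right-hand side for fractions x = [E, I_1, ..., I_N, M] (assumed model)."""
    dx = [0.0] * len(x)
    # Direct route: E -> M at rate 1, M -> E at rate k_rev.
    flux = x[0] - k_rev * x[-1]
    dx[0] -= flux
    dx[-1] += flux
    # Indirect route through the ICS chain, only if intermediates exist.
    if len(x) > 2:
        for i in range(len(x) - 1):
            flux = lam * x[i] - mu * x[i + 1]
            dx[i] -= flux
            dx[i + 1] += flux
    return dx


def simulate(n_ics, k_rev, lam, mu, t_end, dt=1e-3):
    """Integrate from an all-epithelial start with classical RK4."""
    x = [1.0] + [0.0] * (n_ics + 1)
    for _ in range(int(round(t_end / dt))):
        k1 = emt_rhs(x, k_rev, lam, mu)
        k2 = emt_rhs([a + 0.5 * dt * b for a, b in zip(x, k1)], k_rev, lam, mu)
        k3 = emt_rhs([a + 0.5 * dt * b for a, b in zip(x, k2)], k_rev, lam, mu)
        k4 = emt_rhs([a + dt * b for a, b in zip(x, k3)], k_rev, lam, mu)
        x = [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x


two_state = simulate(n_ics=0, k_rev=2.0, lam=5.0, mu=0.1, t_end=1.0)
with_ics = simulate(n_ics=2, k_rev=2.0, lam=5.0, mu=0.1, t_end=1.0)
# Closed-form M(t) of the assumed two-state system, for comparison.
exact_m = (1.0 - math.exp(-3.0)) / 3.0
```

Because every flux leaves one state and enters another, the fractions sum to one throughout the integration, which is a convenient sanity check for any variant of the model.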

To study noise attenuation, we add a persistent white noise term to the epithelial dynamics, Equation (5), to simulate extrinsic fluctuations, i.e. we modify the dynamics into a stochastic differential equation (SDE)

graphic file with name M104.gif (10)

where Inline graphic is the standard Wiener process with Inline graphic and Inline graphic and Inline graphic represents the noise amplitude, which is set as 1 in our simulation. We use the Euler-Maruyama scheme to simulate the system described by Equations (6–10).
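A minimal Euler-Maruyama sketch of this idea is given below for the simplest case without ICS. Equation (10) is not reproduced in this text, so the form used here is an assumption: additive white noise of amplitude `sigma` on the epithelial equation only, a direct E-to-M rate of one and a reverse rate `k_rev`. The function name, parameter names and seed handling are hypothetical.

```python
import math
import random


def euler_maruyama_two_state(k_rev, sigma, t_end, dt=1e-3, seed=0):
    """Simulate E and M fractions with additive noise on E (assumed form)."""
    rng = random.Random(seed)
    e, m = 1.0, 0.0  # only epithelial cells initially
    for _ in range(int(round(t_end / dt))):
        flux = e - k_rev * m                    # net E -> M flux
        dw = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment, variance dt
        e += -flux * dt + sigma * dw            # noisy epithelial dynamics
        m += flux * dt                          # mesenchymal dynamics
    return e, m


noiseless = euler_maruyama_two_state(k_rev=2.0, sigma=0.0, t_end=1.0)
noisy = euler_maruyama_two_state(k_rev=2.0, sigma=1.0, t_end=1.0, seed=1)
```

With `sigma` set to zero the scheme reduces to the forward Euler method, so the noiseless run can be checked against the deterministic solution before noise is switched on.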

The mesenchymal population fraction Inline graphic potentially measures how the EMT process adapts or responds to extrinsic signals or fluctuations, as well as the efficiency of transition from epithelial to mesenchymal cells. To quantify the three properties, in a model with Inline graphic intermediate states we define adaptation sensitivity (AS), noise attenuation (NA) and transition efficiency (TE) as

graphic file with name M111.gif

where Inline graphic denotes the mesenchymal population in the stochastic ODEs. The dependence of Inline graphic and Inline graphic on N and Inline graphic is investigated to study different EMT lineage structures and the role of ICS in population survival. We explore AS, NA and TE as functions of the key parameters N and Inline graphic (Supplementary Figure S3B-D). From the single-cell data analysis, the embryonic EMT is associated with an increase of Inline graphic, while in cancer EMT there is a simultaneous increase of N and Inline graphic.

Roles of ICS in adaptation

When no ICS exists in the system, the dynamics of the M population can be solved explicitly as Inline graphic, which is a monotonic function of time. Therefore, the adaptation sensitivity is zero in the two-state system. Generally, in the linear system of Equations (5–9) with N ICS, the solution can be expressed as Inline graphic. When the eigenvalues Inline graphic are real and Inline graphic have different signs, there could exist local maxima of the Inline graphic trajectory, resulting in a non-zero adaptation sensitivity. Meanwhile, if the eigenvalues Inline graphic are complex, the trajectory of Inline graphic can even oscillate before it reaches the stationary state. Through numerical simulation, we validate that the adaptation sensitivity increases with N when the other parameters are kept constant (Supplementary Figure S3B).
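For the two-state case the monotonicity can be checked by hand. The derivation below is a sketch under the stated normalization (direct E-to-M rate equal to one, only E cells initially); the symbol k_r for the reverse direct rate is a label chosen here, since the original symbols are not reproduced in this text.

```latex
\begin{aligned}
\frac{dM}{dt} &= E - k_r M, \qquad E + M = 1, \quad M(0) = 0 \\
\Rightarrow\quad \frac{dM}{dt} &= 1 - (1 + k_r)\, M \\
\Rightarrow\quad M(t) &= \frac{1 - e^{-(1 + k_r)\, t}}{1 + k_r},
\qquad \frac{dM}{dt} = e^{-(1 + k_r)\, t} > 0 .
\end{aligned}
```

Since the derivative is positive for all t, M(t) rises monotonically to its stationary value 1/(1 + k_r) and never overshoots, which is why no adaptation can occur without an intermediate state.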

Quantification and statistical analysis

hESCs data

The single-cell qPCR data (36) were collected for 48 selected genes during a sequential EMT-MET from days 0 to 21. We start with 345 cells from day 0 to day 3. Based on the cell-cell similarity matrix resulting from consensus clustering (20), we use the largest gap between consecutive eigenvalues of the symmetric normalized graph Laplacian to infer the number of clusters Inline graphic. The cluster containing the day 0 (epithelial) cells is chosen as the start of the transition trajectory.

SCC data

We apply QuanTC to the SCC dataset (37) including 382 cells. After removing the low-expressed cells (expressed <5% of the total number of genes), 361 cells remain for further analysis. After feature selection, we use top 3000 genes for consensus clustering and inference of marker genes and transition genes. The cluster having the smallest number of TC around (i.e. low transition taking place) is considered as the start or the end of the transition trajectory. The initial cluster is named as E state based on the high expression levels of Epcam. Other clusters are named based on the inferred transition trajectories compared with the E-I1-I2-M spectrum in EMT. The cell-cycle phase of each cell is determined based on the computed cell cycle scores provided in Seurat (38,39).

Mouse embryonic development data

This scRNA-Seq dataset (40) includes cells from skin (155 cells), lung (176 cells), liver (123 cells), and intestine (173 cells) during E9.5 to E11.5. After removing the low-expressed cells (expressed <5% of the total number of genes), 155 skin cells, 176 lung cells, 123 liver cells and 173 intestine cells remain for further analysis, as in the SCC data.

HNSCC data

This dataset (41) has ∼6000 single cells from 18 head and neck squamous cell carcinoma (HNSCC) patients. We focus on six tumors from which the largest numbers of malignant cell transcriptomes and cells involved in EMT were acquired. The six tumors include patient 5 (132 tumor cells), patient 6 (123 tumor cells), patient 17 (330 tumor cells), patient 18 (140 tumor cells), patient 25 (209 tumor cells) and patient 28 (138 tumor cells). For each patient, we first use all the tumor cells, based on the features selected by QuanTC, for clustering. Similar to the original study (41), we remove the clusters having high expression levels of the cell cycle and stress markers because those cells are known not to be involved in EMT. To the remaining tumor cells, which are mostly similar to epithelial cells, we add 20 fibroblast cells per dataset to act as a reference for mesenchymal cells. We then apply QuanTC to the mixed datasets of each patient. We notice that all six datasets have four clusters including two ICS. The raw and filtered datasets are available on the package website (https://github.com/yutongo/QuanTC).

Mouse hematopoietic progenitors data

This scRNA-Seq dataset (42) includes 2018 cells. After removing the low-expressed cells (expressed <5% of the total number of genes), 1957 cells remain for further analysis. Twelve clusters are identified by QuanTC (Supplementary Figure S4A). The cells with high CPI values (>0.34) are considered as TC (Supplementary Figure S4B). Clusters C6, C7 and C12 are considered as non-ICS or a potential start or end of the transition trajectories because fewer TC exist in or around them (Supplementary Figure S4C), indicating a weak capability of making transitions. B cells and plasmacytoid dendritic cells (pDC) share a common progenitor (42). Clusters C6 and C7 are B cells and pDC, respectively, based on the high expression of the known marker genes (Ebf1, Irf8 and Siglech). Based on the relative number of TC between clusters (Supplementary Figure S4D), the transition trajectories C5–C8–C7 and C5–C11–C6 indicate that B cells (C6) and pDC (C7) share a common progenitor C5. The transition trajectories inferred by QuanTC are consistent with the previous findings (42). QuanTC identifies the marker genes and transition genes involved in the two transition trajectories (Supplementary Figure S4E). When ordering cells in the transition trajectories, the known lineage markers increase along the pseudotime (Supplementary Figure S4F).

Gene Ontology enrichment

The Gene Ontology enrichment analysis (43–45) is performed on the top 100 marker genes (Supplementary Table S2) of each ICS selected by QuanTC.

Comparison with Monocle 3

Monocle 3 (46) is applied to the simulation and SCC datasets (Supplementary Figure S5). While Monocle 3 separates Epcam+ tumor cells from Epcam− tumor cells in the SCC dataset, it is unable to recover the known epithelial-to-mesenchymal lineage (Supplementary Figure S5A). However, when using only the top 3000 genes selected by QuanTC (Supplementary Figure S5B), Monocle 3 is able to capture the previously observed epithelial-to-mesenchymal lineage, suggesting the usefulness of QuanTC in feature selection. For the simulation dataset, Monocle 3 separates the different cell states but cannot identify TC, and consequently cannot recover the transitions between clusters (Supplementary Figure S5C).

RESULTS

Our study consists of two major components: a) unsupervised learning of scRNA-seq data and b) modeling the inferred EMT dynamics (Figure 1). To scrutinize the transition of cells, we first propose QuanTC (Figure 1C, Materials and Methods), a method to quantify the transitional status of individual cells and identify the transition genes that mark the transition process and the marker genes that distinguish different cell states. QuanTC is then validated on a multiscale agent-based stochastic model based on a core EMT gene regulatory network (Figure 1B). By applying QuanTC to twelve published single-cell datasets from embryogenesis or cancer, we reveal the common cell lineage structures mediated by the ICS. We finally model such cell lineages (Figure 1B) to investigate the similarities and differences of the identified cell lineages in terms of signal adaptation, noise attenuation and EMT transition.

QuanTC faithfully captures cell plasticity and transition trajectory in simulated datasets

To test the capability of QuanTC in capturing transition cells and intermediate cell states, we first constructed a multiscale single-cell model using a core EMT/MET gene regulatory network (Figure 2A) (5,7,10,13,47). The new agent-based model dynamically describes the expression levels of genes featured in the regulatory circuit within individual cells, and explicitly includes cell division to track the individual cells. The cell state transition may be caused by the external signal (TGF-β) or stochastic effects in cell division and/or gene regulatory dynamics (Supplementary Figure S2B). The single-cell model outputs a group of single cells along with the expression values of the 18 modeled regulatory components at each temporal point (Materials and Methods) to mimic an EMT scRNA-seq dataset.

Figure 2.

Testing QuanTC on simulated EMT datasets and a qPCR dataset for hepatic differentiation of hESCs. (A) The EMT gene regulatory network used in the multi-scale agent-based model; blue: epithelial promoting factor; purple: mesenchymal promoting factor. (B) Illustration of the modeling output: each cell colored by its true state labels. (C) A simulation dataset: the proportion of each state induced by the previous cell states at the end of each cell cycle. The size of the dot is proportional to the number of cells, and the color denotes the cell states of the mother cell. The arrows represent the occurred state transitions and the circle represents the state of the daughter cell. It shows the transition dynamics of each state. (D, E) PRE visualization of each cell at the end of first cell cycle (a circle) colored by its true state from the model (D) and the calculated CPI value (E). The percentage for each cell type is the percentage of a given cell type over the entire cell population size. (F) Clustering and PRE visualization of the qPCR dataset. Each dot represents one cell colored by the identified state, and its shape represents its real time. (G) Percentage of TC in each state relative to the total number of TC with colors consistent with (F). Dashed box: the intermediate cell state. (H) Comparison of the inferred pseudotime and the day collected in the experiment of each cell. The parameters are provided in Supplementary Table S1.

One typical model simulation exhibits four distinct stable steady states corresponding to four cell phenotypes: epithelial state (E), two intermediate cell states (I1 and I2) and mesenchymal state (M) (Figure 2B). The intermediate state closer to the E is denoted as I1, and the one closer to the M as I2. The cells that have not reached any of the steady states are considered as transition cells (TC). In this simulated system, initially each state consists of 50 cells and after five cell cycles the system grows to 2030 cells. To detect possible transitions between the different states, the cells at the end of each cell cycle were tracked back to the previous cell cycle to identify their mother cells (Figure 2C and Supplementary Figure S6A). For example, E cells were found to come from TC whereas M cells came from TC with few from I1 and I2. The observed transitions among the four states indicate that TC have the strongest capability to give rise to all different EMT subpopulations with the cells in ICS next in such transition capability. Interestingly, E and M cells show less potential to make transitions directly (Figure 2C and Supplementary Figure S6A).
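The mother-to-daughter tracking described above amounts to tallying state pairs across one cell cycle. The following is a minimal sketch of that bookkeeping, not the authors' simulation code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def tally_transitions(mother_state, daughter_state):
    """Count (mother state, daughter state) pairs over one cell cycle."""
    if len(mother_state) != len(daughter_state):
        raise ValueError("one mother label is needed per daughter cell")
    return Counter(zip(mother_state, daughter_state))

def origin_fractions(counts, target):
    """Fraction of daughters in `target` contributed by each mother state."""
    into_target = {m: n for (m, d), n in counts.items() if d == target}
    total = sum(into_target.values())
    return {m: n / total for m, n in into_target.items()} if total else {}

# Hypothetical toy labels, not taken from the simulation in the paper.
mothers = ["TC", "TC", "I1", "E", "TC", "I2", "M", "TC"]
daughters = ["E", "M", "M", "E", "I1", "M", "M", "M"]

counts = tally_transitions(mothers, daughters)
m_origins = origin_fractions(counts, "M")
print(m_origins)
```

In this toy example the largest share of M daughters comes from TC mothers, which is the kind of pattern summarized in Figure 2C.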

The simulation dataset provides the true label of each cell and its transition details. Applying QuanTC to the data collected at the end of the first cell cycle, we identified four cell states and TC between them (Figure 2D, Materials and Methods). Principal component analysis (PCA) was unable to separate different states at the end of later cell cycles, let alone detect the potential transitions between states (Supplementary Figure S6D, E). To quantify the transition capability, we computed the cell plasticity index (CPI) of all cells (Figure 2E) and found that the TC marked using modeling data have relatively high CPI values while cells closer to the primary states have lower CPI values. More TC with higher CPI values were found to be around the two ICS (Supplementary Figure S6B, D, E), suggesting high transition potential of ICS.
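As an illustration of how a plasticity score can separate TC from cells near a primary state, the sketch below computes a normalized entropy of soft-clustering memberships. This is an assumption about the form of CPI (its definition is given in Materials and Methods and is not reproduced in this section); the function name and the membership values are hypothetical.

```python
import numpy as np

def cell_plasticity_index(membership):
    """Normalized entropy of each cell's cluster-membership probabilities.

    membership: cells x clusters array of non-negative weights.
    Returns values in [0, 1]: 0 for a cell fully assigned to one cluster,
    1 for a cell shared equally among all clusters.
    """
    p = np.asarray(membership, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # make each row sum to 1
    k = p.shape[1]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)  # treat 0*log(0) as 0
    return -terms.sum(axis=1) / np.log(k)

# Hypothetical memberships over four states (E, I1, I2, M).
memberships = [
    [1.0, 0.0, 0.0, 0.0],      # cell committed to one state
    [0.5, 0.5, 0.0, 0.0],      # cell shared between two states
    [0.25, 0.25, 0.25, 0.25],  # cell shared among all states
]
cpi = cell_plasticity_index(memberships)
print(cpi)
```

Under this assumed definition, cells lying between clusters score higher than cells near a cluster center, matching the qualitative pattern in Figure 2E.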

The transition genes that mark the transition processes between states, and the marker genes of identified states were uncovered using QuanTC (Supplementary Figure S6C). Ecad and ZEB, along with other genes sharing the similar expression behavior, were found to be marker genes of E and M cells. As for ICS cells, while no clear state marker genes were identified, multiple transition genes are highly expressed due to their strong potential to make transitions (Supplementary Figure S6C).

Through cell state identification, estimating cell plasticity, and inferring marker and transition genes, QuanTC recapitulates the observed states and their transitions in the single-cell model that can be explicitly delineated.

A near synchronous EMT through one ICS during embryonic stem cell differentiation

Previous studies revealed a global epithelial–mesenchymal–epithelial transition during the hepatic differentiation of human embryonic stem cells (hESCs) (48). Recently, a single-cell qPCR analysis with 48 selected genes was performed to study this process (36). In this dataset, cells from day 0 are all epithelial cells in a pluripotent state while cells at day 3 are definitive endoderm (DE) cells in a typical mesenchymal-like status. Cells from day 0 to day 3 are found to follow a near synchronous EMT.

We applied QuanTC to the dataset of 345 cells from day 0 to day 3, identifying three clusters (Figure 2F). Two clusters are E (high expression of pluripotent marker gene SOX2) and M (high expression of DE marker genes FOXA2 and GATA6) whereas the other expresses both epithelial marker gene CDH1 and DE marker gene FOXA2 (Supplementary Figure S7), named as intermediate state I.

Next we quantified the transition dynamics of EMT in embryonic stem cell differentiation using QuanTC. We found that the cells located around the overlapping space between clusters have higher CPI values, while cells closer to cluster centers have lower CPI values (Supplementary Figure S7A). More TC with higher CPI values are located around the identified state I, suggesting that the I state has high potential to make transitions to both E and M (Figure 2G). The transition trajectory from E to M via I state includes 99.7% of total cells, indicating that the ICS-mediated path dominates the cell transitions during EMT.

The cells in early pseudotime were found to be the same ones in early real time (Figure 2H), suggesting the transition from day 0 to day 3 follows a near synchronous EMT, a result consistent with the experimental observations on differentiation of hESCs to hepatic lineage (36).

Novel transition genes and marker genes of the three states were identified (Supplementary Figure S7B-C). MIXL1, the marker of DE, is identified as an E–I transition gene, because its expression level increases gradually during the E–I transition (Supplementary Figure S7D). Two pluripotency markers, POU5F1 and NANOG, and other genes sharing similar expression profiles are transition genes of I–M because of the observed gradual decrease from I to M.

For this dataset, QuanTC not only captures the synchronous EMT but also detects ICS that express both E and M markers. The ICS identified by QuanTC shows strong transition dynamics, and the ICS-regulated path dominates the cell transitions during EMT.

Multiple ICS found in mouse skin tumor dataset

To study epithelial-to-mesenchymal transition in cancer (1,49), we applied QuanTC to a skin squamous cell carcinoma (SCC) dataset, in which multiple tumor subpopulations associated with different EMT stages were identified, and some of them displayed hybrid phenotypes that likely represent multiple distinct ICS in vivo (37). This dataset of 382 cells on skin tumors contains FACS-isolated epithelial YFP+Epcam+ tumor cells, which are relatively homogeneous, and mesenchymal-like YFP+Epcam- tumor cells, which are more heterogeneous (37).

Four clusters were identified by QuanTC, showing two clusters are clearly E and quasi-mesenchymal (QM) states (Figure 3A and Supplementary Figures S8–S9) and the two other clusters, labeled as I1 and I2, express both epithelial marker gene Dsp and mesenchymal marker gene Vim. Nearly all epithelial YFP+Epcam+ cells were found in the E state while most mesenchymal-like cells were clustered into I1, I2 or the QM state. The remaining mesenchymal-like cells were clustered into E but closer to I1, similar to the I1 cells. The overall cell distributions in four different states are very much consistent with the previously observed Epcam+ and Epcam- cells in their levels of heterogeneity (37).

Figure 3.

Analyzing EMT in mouse skin squamous cell carcinoma (SCC) dataset using QuanTC. (A–C) Visualization of cells via PRE. (A) Each star or solid circle colored by the corresponding cell state represents one of the 67 epithelial YFP+Epcam+ and 292 mesenchymal-like YFP+Epcam- tumor cells. (B) Identification of TC. Each dot is colored by its CPI value. The cells outside circles with relatively high CPI values are considered as TC. The parameters are given in Supplementary Table S1. (C) Transition trajectory inference. Arrowed solid and dashed lines show two main transition trajectories, with cells colored based on their pseudotime. (D) Percentage of TC associated with each state relative to the total number of TC. (E) Percentage of TC between two states relative to the total number of cells. (F) Visualization of marker genes and transition genes between states. Each triangle represents a gene colored by its type and arrowed lines indicate the transition direction of EMT. (G) Expression levels of top transition genes with cells ordered along the two most probable transition trajectories. Solid lines, smoothed expression curves for each gene in the transition trajectory. (H, I) Heat map of normalized expression of marker genes and transition genes. Columns represent cells ordered along the transition trajectory and rows represent genes. Coloring represents the normalized expression value of each gene. Transition genes are marked in the box. Top: CPI values of each cell along the transition trajectory.

Novel transition trajectories from E to QM were revealed according to the locations of TC (Figure 3B). There are two main transition trajectories, E-I1-I2-QM and E-I1-QM, which together comprise 94% of cells (Figure 3C). This suggests that the two most probable transition trajectories from E to QM both pass through ICS. The I1 and I2 states, consisting of TC from all the other states around them (Figure 3D), show strong capability of making transitions, a natural property of cells in an intermediate cell state. The transition between I1 and QM was found to have the most TC (almost 30% of TC in total), followed by the transition between I1 and I2 (Figure 3E).

The identified marker genes of E (Figure 3F-I) have a broad agreement with known markers of epithelial cells (50) (Supplementary Figure S9), with their levels of transition genes varying significantly during transition. For example, Lad1 decreases gradually and Pdgfrb increases gradually as E cells transition to I1.

Using QuanTC we identified new marker genes for ICS, with some of them shown to have special functions in EMT via separating ICS from the mesenchymal-like states. For example, Igf1 and Mfap2, differentially expressed in I1 state, have been shown to induce EMT in hepatocellular carcinoma and in gastric cancer cells respectively (51,52). As a result, ICS can be identified not only via co-expression of epithelial and mesenchymal markers but also through specific ICS markers.

The two ICS, I1 and I2 states, are indeed distinct cell states based on the Gene Ontology enrichment analysis of the top marker genes of I1 and I2 states. Both I1 and I2 states share similar biological processes including cell migration and cell motility (mesenchymal features), in addition to proliferation and cell-to-cell communications (Supplementary Table S2). The ability to regulate cell communication and signaling is uniquely found for ICS. The I1 state not only has all the biological processes included in the I2 state but also has unique biological processes related to cell adhesion that it shares with the epithelial cells. This suggests that the cells in ICS display hybrid epithelial/mesenchymal features (11) and communicate with other cells through cell signaling (9,53).

EMT via ICS during mouse embryonic development

scRNA-seq datasets were collected for four organs and tissues of E9.5 to E11.5 mouse embryos: skin (155 cells), lung (176 cells), liver (123 cells), and intestine (173 cells) (40). Applying QuanTC to the four datasets, three clusters were observed for each dataset (Figure 4). Based on the known cluster labels of epithelial and mesenchymal cells (40) and the marker genes inferred by QuanTC, two clusters are clearly E and M cells (Figure 4 and Supplementary Figures S10–S11). The remaining cluster is located between E and M, with more TC of higher CPI values around it, showing clear characteristics of ICS. The cells close to the I state match the known labels well, exhibiting a mixture of features of epithelial and mesenchymal cells (40).

Figure 4.

Comparison analysis of EMT during organogenesis in intestine, liver, lung and skin. (A–D) Top: the expression levels of E-I transition genes (green) and I-M transition genes (blue) along the E–I–M transition colored by inferred state of cells. Solid lines are smoothed expression curves for each gene in the transition trajectory. Bottom: Cells are ordered along a line according to their pseudotime values. Each dot represents a single cell shaped by the cell states previously identified in the original study on the corresponding dataset and colored by the CPI value. The parameters are given in Supplementary Table S1.

In the four datasets, >86% of cells were found to be involved in the newly discovered E-I-M transition trajectory, suggesting that most cells undergo EMT via the intermediate cell state instead of a direct transition from E to M (Supplementary Figures S10–11A, G). Except for skin, which has only a few more TC in the E–I than in the I–M transition, the other three have significantly more TC in the I–M transition than in the E–I transition (Supplementary Figures S10–11D, J). This observation suggests that I and M states are potentially more similar to each other whereas E could be a distinct state.

Gene Ontology enrichment analysis of the top marker genes (Supplementary Table S2) indicates that the ICS from intestine and liver share several biological processes, including cellular component movement, cell motility and cell migration (mesenchymal features), cell adhesion (epithelial features), regulation of signal transduction and cell communication. The ICS from lung and skin relate to the mesenchymal and epithelial cell differentiation. Interestingly, the transition genes inferred from the four organs or tissues are quite different (Supplementary Figures S10–11), indicating that genes regulating EMT may vary under different conditions at different developmental stages.

Comparisons with another state transition method and inference of gene regulatory networks

To further investigate the transition in EMT and validate QuanTC, we next used a previously developed state transition index to predict transitions based on a different method that uses correlated information between cells and genes (54). This index serves as an early warning signal of a critical transition that coincides with lineage commitment (54). By evaluating the index for all five datasets, we found that nearly all TC identified via QuanTC have higher index values than the cells in the stable states (Figure 5A), consistent with the observation that TC are the cells involved in the transition process. The relatively low cell–cell correlation and high gene–gene correlation (Supplementary Figure S12A) during state transitions correspond to the idea that the state transition involves a decrease of cell–cell correlation and concomitant increase of gene–gene correlation. One exception occurs for the E–I trajectory in lung, partly due to the very small number of TC (only three cells) identified between E and I.
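The correlation-based reasoning above can be made concrete with a small sketch. It assumes the index is the ratio of the mean absolute gene–gene Pearson correlation to the mean cell–cell Pearson correlation within a group of cells; the exact definition is given in reference (54) and is not reproduced in this section. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def mean_offdiag(corr, absolute=False):
    """Average of the off-diagonal entries of a square correlation matrix."""
    n = corr.shape[0]
    vals = corr[~np.eye(n, dtype=bool)]
    return np.abs(vals).mean() if absolute else vals.mean()

def transition_index(expr):
    """Assumed index for one group of cells; expr is genes x cells."""
    expr = np.asarray(expr, dtype=float)
    gene_corr = np.corrcoef(expr)    # genes x genes
    cell_corr = np.corrcoef(expr.T)  # cells x cells
    return mean_offdiag(gene_corr, absolute=True) / mean_offdiag(cell_corr)

# Synthetic data (hypothetical): 20 genes, 30 cells per group.
rng = np.random.default_rng(0)
base = np.linspace(1.0, 10.0, 20)                          # shared baseline profile
direction = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)    # genes going up or down
progress = np.linspace(0.0, 1.0, 30)                       # position along a transition

# Stable group: every cell sits near the same profile.
stable = base[:, None] + 0.1 * rng.normal(size=(20, 30))
# Transition group: cells are spread along a coordinated expression shift.
transition = (base[:, None] + 4.0 * direction[:, None] * progress[None, :]
              + 0.1 * rng.normal(size=(20, 30)))

ic_stable = transition_index(stable)
ic_transition = transition_index(transition)
print(ic_stable, ic_transition)
```

In this synthetic setting the transitioning group shows coordinated gene changes and more dissimilar cells, so its index is higher than that of the stable group, which is the qualitative behavior reported for TC in Figure 5A.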

Figure 5.

State transition index and gene regulatory networks for five EMT datasets and their comparisons with QuanTC outputs. (A) State transition index of relatively stable cells in each state and the TC between states. Dashed box: TC with high value of state transition index. (B) Gene regulatory networks of top marker genes and transition genes using the PIDC algorithm from the SCC and mouse embryonic development datasets (the top ∼80% of edges are shown). The parameters are given in Supplementary Table S1. Each dot represents a gene colored by its type. Each large dashed circle labels marker genes of a particular cell state. Graph edges indicate the top interactions and the length of the edge is inversely proportional to the interaction strength between genes.

To investigate how transition genes may regulate state marker genes in EMT, we inferred gene regulatory networks of both state marker genes and transition genes via the PIDC algorithm (55). The inferred markers of different states were projected into lower-dimensional space, with top genes marked by their states or transition trajectories and with edge length inversely proportional to the interaction strength between genes (Figure 5B and Supplementary Figure S12B). Two genes that are close to each other with a short edge indicate a strong regulatory interaction, in contrast to genes located away from each other with a longer edge between them.

For example, in the SCC dataset, E markers are mostly linked to I1 markers through E-I1 transition genes, and marker genes of I1 and I2 are linked directly or via I1-I2 transition genes, showing a gene regulatory circuit consistent with the inferred trajectory and CPI values using QuanTC (Figure 3B, C). In addition, marker genes of I2 and QM are linked directly or via I2-QM transition genes along with an edge linking markers of I1 and QM to I1-QM transition genes nearby, suggesting that E-I1-QM is another transition trajectory, consistent with the two previously inferred trajectories (Figure 3C). Interestingly, markers of E have longer edges linking to other marker genes, suggesting the relative dissimilarity of E to I1, I2 and QM, consistent with our findings directly using QuanTC (Figure 3). Similar structures in gene regulatory networks were seen among the intestine, liver and lung. In particular, marker genes of E, I and M form distinct groups and markers of E and I are linked directly or via E–I transition genes, while markers of I and M are linked directly or via I–M transition genes. Interestingly, for skin, different markers are much less separated compared to the other three embryonic development systems, except for markers of E, suggesting the transitions and the genes regulating the transition in developing skin could be more intermingled and complicated.

Dynamical properties of inferred ICS-regulated EMT trajectories

To explore the dynamics of the inferred transition trajectories, we developed a cell population model that contains multiple ICS and relies on only three effective dimensionless parameters (Materials and Methods, Supplementary Figure S3A). Three emergent quantities were then defined to measure the EMT population dynamics (Figure 6A, Materials and Methods): (i) sensitivity of signal adaptation, (ii) coefficient of variation (CV) to quantify noise attenuation and (iii) the efficiency of population transition from epithelial to mesenchymal states. We then investigated how the existence of ICS, as well as the transitions via ICS, affect the robustness and efficacy of EMT dynamics using these three quantities.

Figure 6.

Dynamical properties of inferred ICS-regulated EMT trajectories. (A) The definitions and measurements of three quantities – adaptation, noise attenuation and population transition properties of cell population dynamics. (B) The key parameters of model including ICS number N and ITR gamma (see also Materials and Methods, Supplementary Figure S3). Increase of ICS number N can result in the multiple peaks in M population trajectory, forming the oscillatory adaptation. (C) Effect of tuning N and gamma on the three quantities (see also Supplementary Figure S3). (top row) Changes in three quantities by fixing N = 2 and tuning gamma from 5 to 80. The increase in ITR gamma lowers the noise coefficient of variance (CV) of output M population, and increases the transition efficiency from E to M. The signal adaptation sensitivity is not a monotonic function of gamma, which reaches the peak before a certain threshold and declines afterwards with further increase in gamma. (bottom row) Change of three quantities by fixing gamma and tuning N from 1 to 18. The increase in N improves adaptation sensitivity and noise attenuation, however reducing the value of transition efficiency. (D) Tuning parameter gamma and N separately cannot achieve all the desired properties (i.e. simultaneous increase of adaptation sensitivity, noise attenuation and EMT efficiency, indicated by brown dashed line). The desired properties can be achieved by increasing ITR gamma (blue line, increase gamma from 5 to 80 and fix N as 1) first and increasing N subsequently (red line, increase N from 1 to 8 and fix gamma as 80). (E) EMT trajectories inferred from SCC dataset, with node colors consistent with Figure 3. Other inferred trajectories are shown in Supplementary Figures S12–S13. The arrow represents potential transition between states, and number represents the percentage of TC. The red arrows indicate the major transition trajectory mediated by ICS, and the dashed arrow refers to the direct transition route from E to QM state.

The signal adaptation property is demonstrated by the reset of output level after the response to stimulus in cell populations (Figure 6A). In cancer EMT, adaptation with high sensitivity permits a transient peak in the massive release of the malignant mesenchymal population, forming an effective metastasis strategy under immune regulation. In the two-state system with only pure epithelial or mesenchymal states, we rigorously proved that no adaptation is allowed (Materials and Methods). The modeling results suggest that both the increase in ICS number and the moderate increase in indirect transition rate (ITR) via the ICS (Supplementary Figure S3B, Materials and Methods) can increase the adaptation sensitivity (Figure 6C); however, a further increase in ITR (over a certain threshold) can instead decrease the sensitivity. Interestingly, the increase in ICS number may result in the oscillatory adaptation of cell population dynamics, i.e. the M population goes through multiple peaks before reaching a steady level (Figure 6B). This potentially provides a ‘hide-and-seek’ strategy for metastatic mesenchymal cells battling with immune systems in cancer.
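The absence of adaptation in a two-state system can be illustrated with a short worked calculation. This is a sketch under assumptions that are not taken from Materials and Methods: the population fractions satisfy $E(t) + M(t) = 1$, and the rates $k_{EM}$ and $k_{ME}$ (hypothetical symbols) are constant after a step stimulus. The proof given in Materials and Methods may be formulated differently.

```latex
\begin{aligned}
\frac{dM}{dt} &= k_{EM}\,\bigl(1 - M\bigr) - k_{ME}\,M, \\
M(t) &= M^{*} + \bigl(M(0) - M^{*}\bigr)\,e^{-(k_{EM}+k_{ME})\,t},
\qquad M^{*} = \frac{k_{EM}}{k_{EM}+k_{ME}} .
\end{aligned}
```

Under these assumptions $M(t)$ approaches $M^{*}$ monotonically, so it cannot rise to a transient peak and then reset, which is the signature of adaptation described above.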

The noise attenuation property depicts the system's capability to reduce fluctuations in population dynamics. Both the increase in ICS number and ITR help reduce the CV of M population trajectories (Figure 6C), stabilizing the dynamics in population transition. The property of population transition is quantified by the final fraction of M population that originates from pure E population. The increase in ITR results in boosting of population transition efficiency in EMT, while the increase in ICS number reduces such efficiency.
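The two quantities in this paragraph can be computed from simulated M-population trajectories as sketched below. The sketch assumes that CV is the standard deviation divided by the mean across replicate trajectories at each time point, and that efficiency is the replicate-averaged final M fraction for populations that start as pure E; the precise definitions are in Materials and Methods and may differ. Function names and numbers are hypothetical.

```python
import numpy as np

def coefficient_of_variation(m_fraction):
    """CV of the M fraction across replicates at each time point.

    m_fraction: replicates x time points. Time points with zero mean
    are assigned a CV of 0 to avoid division by zero.
    """
    m = np.asarray(m_fraction, dtype=float)
    mean = m.mean(axis=0)
    std = m.std(axis=0)
    return np.divide(std, mean, out=np.zeros_like(mean), where=mean > 0)

def transition_efficiency(m_fraction):
    """Replicate-averaged M fraction at the final time point."""
    m = np.asarray(m_fraction, dtype=float)
    return float(m[:, -1].mean())

# Hypothetical M fractions for two replicate simulations at three time points,
# both starting from a pure E population (M fraction 0).
m_traj = [
    [0.0, 0.4, 0.8],
    [0.0, 0.6, 0.8],
]
cv = coefficient_of_variation(m_traj)
eff = transition_efficiency(m_traj)
print(cv, eff)
```

A lower CV across the trajectory corresponds to stronger noise attenuation, and a larger final M fraction corresponds to higher transition efficiency.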

The trade-off between adaptation sensitivity and transition efficiency was observed in EMT (Figure 6C, D). Although larger ICS numbers may increase adaptation sensitivity, they also impair the effective transition toward the M state (Figure 6C). On the other hand, increasing ITR can boost efficiency, while an overly large value results in a decrease in adaptation sensitivity. Hence, an increase in one parameter only, either ICS number or ITR, fails to optimize all the properties simultaneously (Figure 6D). The transition trajectories may need a combined increase in both ICS and ITR to achieve the desired property, as seen in the inferred SCC transition trajectories (Figure 6E).

The derived relationships between the three emergent quantities and EMT population parameters shed light on our findings obtained from single-cell EMT data mining. Based on the percentages of TC between state transitions among all the cells involved in EMT, we quantified the EMT trajectories in twelve single-cell datasets by QuanTC (Figure 6E and Supplementary Figures S12C, S13B), which include six additional head and neck squamous cell carcinoma (HNSCC) datasets (Materials and Methods, Supplementary Figure S13). For all the investigated mouse and human datasets from both normal and tumor tissues, we found that the majority of transitions involve ICS while the direct transition between epithelial and mesenchymal states is relatively rare (Supplementary Figures S12C, S13B). This corresponds to the increase in ITR in the model, resulting in the strengthening of the noise attenuation property (Figure 6C), as well as enhancement of adaptation sensitivity (provided that the increase in ITR does not exceed the threshold observed in Figure 6C). Moreover, compared to the single ICS involved in EMT in the embryo, cancer EMT involves more ICS. Therefore, in cancer EMT the adaptation sensitivity of population dynamics is further reinforced by the presence of multiple ICS, at the expense of E-to-M transition efficiency. In comparison, in embryogenesis EMT, fewer ICS and the large ITR flux can lead to higher E-to-M transition efficiency, but at the cost of lower sensitivity of population dynamics adaptation.

DISCUSSION

By unsupervised learning of transition trajectories in twelve EMT single-cell datasets and multiscale mathematical modeling, we have analyzed transition cells and dynamics of EMT that highlight the transition trajectories mediated by ICS. By investigating several emergent dynamic quantities describing transitions, we have suggested that the inferred transition trajectories not only attenuate the noise, but also enhance the signal adaptation in EMT. Modeling analysis has indicated that cancer EMT trajectories strengthen the signal adaptation, whereas trajectories in embryogenesis EMT favor effective population transition toward mesenchymal states.

Compared with direct clustering (20,28) and pseudotime analysis (56–58) for scRNA-seq data, the unsupervised learning algorithm QuanTC can simultaneously detect the intermediate cell states and construct transition trajectories via quantifying the cell plasticity. An attractive feature of QuanTC is its soft clustering approach to identify cells in mixed states or undergoing transition between states, a ubiquitous property in many cell fate systems. The projection of cells in PRE marked by CPI for transitions offers a parsimonious and meaningful alternative to analyzing a large number of discrete cell states. To compare with other methods, we have applied the popular pseudotime inference method Monocle 3 to the simulation datasets and SCC datasets (Materials and Methods, Supplementary Figure S5). While Monocle 3 correctly depicts the overall progression of epithelial-mesenchymal transition, it lacks the resolution to distinguish transition cells from other stable cells. In addition, the trajectories inferred by Monocle 3 strongly depend on input gene selections. Interestingly, the features selected by QuanTC could improve the consistency of trajectory inference by Monocle 3 in the SCC dataset (Supplementary Figure S5), suggesting that the feature selection function in QuanTC is useful and more broadly applicable.

Unlike other methods that can only infer marker genes for cell subpopulations, such as a recent random coefficient matrix-based regularization method on identifying transition cells (59), QuanTC can uncover key genes that mark the state transitions. The projection of cells in PRE marked by CPI for transition processes offers a parsimonious and meaningful alternative to analyzing a large number of discrete cell types. In addition, QuanTC is adaptive to the downstream analysis of other soft clustering methods and is applicable to systems beyond EMT. For instance, we applied QuanTC to a single-cell RNA-seq dataset of ∼2,000 mouse hematopoietic progenitors (Materials and Methods, Supplementary Figure S4). We found two prominent non-ICS, i.e. plasmacytoid dendritic cells (pDCs) and B cells, exactly corresponding to the target states identified in the original study (42). The transition cells along the trajectory indicate that pDCs and B cells share the same progenitors, consistent with the findings based on the FateID inference (42).

A multiscale agent-based model of EMT gene regulatory network has been developed to generate simulation data with the ground truth, allowing easy validation of our unsupervised learning method QuanTC. Previous models were mainly focused on the regulation mechanisms of EMT by ODEs with feedback control to identify important agents that are responsible for initiating or suppressing EMT (3,5–7). In those models, cell activities or states defined by changes in gene expressions are confined within each individual cell. We have extended the modeling of EMT to a heterogeneous population of cells, while still incorporating gene regulatory networks, offering a convenient framework to explore cell proliferation by monitoring the changes in gene expressions prompted by interactions between various EMT agents, which is important for cancer studies (23,60,61). Our model explicitly incorporates stochastic effects caused by each cell division (62,63) that may affect cell fates. Our model can also easily incorporate different assumptions on proliferative dynamics of each cell state. For example, we have analyzed a case in which the I1 cells are assumed to be non-proliferative (Supplementary Figure S14) to investigate ICS under cell cycle arrest during EMT (64,65).

Interesting trade-offs among signal adaptation, noise attenuation and effective transition have been observed in modeling analysis. Consistent with previous findings (13), the increase in ICS number during EMT attenuates fluctuations; in addition, boosting the transitions via ICS (i.e. ITR) also plays a similar role in noise buffering. The concept of adaptation sensitivity, previously mainly used for signal transductions (16,18), was introduced in this study to quantify the transient, adaptive dynamics in EMT populations. Such transient properties were previously reported in breast cancer cell lines (15), and theoretically studied in the context of non-equilibrium statistical physics. Interestingly, the increase of ITR alone cannot improve adaptation persistently, and the robust adaptation in population dynamics requires both large ITR and multiple ICS, a result consistent with the learned single-cell trajectories in SCC. We reason that the transient peaks in highly adaptive trajectories ensure adequate release of mesenchymal cells, with their short duration impeding the immune system from capturing and responding to metastasis in a timely manner. It is very interesting to note that ICS in EMT are associated with poor prognosis of cancer treatment according to clinical studies (23); the relationship we found between ICS number and adaptation may serve as a potential explanation from cell population dynamics.

In our study, more efficient algorithms to explore cell-cell similarities will likely improve QuanTC significantly in its speed and ability to learn transition trajectories. The agent-based multiscale model can be further improved by adding new interactions between genes and cell-cell communications over time, and the inclusion of other cell types, such as immune cells, may gain further insights into the functional role of ICS. Overall, our integrative approach provides an initial attempt to bridge single-cell data mining and multiscale modeling to investigate transitions and role of intermediate cell states in EMT.

DATA AVAILABILITY

All the data analyzed in this paper has been previously published and can be accessed from original publications. hESCs (GEO: GSE70741), SCC (GEO: GSE110357), mouse organogenesis (GEO: GSE87038), HNSCC (GEO: GSE103322) and mouse hematopoietic progenitors (GEO: GSE100037) datasets were downloaded from the Gene Expression Omnibus. The code for QuanTC algorithm is available at https://github.com/yutongo/QuanTC, and the simulation code for multi-scale model is available at https://github.com/yutongo/Multiscale-agent-based-model-of-EMT.

ACKNOWLEDGEMENTS

The authors are grateful for the discussions and suggestions from Dr Axel Almet and Dr Chris Rackauckas. Q.N., Y.S., S.W. and P.Z. conceived the project; Y.S. and S.W. designed the QuanTC algorithm and conducted the data analyses; Y.S. performed the GRN modeling and validation; P.Z. performed the cell population modeling; Q.N., Y.S. and P.Z. drafted the manuscript with help from all the authors; Q.N. supervised the research and writing. There is no conflict of interest between the authors.

Contributor Information

Yutong Sha, Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA; The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA.

Shuxiong Wang, Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA.

Peijie Zhou, Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA.

Qing Nie, Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA; The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA; Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health [U01AR073159, R01AR044882, U54CA217378] (in part); National Science Foundation [DMS1763272]; Simons Foundation [594598 to Q.N.].

Conflict of interest statement. None declared.

REFERENCES

  • 1. Nieto M.A., Huang R.Y., Jackson R.A., Thiery J.P. EMT: 2016. Cell. 2016; 166:21–45.
  • 2. Sha Y., Haensel D., Gutierrez G., Du H., Dai X., Nie Q. Intermediate cell states in epithelial-to-mesenchymal transition. Phys. Biol. 2018; 16:021001.
  • 3. Zhang J., Tian X.-J., Zhang H., Teng Y., Li R., Bai F., Elankumaran S., Xing J. TGF-β–induced epithelial-to-mesenchymal transition proceeds through stepwise activation of multiple feedback loops. Sci. Signaling. 2014; 7:ra91.
  • 4. Huang R.Y., Wong M.K., Tan T.Z., Kuay K.T., Ng A.H., Chung V.Y., Chu Y.S., Matsumura N., Lai H.C., Lee Y.F. et al. An EMT spectrum defines an anoikis-resistant and spheroidogenic intermediate mesenchymal state that is sensitive to E-cadherin restoration by a src-kinase inhibitor, saracatinib (AZD0530). Cell Death Dis. 2013; 4:e915.
  • 5. Lu M., Jolly M.K., Levine H., Onuchic J.N., Ben-Jacob E. MicroRNA-based regulation of epithelial-hybrid-mesenchymal fate determination. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:18144–18149.
  • 6. Tian X.J., Zhang H., Xing J. Coupled reversible and irreversible bistable switches underlying TGFbeta-induced epithelial to mesenchymal transition. Biophys. J. 2013; 105:1079–1089.
  • 7. Hong T., Watanabe K., Ta C.H., Villarreal-Ponce A., Nie Q., Dai X. An Ovol2-Zeb1 mutual inhibitory circuit governs bidirectional and multi-step transition between epithelial and mesenchymal states. PLoS Comput. Biol. 2015; 11:e1004569.
  • 8. Steinway S.N., Zanudo J.G., Ding W., Rountree C.B., Feith D.J., Loughran T.P. Jr, Albert R. Network modeling of TGFbeta signaling in hepatocellular carcinoma epithelial-to-mesenchymal transition reveals joint sonic hedgehog and Wnt pathway activation. Cancer Res. 2014; 74:5963–5977.
  • 9. Boareto M., Jolly M.K., Goldman A., Pietila M., Mani S.A., Sengupta S., Ben-Jacob E., Levine H., Onuchic J.N. Notch-Jagged signalling can give rise to clusters of cells exhibiting a hybrid epithelial/mesenchymal phenotype. J. R. Soc. Interface. 2016; 13:20151106.
  • 10. Jia D., Jolly M.K., Boareto M., Parsana P., Mooney S.M., Pienta K.J., Levine H., Ben-Jacob E. OVOL guides the epithelial-hybrid-mesenchymal transition. Oncotarget. 2015; 6:15436–15448.
  • 11. Jolly M.K., Tripathi S.C., Jia D., Mooney S.M., Celiktas M., Hanash S.M., Mani S.A., Pienta K.J., Ben-Jacob E., Levine H. Stability of the hybrid epithelial/mesenchymal phenotype. Oncotarget. 2016; 7:27067–27084.
  • 12. Jolly M.K., Boareto M., Debeb B.G., Aceto N., Farach-Carson M.C., Woodward W.A., Levine H. Inflammatory breast cancer: a model for investigating cluster-based dissemination. NPJ Breast Cancer. 2017; 3:21.
  • 13. Ta C.H., Nie Q., Hong T. Controlling stochasticity in epithelial-mesenchymal transition through multiple intermediate cellular states. Discrete Continuous Dyn. Syst. Ser. B. 2016; 21:2275–2291.
  • 14. Rackauckas C., Schilling T., Nie Q. Mean-independent noise control of cell fates via intermediate states. iScience. 2018; 3:11–20.
  • 15. Gupta P.B., Fillmore C.M., Jiang G., Shapira S.D., Tao K., Kuperwasser C., Lander E.S. Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell. 2011; 146:633–644.
  • 16. Ma W., Trusina A., El-Samad H., Lim W.A., Tang C. Defining network topologies that can achieve biochemical adaptation. Cell. 2009; 138:760–773.
  • 17. Ben-Jacob E., Coffey D.S., Levine H. Bacterial survival strategies suggest rethinking cancer cooperativity. Trends Microbiol. 2012; 20:403–410.
  • 18. Qiao L., Zhao W., Tang C., Nie Q., Zhang L. Network topologies that can achieve dual function of adaptation and noise attenuation. Cell Syst. 2019; 9:271–285.
  • 19. Jolly M.K., Celia-Terrassa T. Dynamics of phenotypic heterogeneity associated with EMT and stemness during cancer progression. J. Clin. Med. 2019; 8:1542.
  • 20. Kiselev V.Y., Kirschner K., Schaub M.T., Andrews T., Yiu A., Chandra T., Natarajan K.N., Reik W., Barahona M., Green A.R. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods. 2017; 14:483–486.
  • 21. Wang B., Zhu J., Pierson E., Ramazzotti D., Batzoglou S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat. Methods. 2017; 14:414–416.
  • 22. Satija R., Farrell J.A., Gennert D., Schier A.F., Regev A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015; 33:495–502.
  • 23. Pastushenko I., Blanpain C. EMT transition states during tumor progression and metastasis. Trends Cell Biol. 2019; 29:212–226.
  • 24. George J.T., Jolly M.K., Xu S., Somarelli J.A., Levine H. Survival outcomes in cancer patients predicted by a partial EMT gene expression scoring metric. Cancer Res. 2017; 77:6415–6428.
  • 25. Jia D., George J., Tripathi S., Kundnani D., Lu M., Hanash S., Onuchic J., Jolly M.K., Levine H. Testing the gene expression classification of the EMT spectrum. Phys. Biol. 2018; 16:025002.
  • 26. Jia D., Li X., Bocci F., Tripathi S., Deng Y., Jolly M.K., Onuchic J.N., Levine H. Quantifying cancer epithelial-mesenchymal plasticity and its association with stemness and immune response. J. Clin. Med. 2019; 8:725.
  • 27. Font-Clos F., Zapperi S., La Porta C.A.M. Topography of epithelial-mesenchymal plasticity. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:5902–5907.
  • 28. Zhu L., Lei J., Klei L., Devlin B., Roeder K. Semisoft clustering of single-cell data. Proc. Natl. Acad. Sci. U.S.A. 2019; 116:466–471.
  • 29. Guo M., Bao E.L., Wagner M., Whitsett J.A., Xu Y. SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 2016; 45:e54.
  • 30. Street K., Risso D., Fletcher R.B., Das D., Ngai J., Yosef N., Purdom E., Dudoit S. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018; 19:477.
  • 31. Kuang D., Ding C., Park H. Symmetric nonnegative matrix factorization for graph clustering. Proceedings of the 2012 SIAM International Conference on Data Mining. 2012; SIAM; 106–117.
  • 32. Kuang D., Yun S., Park H. SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering. J. Global. Optim. 2015; 62:545–574.
  • 33. Zhu Z.H., Li X., Liu K., Li Q.W. Dropping symmetry for fast symmetric nonnegative matrix factorization. Advances in Neural Information Processing Systems. 2018; 5154–5164.
  • 34. Boutsidis C., Gallopoulos E. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognit. 2008; 41:1350–1362.
  • 35. Koren Y. Drawing graphs by eigenvectors: Theory and practice. Comput. Math Appl. 2005; 49:1867–1888.
  • 36. Li Q., Hutchins A.P., Chen Y., Li S., Shan Y., Liao B., Zheng D., Shi X., Li Y., Chan W.Y. et al. A sequential EMT-MET mechanism drives the differentiation of human embryonic stem cells towards hepatocytes. Nat. Commun. 2017; 8:15166.
  • 37. Pastushenko I., Brisebarre A., Sifrim A., Fioramonti M., Revenco T., Boumahdi S., Van Keymeulen A., Brown D., Moers V., Lemaire S. et al. Identification of the tumour transition states occurring during EMT. Nature. 2018; 556:463–468.
  • 38. Tirosh I., Izar B., Prakadan S.M., Wadsworth M.H., Treacy D., Trombetta J.J., Rotem A., Rodman C., Lian C., Murphy G. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016; 352:189–196.
  • 39. Butler A., Hoffman P., Smibert P., Papalexi E., Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018; 36:411–420.
  • 40. Dong J., Hu Y., Fan X., Wu X., Mao Y., Hu B., Guo H., Wen L., Tang F. Single-cell RNA-seq analysis unveils a prevalent epithelial/mesenchymal hybrid state during mouse organogenesis. Genome Biol. 2018; 19:31.
  • 41. Puram S.V., Tirosh I., Parikh A.S., Patel A.P., Yizhak K., Gillespie S., Rodman C., Luo C.L., Mroz E.A., Emerick K.S. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell. 2017; 171:1611–1624.
  • 42. Herman J.S., Sagar, Grun D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat. Methods. 2018; 15:379–386.
  • 43. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25:25–29.
  • 44. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019; 47:D330–D338.
  • 45. Mi H., Muruganujan A., Ebert D., Huang X., Thomas P.D. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019; 47:D419–D426.
  • 46. Cao J., Spielmann M., Qiu X., Huang X., Ibrahim D.M., Hill A.J., Zhang F., Mundlos S., Christiansen L., Steemers F.J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019; 566:496–502.
  • 47. MacLean A.L., Hong T., Nie Q. Exploring intermediate cell states through the lens of single cells. Curr. Opin. Syst. Biol. 2018; 9:32–41.
  • 48. Liu X., Sun H., Qi J., Wang L., He S., Liu J., Feng C., Chen C., Li W., Guo Y. et al. Sequential introduction of reprogramming factors reveals a time-sensitive requirement for individual factors and a sequential EMT-MET mechanism for optimal reprogramming. Nat. Cell Biol. 2013; 15:829–838.
  • 49. Puisieux A., Brabletz T., Caramel J. Oncogenic roles of EMT-inducing transcription factors. Nat. Cell Biol. 2014; 16:488–494.
  • 50. Parsana P., Amend S.R., Hernandez J., Pienta K.J., Battle A. Identifying global expression patterns and key regulators in epithelial to mesenchymal transition through multi-study integration. BMC Cancer. 2017; 17:447.
  • 51. Zhao C., Wang Q., Wang B., Sun Q., He Z., Hong J., Kuehn F., Liu E., Zhang Z. IGF-1 induces the epithelial-mesenchymal transition via Stat5 in hepatocellular carcinoma. Oncotarget. 2017; 8:111922–111930. [Retracted]
  • 52. Wang J.K., Wang W.J., Cai H.Y., Du B.B., Mai P., Zhang L.J., Ma W., Hu Y.G., Feng S.F., Miao G.Y. MFAP2 promotes epithelial-mesenchymal transition in gastric cancer cells by activating TGF-beta/SMAD2/3 signaling pathway. Onco Targets Ther. 2018; 11:4001–4017.
  • 53. Zhang J., Tian X.J., Xing J. Signal transduction pathways of EMT induced by TGF-beta, SHH, and WNT and their crosstalks. J. Clin. Med. 2016; 5:41.
  • 54. Mojtahedi M., Skupin A., Zhou J., Castano I.G., Leong-Quong R.Y., Chang H., Trachana K., Giuliani A., Huang S. Cell fate decision as high-dimensional critical state transition. PLoS Biol. 2016; 14:e2000640.
  • 55. Chan T.E., Stumpf M.P.H., Babtie A.C. Gene regulatory network inference from single-cell data using multivariate information measures. Cell Syst. 2017; 5:251–267.
  • 56. Qiu X., Mao Q., Tang Y., Wang L., Chawla R., Pliner H.A., Trapnell C. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods. 2017; 14:979–982.
  • 57. Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., Lennon N.J., Livak K.J., Mikkelsen T.S., Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014; 32:381–386.
  • 58. Welch J.D., Hartemink A.J., Prins J.F. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 2016; 17:106.
  • 59. Zheng X., Jin S., Nie Q., Zou X. scRCMF: Identification of cell subpopulations and transition states from Single-Cell transcriptomes. IEEE Trans. Biomed. Eng. 2020; 67:1418–1428.
  • 60. Kang X., Wang J., Li C. Exposing the underlying relationship of cancer metastasis to metabolism and epithelial-mesenchymal transitions. iScience. 2019; 21:754–772.
  • 61. Karacosta L.G., Anchang B., Ignatiadis N., Kimmey S.C., Benson J.A., Shrager J.B., Tibshirani R., Bendall S.C., Plevritis S.K. Mapping lung cancer epithelial-mesenchymal transition states and trajectories with single-cell resolution. Nat. Commun. 2019; 10:5587.
  • 62. Tripathi S., Chakraborty P., Levine H., Jolly M.K. A mechanism for epithelial-mesenchymal heterogeneity in a population of cancer cells. PLoS Comput. Biol. 2020; 16:e1007619.
  • 63. Jia W., Tripathi S., Chakraborty P., Chedere A., Rangarajan A., Levine H., Jolly M.K. Epigenetic feedback and stochastic partitioning during cell division can drive resistance to EMT. Oncotarget. 2020; 11:2611–2624.
  • 64. Lovisa S., LeBleu V.S., Tampe B., Sugimoto H., Vadnagara K., Carstens J.L., Wu C.C., Hagos Y., Burckhardt B.C., Pentcheva-Hoang T. et al. Epithelial-to-mesenchymal transition induces cell cycle arrest and parenchymal damage in renal fibrosis. Nat. Med. 2015; 21:998–1009.
  • 65. Xing J., Tian X.J. Investigating epithelial-to-mesenchymal transition with integrated computational and experimental approaches. Phys. Biol. 2019; 16:031001.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press